apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
62.33k stars 13.69k forks

[SIP-118] Ability to assign a unique asset id to a superset dashboard / visual #27194

Open hondyman opened 8 months ago

hondyman commented 8 months ago

[SIP] Proposal for Adding additional metadata to a Superset Dashboard for Data Governance</h2> <h3>Motivation</h3> <p>We need to provide information to regulatory bodies globally about the classified data used in reports.</p> <p>In large regulated industries, we have to keep track of classified data for various global reporting requirements, such as PII for GDPR and financial metrics for Dodd-Frank and other regulatory bodies. We have to be able to say where, when, and how certain data sets are used inside the org. To do this, we assign an asset id (UUID) to each report, process, and interface that we report, transfer, or export data from, and we manage these through Data Governance.</p> <h3>Proposed Change</h3> <p>Superset has metadata, but it's not editable. Users should be able to add additional metadata to reports to allow for easier identification and governance. In our case, we need to add an asset id to the metadata so we can track its usage. I have provided a logical diagram of how we would see the process working in our org. The data catalog manages all our metadata, so it will have an inventory of all dashboards (Power BI, Superset, and Looker); each is assigned an asset id and is actively managed by DG.</p> <p>Because Superset uses integer ids within each environment (Dev, QA, and prod), the integer for the same dashboard can differ between environments. The asset id does not change between environments; it's a consistent id that's callable from our portal.</p> <p><img referrerpolicy="no-referrer" src="https://github.com/apache/superset/assets/54151500/a67202da-9f39-4103-9dbe-5b4bb34c5d22" alt="Screenshot 2024-02-21 1 20 16 PM" /></p> <h3>New or Changed Public Interfaces</h3> <p>Typically, our Data Governance tool assigns asset ids (UUIDs) and we push them into the various assets, such as dashboards, either manually or via API. 
If a new metadata area were exposed, we would expect it to be available via the Swagger API for insert, read, and update.</p> <h3>New dependencies</h3> <p>Not Applicable</p> <h3>Migration Plan and Compatibility</h3> <p>Not Applicable</p> <h3>Rejected Alternatives</h3> <p>Not Applicable</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/webobite"><img src="https://avatars.githubusercontent.com/u/28811018?v=4" />webobite</a> commented <strong> 7 months ago</strong> </div> <div class="markdown-body"> <p>Hi @hondyman, I would like to understand this use case in more detail. You are asking for a feature that can assign an asset id to a Superset dashboard. I don't have a clear picture of how that would be used with your Database Governance Tool. Can you please explain that part if possible?</p> <p>Thanks & Regards</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/hondyman"><img src="https://avatars.githubusercontent.com/u/54151500?v=4" />hondyman</a> commented <strong> 7 months ago</strong> </div> <div class="markdown-body"> <p>Hi Subham, the Superset id right now is an integer, and we develop reports in development and promote them to QA and production. Each report will have a different id in each environment. As a regulated industry, we need to keep track of all reports and processes that contain confidential and personal data for global regulatory requirements. We do this by assigning an Asset ID (UUID) that stays the same across environments and is documented in our Data Governance platform, even if we upgrade or change reports. What I'm asking for is a JSON section where organizations can put metadata that is important to them, in our case the Asset ID. 
Happy to walk through our use case in more detail if that helps Regards -p</p> <p>[image: Screenshot 2024-03-04 1.03.36 PM.png]</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/webobite"><img src="https://avatars.githubusercontent.com/u/28811018?v=4" />webobite</a> commented <strong> 7 months ago</strong> </div> <div class="markdown-body"> <p>Thanks, seems interesting, I would like know more on this, Let me know if you are available on slack / or any other mean, would like to get some walk through about the same usecase @hondyman :-)</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/webobite"><img src="https://avatars.githubusercontent.com/u/28811018?v=4" />webobite</a> commented <strong> 6 months ago</strong> </div> <div class="markdown-body"> <p>@hondyman any update on this ? (in case you have missed) </p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/rusackas"><img src="https://avatars.githubusercontent.com/u/812905?v=4" />rusackas</a> commented <strong> 5 months ago</strong> </div> <div class="markdown-body"> <p>If anyone wants to move this forward, the next step is to create a [DISCUSS] thread on the dev@ mailing list.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/rusackas"><img src="https://avatars.githubusercontent.com/u/812905?v=4" />rusackas</a> commented <strong> 4 months ago</strong> </div> <div class="markdown-body"> <p>@hondyman please move forward with a [DISCUSS] thread on the dev mailing list, or this will be closed as discarded fairly soon. 
Please reach out if you'd like any assistance with that process.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/rusackas"><img src="https://avatars.githubusercontent.com/u/812905?v=4" />rusackas</a> commented <strong> 1 month ago</strong> </div> <div class="markdown-body"> <p>@mistercrunch FYI :)</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/mistercrunch"><img src="https://avatars.githubusercontent.com/u/487433?v=4" />mistercrunch</a> commented <strong> 1 month ago</strong> </div> <div class="markdown-body"> <p>A certain number of our core ORM models do inherit a <code>uuid</code> through the <code>ImportExportMixin</code></p> <p><a href="https://github.com/apache/superset/blob/master/superset/models/helpers.py#L171-L174">https://github.com/apache/superset/blob/master/superset/models/helpers.py#L171-L174</a></p> <p>Idea being that when exporting/importing objects across Superset environments, we need to be able to recognize that objects are the same while potentially having different ids.</p> <p>Now we've talked about refactoring this many times over the years. Some related thoughts:</p> <ul> <li>would be great to factor a <code>UuidModelMixin</code> out of <code>ImportExportMixin</code>, so it could be used in other (all!) models that are not necessarily exportable/importable</li> <li>could be good to use the UUIDs as PKs and FKs too (instead of current auto-increment PKs), as it would greatly simplify export/import logic. Currently we have to do all sorts of lookup tables to figure stuff out. I think it has perf implications though, as indexes become significantly larger - I could see someone arguing against that idea. 
</li> <li>change the UI / API to be "UUID-native" and slowly deprecate exposing the internal auto-increment PKs, as they pose a mild security threat, making it easy for users to guess URLs and effectively scan through the data </li> <li>seems you have a use case of forcing <code>uuids</code>, seems reasonable... assuming you'd need to be quite cautious about preserving referential integrity during update. Maybe export/re-import/deleting-originals might make sense (?)</li> </ul> <p>In terms of moving forward, a first, non-controversial step in that direction that wouldn't require a SIP [AFAIC] would be to:</p> <ul> <li>As mentioned, factor <code>UuidModelMixin</code> out of <code>ImportExportMixin</code> and into <code>superset/models/uuid.py</code>, make the latter derive from the former</li> <li>Move/create utility methods that are uuid-specific under this new <code>superset/models/uuid.py</code></li> <li>spread it across all models that could benefit from <code>uuid</code>, and handle the database migration to auto-assign values</li> </ul> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/rusackas"><img src="https://avatars.githubusercontent.com/u/812905?v=4" />rusackas</a> commented <strong> 3 weeks ago</strong> </div> <div class="markdown-body"> <p>@hondyman any intention to move this through the SIP / ASF process? Let me know here or on <a href="http://bit.ly/join-superset-slack">slack</a> if you want help. </p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/mistercrunch"><img src="https://avatars.githubusercontent.com/u/487433?v=4" />mistercrunch</a> commented <strong> 3 weeks ago</strong> </div> <div class="markdown-body"> <p>Here's a proof of concept if someone wants to run with it! 
<a href="https://github.com/apache/superset/pull/30398">https://github.com/apache/superset/pull/30398</a></p> </div> </div> </body> </html>