apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0

[SIP-129] POC - Real-time Dashboards powered by data streams #28272

Open surapuramakhil opened 2 months ago

surapuramakhil commented 2 months ago

Please make sure you are familiar with the SIP process documented here. The SIP will be numbered by a committer upon acceptance.

[SIP] Proposal for ...</h2> <h3>Motivation</h3> <p>Today, real-time dashboards are built on repeated polling at a 10-second interval. On every poll, SQL queries are executed, and typically all of the data needed for the dashboard is fetched again.</p> <p>This puts a lot of load on the warehouse DB, especially when there are many active Superset users and many real-time dashboards at the same time. This load can be avoided entirely if your dashboards have low retention periods.</p> <h3>Proposed Change</h3> <p>Today, SQL Lab runs SQL queries and generates the datasets required to power dashboards. An alternate pipeline powered by streams would generate and update the datasets required for dashboards. As with SQL Lab, there would be another module where the user can specify how a stream is consumed (consumer functions) and how the corresponding datasets are updated.</p> <ol> <li>Stream as a data source</li> <li>Stream consumer functions for dataset population; everything else remains the same (restricting scope)</li> </ol> <h3>New or Changed Public Interfaces</h3> <p>Describe any new additions to the model, views or <code>REST</code> endpoints. Describe any changes to existing visualizations, dashboards and React components. Describe changes that affect the Superset CLI and how Superset is deployed.</p> <h3>New dependencies</h3> <p>Describe any <code>npm</code>/<code>PyPI</code> packages that are required. Are they actively maintained? 
What are their licenses?</p> <h3>Migration Plan and Compatibility</h3> <p>Describe any database migrations that are necessary, or updates to stored URLs.</p> <h3>Rejected Alternatives</h3> <p>Describe alternative approaches that were considered and rejected.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/rusackas"><img src="https://avatars.githubusercontent.com/u/812905?v=4" />rusackas</a> commented <strong> 2 months ago</strong> </div> <div class="markdown-body"> <p>I've always conjectured that this would be tied into the Global Async Queries feature, for two main reasons: 1) It has a Redis caching layer, so that might lighten the load on your DB in general. 2) If we want to do <em>real</em> realtime analytics, the dashboard would need a major overhaul so that (a) charts all subscribe to a websocket for updates, which would be published on query completion OR a push of subscribed streamed data, and (b) all charts are updated so that they support transitioning/transposing on data updates rather than just blinking a hard refresh.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/surapuramakhil"><img src="https://avatars.githubusercontent.com/u/9161543?v=4" />surapuramakhil</a> commented <strong> 2 months ago</strong> </div> <div class="markdown-body"> <p>Restricting the scope of this SIP to the following, which are necessary and sufficient for this to work:</p> <ol> <li>Adding streams as an alternate data source</li> <li>Stream processors: "consumer functions" for dataset population</li> </ol> <p>Once a dataset is populated, the rest of the flow would remain the same. Chart queries will hit the in-memory dataset where the underlying data lies. 
</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/rusackas"><img src="https://avatars.githubusercontent.com/u/812905?v=4" />rusackas</a> commented <strong> 4 weeks ago</strong> </div> <div class="markdown-body"> <p>@surapuramakhil please move forward with a [DISCUSS] thread on the dev mailing list if you wish to execute on it, otherwise this will be closed as discarded fairly soon. Please reach out if you'd like any assistance with that process.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/surapuramakhil"><img src="https://avatars.githubusercontent.com/u/9161543?v=4" />surapuramakhil</a> commented <strong> 1 week ago</strong> </div> <div class="markdown-body"> <p>@rusackas, I got busy. I will get back to this.</p> </div> </div> </body> </html>