apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
61.6k stars 13.45k forks source link

[SIP-98] Proposal for Adding Playwright as an Alternative for Selenium to Enable deck.gl Charts in Reports #24948

Closed kgabryje closed 12 months ago

kgabryje commented 1 year ago

Motivation

Currently, Superset uses Selenium for rendering charts in reports and in thumbnails. However, this approach has proven to be problematic when dealing with deck.gl charts. The current Selenium-based approach fails to handle these charts properly, resulting in blank visualizations in the reports. To address this issue and provide a more reliable solution for rendering deck.gl charts, we propose integrating Playwright as an alternative to Selenium.

Proposed Change

The proposed change involves introducing Playwright for rendering charts in Superset reports. Playwright is an open-source library for automating web browsers, similar to Selenium but with better support for modern browser features and improved performance. By using Playwright, we aim to provide a more stable and accurate chart rendering experience in Superset reports, especially for deck.gl charts.

We wrote a POC that proved that the underlying problem, which is inability to render WebGL images (which deck.gl charts are based on), does not occur when running reports with Playwright. The screenshots below present the contents of email reports of dashboard containing deck.gl charts when using Selenium (before) and Playwright (after).

Before (Selenium):

image

After (Playwright):

image

Since configuring Playwright requires installing additional dependencies, in order to prevent breaking changes in existing deployments we propose to put the new flow behind a feature flag PLAYWRIGHT_REPORTS_AND_THUMBNAILS while supporting Selenium at the same time. Users who do not enable the new feature flag will be unaffected by the proposed changes.

New or Changed Public Interfaces

There will be no changes to existing public interfaces in Superset as part of this feature addition.

New dependencies

The proposed feature will require the addition of the Playwright library as a new Python dependency. Playwright is actively maintained, with Apache 2.0 license.

Migration Plan and Compatibility

The migration to Playwright as the alternative to Selenium will not require any database migrations or updates to stored URLs. Existing reports using deck.gl charts should continue to work as expected.

In order to avoid introducing a potential breaking change in existing deployments, Playwright will only be used if PLAYWRIGHT_REPORTS_AND_THUMBNAILS is enabled. Otherwise, the reports will be run by Selenium.

Rejected Alternatives

  1. Continue using Selenium with geckodriver (Firefox): This approach was rejected due to the limitations of geckodriver, which can't render WebGL visualizations in headless mode.
  2. Continue using Selenium, change the driver to Chrome: This approach was rejected due to problems with taking fullpage screenshots.
kgabryje commented 1 year ago

CC @eschutho @villebro @michael-s-molina @john-bodley

john-bodley commented 1 year ago

Thanks @kgabryje for putting this SIP together. Would it be possible to expand further on the rejected alternatives? Specifically are these mentioned issues tracked in the Selenium project as known issues, and if so is their an ETA in terms of when they might be resolved?

Granted it seems like Playwright circumvents the issue currently experienced with Selenium (using either the Firefox or Chrome driver), but it does come at a cost in terms of introducing (and thus supporting) another framework. Additionally are there any known downsides with Playwright, are there circumstances where it fails to render when Selenium succeeds?

The TL;DR is there’s always pros and cons with any approach and thus having a solid understanding of the landscape is highly beneficial before voting on this SIP.

michael-s-molina commented 1 year ago

Thank you @kgabryje for writing this SIP. I think it's great when we take the time to write SIPs and have the opportunity to discuss big changes before they are merged to master.

Given that deck.gl charts are considered legacy charts that will be removed in the future, did we consider the effort/benefit of migrating them vs introducing another framework that we'll need to support? I don't know if we already have replacements for them, but if we do and the cost is not significantly higher, we could end up in a better state with just one framework and less legacy charts.

kgabryje commented 1 year ago

Thanks for comments, I appreciate them!

@john-bodley Historically we switched the driver from Chrome to Firefox as a workaround for screenshots in the reports being cut off. I couldn't find a relevant issue in Selenium's repo, neither recent nor from the time when that change was made in Superset. Firefox on the other hand does not support rendering WebGL in headless mode (https://bugzilla.mozilla.org/show_bug.cgi?id=1375585) As for Playwright running Chromium, it does seem to cover all bases and I didn't get it to fail in any common scenario - screenshot of a whole dashboard (regardless of the layout) or of a specific element, and most importantly, it does render WebGL. I agree that having 2 libraries that essentially do the same thing is a big drawback. However, if Playwright turns out to be a success in a long run, I think we might consider fully replacing Selenium and doing a switch in a major release.

@michael-s-molina Most (all?) geospacial visualization libraries are based on WebGL, so even if we did migrate from deck.gl to another lib, we'd face the same issue. As for replacing deck.gl with another framework - I don't think it's a good way forward. Deck.gl is a well-maintained library with a big community. Our chart plugins are buggy, but I wouldn't put the blame (or at least not all of it) on the library itself - the plugins haven't been properly maintained for years and with some effort we can bring them to the same level of quality as the newer plugins. I started that effort by refactoring all files to Typescript in order to make them easier to maintain. More of those are coming - refactoring from class to functional components, implementing best practices as described in deck.gl docs, etc.

john-bodley commented 1 year ago

@kgabryje do you think there's merit in first raising this problem as a Selenium GitHub issue? Their community may be aware of a workaround and/or could potentially gauge the scale/complexity of the problem. Again the more relevant data we have the easier to weigh up the various pros and cons of the proposal.

michael-s-molina commented 1 year ago

Most (all?) geospacial visualization libraries are based on WebGL, so even if we did migrate from deck.gl to another lib, we'd face the same issue.

Fair point.

As for replacing deck.gl with another framework - I don't think it's a good way forward. Deck.gl is a well-maintained library with a big community. Our chart plugins are buggy, but I wouldn't put the blame (or at least not all of it) on the library itself - the plugins haven't been properly maintained for years and with some effort we can bring them to the same level of quality as the newer plugins. I started that effort by https://github.com/apache/superset/pull/24933 in order to make them easier to maintain. More of those are coming - refactoring from class to functional components, implementing best practices as described in deck.gl docs, etc.

Is the intention here to rename the plugins and remove legacy from their names or create new ones? What will be the relationship with ECharts GL charts? Do you think we should write a specific SIP about this topic?

kgabryje commented 1 year ago

@kgabryje do you think there's merit in first raising this problem as a Selenium GitHub issue? Their community may be aware of a workaround and/or could potentially gauge the scale/complexity of the problem. Again the more relevant data we have the easier to weigh up the various pros and cons of the proposal.

Our version of Selenium (3.141.0) is from 2018 (the Chromium driver was equally ancient) and probably the first thing the community would say is to upgrade to the latest version and see if it solves the problem. I do not know how much effort this upgrade would require, but I think it might be a good time to introduce a newer tool that covers our use cases, while offering a better developer experience (easier to write and read) than Selenium.

I see the drawback of maintaining two libraries/flows, but this would be only temporary until we're ready to make a breaking change and stick to 1 library only.

kgabryje commented 1 year ago

Is the intention here to rename the plugins and remove legacy from their names or create new ones? What will be the relationship with ECharts GL charts? Do you think we should write a specific SIP about this topic?

I don't think the plugins are fundamentally bad - it's a number of smaller issues that make the experience worse than newer plugins. So in my opinion, we should rather focus on fixing, refactoring and improving their usability rather then starting from scratch. If we do that + eventually migrate to API V1, we could drop the "legacy" prefix. As for the SIP - I think migrating to another library or completely changing the architecture (which I think works well in our current plugins) would warrant a SIP, which is not the case here.

michael-s-molina commented 1 year ago

Thanks for the responses @kgabryje. Regarding:

What will be the relationship with ECharts GL charts?

Will we ignore ECharts GL? Use it to replace part of the legacy plugins? Use it to enhance our legacy plugin?

kgabryje commented 1 year ago

Will we ignore ECharts GL? Use it to replace part of the legacy plugins? Use it to enhance our legacy plugin?

Sorry, missed that question. I'm not very familiar with EchartsGL but to my understanding it's an extension of Echarts that provides geospatial visualisations, so it has the same use case as deck.gl. What do you have in mind about "replacing" or "enhancing" our plugins? Are there visualisations that are available in EchartsGL but not in deck.gl? @villebro I recall we were chatting about EchartsGL vs deck.gl some time ago if you want to chime in 😄

michael-s-molina commented 1 year ago

What do you have in mind about "replacing" or "enhancing" our plugins?

I'm thinking about the effort to adopt ECharts as our official visualization library. If ECharts GL provide the same features as DeckGL, then maybe we should invest in migrating to ECharts GL instead of improving the DeckGL plugin. If that's not the case, maybe we can still enhance our chart library by enabling some of the ECharts GL charts which would live alongside DeckGL.

Are there visualisations that are available in EchartsGL but not in deck.gl?

I don't know but I think this investigation is kind of a prerequisite before we decide to revamp the legacy plugin. The question we should ask is: Does ECharts supports our GL requirements with their charts? If yes, let's migrate. If no, let's improve the DeckGL plugin and remove the legacy prefix.

JohnDietrich-Pepper commented 1 year ago

ECharts GL does not support the same functionality as Deck.gl charts at this point in time particularly the flexibility to render things like contour maps that has a pending PR on Superset. Compare https://echarts.apache.org/examples/en/index.html#chart-type-map https://[deck.gl](https://deck.gl/examples/arc-layer)/examples/arc-layer

Also compared to Deck.gl the ECharts GL seems significantly slower on render on a pretty fast machine (6800HS / 4070) vs. Deck.gl in the examples (hard to compare as examples are a bit different).

Finally, there seems to be interest in actually developing deck.gl charts (heatmap and contour being recent examples) whereas there is no pending PR to add ECharts GL.

betodealmeida commented 1 year ago

I think that regardless of the state of the legacy deck.gl visualizations or the path we take forward (improve them, replace them with ECharts GL, etc.), we do need to support WebGL in Superset, since not only it's used everywhere in geoviz but also in custom visualizations that handle huge quantities of data. Personally I would love to see the deck.gl visualizations being improved and brought back to tier 1, but I suggest we have that discussion in a separate channel (or SIP), and focus on what solution we're going to use for WebGL.

While I think it might be worth trying the newer version of Selenium (4.11.2 vs the current 3.141.0), I think Playwright has other advantages, since it's much easier to install and run and has a more modern API (with optional support for async). Taking a screenshot with Playwright is incredible simple, since you don't need to worry about all the webdriver options:

pip install playwright
playwright install
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("https://superset.apache.org")
        print(await page.title())
        await page.screenshot(path="screenshot.png", full_page=True)
        await browser.close()

asyncio.run(main())
eschutho commented 1 year ago

@kgabryje do you think there's merit in first raising this problem as a Selenium GitHub issue? Their community may be aware of a workaround and/or could potentially gauge the scale/complexity of the problem. Again the more relevant data we have the easier to weigh up the various pros and cons of the proposal.

@john-bodley I also spent a little time looking into the selenium github issues for previous questions on the topic. When most people were looking for full-page screenshots, they would direct them to use the Chrome driver and when they needed webgl support, they would direct them to use the Firefox driver. I couldn't find any use cases for either issue where it could be solved by the same driver.

villebro commented 1 year ago

My quick thoughts on the discussed topics:

kgabryje commented 12 months ago

SIP implemented by https://github.com/apache/superset/pull/25247. Thanks to everyone who participated in the discussion!

michael-s-molina commented 12 months ago

Closing the SIP as it was approved.

iamsaso commented 7 months ago

@kgabryje is there a reason you didn't include full_page=True parameter for playwright in your PR?

kgabryje commented 7 months ago

@kgabryje is there a reason you didn't include full_page=True parameter for playwright in your PR?

@iamsaso full_page=True is an argument of .screenshot() method when it's called on page object. Here we call it on element object, which screenshots the whole element regardless of viewport size.

SeanStock-THG commented 6 months ago

Hi @kgabryje, I seem to have noticed a massive increase in CPU and RAM resources on taking the screenshots for thumbnails and reports vs Selenium. Is this expected? not sure if can be set to use a specified user like we did with Selenium to avoid it having to load 1000+ charts for each user.