apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0

[4.1.0.rc3] Error when downloading PDF or image #30645

Status: Open (opened by jacob-roldan 6 days ago)

jacob-roldan commented 6 days ago

Bug description

Open any dashboard, click the three-dot menu at the top right, then choose Download > Export to PDF (or Download as Image).

An error message is shown (see screenshot).

I reproduced this in the test environment at http://35.87.42.94:8080/.

Screenshots/recordings

[Screenshot of the error message]

Superset version

master / latest-dev

Python version

3.9

Node version

16

Browser

Chrome

Additional context

No response


dosubot[bot] commented 6 days ago

Hey @jacob-roldan! 👋 I'm here to help you with any bugs, questions, or becoming a contributor. Let's squash those bugs together!

I found a similar open issue that might be related to the problem you're experiencing: "Upon download as PDF, charts on a long dashboard render as loading image." That issue suggests the problem may be caused by the dashboard virtualization feature, which unloads charts that are outside the viewport. A workaround is to disable virtualization in the Superset configuration:

  1. Open the superset_config_docker.py file.
  2. Update the FEATURE_FLAGS variable as follows:

    FEATURE_FLAGS = {
       "DASHBOARD_VIRTUALIZATION": False,
    }

This change ensures that the dashboard loads all charts at once, preventing loading spinners from appearing in the downloaded PDF or image [1].
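To make the effect of this override concrete, here is a minimal sketch assuming Superset layers the FEATURE_FLAGS dict from the user config over its built-in defaults, so any flag you omit keeps its default value. The default values, the `SOME_OTHER_FLAG` name, and the `resolve_flags` helper are hypothetical and for illustration only; this is not Superset's own code.

```python
# Hypothetical built-in defaults (illustrative values, not Superset's real list).
DEFAULT_FEATURE_FLAGS: dict[str, bool] = {
    "DASHBOARD_VIRTUALIZATION": True,  # assumed default, for illustration
    "SOME_OTHER_FLAG": True,           # hypothetical flag name
}

# What the workaround puts in superset_config_docker.py.
FEATURE_FLAGS: dict[str, bool] = {
    "DASHBOARD_VIRTUALIZATION": False,
}


def resolve_flags(defaults: dict[str, bool], overrides: dict[str, bool]) -> dict[str, bool]:
    """Return the effective flags: overrides win, unlisted flags keep defaults."""
    effective = dict(defaults)  # copy so the defaults dict is not mutated
    effective.update(overrides)
    return effective


flags = resolve_flags(DEFAULT_FEATURE_FLAGS, FEATURE_FLAGS)
print(flags)  # {'DASHBOARD_VIRTUALIZATION': False, 'SOME_OTHER_FLAG': True}
```

If the config file already defines FEATURE_FLAGS, add the key to that existing dict rather than assigning a new one; otherwise the other overrides in the file are lost.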


sadpandajoe commented 6 days ago

@geido I can't remember whether I should have spun this up with a feature flag or not.

sadpandajoe commented 6 days ago

@jacob-roldan I ran this on one of my instances and I do see that message for a while (maybe a minute or two), but the dashboard eventually downloads. How long did you wait before navigating away from the dashboard?

michael-s-molina commented 6 days ago

@sadpandajoe I can also confirm this problem using our internal test environment.

4.0.2

https://github.com/user-attachments/assets/fc5f4562-be15-4cdf-9a05-1811cdec1ea4

4.1.0rc3

https://github.com/user-attachments/assets/787c9648-dc1f-49e6-9c03-df56e54c2ebd

michael-s-molina commented 6 days ago

Is there any configuration change between 4.0.2 and 4.1.0rc3 that is needed to generate the screenshots?

sadpandajoe commented 5 days ago

@michael-s-molina @jacob-roldan are there any logs? We've been running this code in production for a while and haven't hit this issue. I'm trying to debug it but can't reproduce it on our end.

michael-s-molina commented 5 days ago

@sadpandajoe @geido @eschutho I was able to pinpoint the problem. The failure occurs because screenshot generation in 4.1.0 RC3 caches the screenshots using THUMBNAIL_CACHE_CONFIG, which defaults to a NullCache. A NullCache does not store anything, so the frontend never finds the screenshots and keeps polling in a loop. The fix would be to make the default THUMBNAIL_CACHE_CONFIG similar to what we do for Explore form data and store the thumbnails in the database using SupersetMetastoreCache:

THUMBNAIL_CACHE_CONFIG = {
    "CACHE_TYPE": "SupersetMetastoreCache",
    "CACHE_DEFAULT_TIMEOUT": ...,  # set a value
    # Should the timeout be reset when retrieving a cached value?
    "REFRESH_TIMEOUT_ON_RETRIEVAL": True,
    # The following parameter only applies to `MetastoreCache`:
    # How should entries be serialized/deserialized?
    "CODEC": ...,  # define the appropriate codec
}
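To illustrate why a NullCache breaks the download flow, here is a simplified, self-contained sketch using only the Python standard library. `MetastoreLikeCache` is a hypothetical stand-in for a database-backed cache such as SupersetMetastoreCache (it is not Superset's implementation): it stores entries in a SQLite table with an expiry time, pushes the expiry forward on reads to mimic REFRESH_TIMEOUT_ON_RETRIEVAL, and uses JSON in place of a CODEC. `poll_for_screenshot` is a hypothetical, capped version of the frontend's polling loop.

```python
import json
import sqlite3
import time
from typing import Any, Optional


class NullCache:
    """Mimics a no-op cache: set() discards data and get() always misses."""

    def set(self, key: str, value: Any, timeout: int) -> None:
        pass

    def get(self, key: str) -> Optional[Any]:
        return None


class MetastoreLikeCache:
    """Hypothetical database-backed cache (a sketch, NOT Superset's code)."""

    def __init__(self, conn: sqlite3.Connection, refresh_timeout_on_retrieval: bool = True) -> None:
        self.conn = conn
        self.refresh = refresh_timeout_on_retrieval
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv "
            "(key TEXT PRIMARY KEY, value TEXT, timeout INTEGER, expires REAL)"
        )

    def set(self, key: str, value: Any, timeout: int) -> None:
        # JSON plays the role of the CODEC: it serializes values for storage.
        self.conn.execute(
            "INSERT OR REPLACE INTO kv VALUES (?, ?, ?, ?)",
            (key, json.dumps(value), timeout, time.time() + timeout),
        )

    def get(self, key: str) -> Optional[Any]:
        row = self.conn.execute(
            "SELECT value, timeout, expires FROM kv WHERE key = ?", (key,)
        ).fetchone()
        if row is None or row[2] < time.time():
            return None  # missing or expired
        value, timeout, _ = row
        if self.refresh:
            # Like REFRESH_TIMEOUT_ON_RETRIEVAL: extend the expiry on each read.
            self.conn.execute(
                "UPDATE kv SET expires = ? WHERE key = ?", (time.time() + timeout, key)
            )
        return json.loads(value)


def poll_for_screenshot(cache: Any, key: str, max_attempts: int = 5) -> Optional[Any]:
    """Capped polling loop; a real client would also wait between attempts."""
    for _ in range(max_attempts):
        result = cache.get(key)
        if result is not None:
            return result
    return None


# A worker "stores" the screenshot, then the poller tries to fetch it.
for cache in (NullCache(), MetastoreLikeCache(sqlite3.connect(":memory:"))):
    cache.set("dashboard-1-screenshot", {"status": "ready"}, timeout=3600)
    print(type(cache).__name__, poll_for_screenshot(cache, "dashboard-1-screenshot"))
# NullCache None
# MetastoreLikeCache {'status': 'ready'}
```

With the NullCache the poller never sees a result, which matches the endless loop described above; without the attempt cap it would spin forever. Note that a JSON codec like this one cannot serialize raw bytes, so storing actual image data would need a codec that handles binary values, which is the decision the CODEC placeholder leaves open.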

When I discussed this issue with @villebro, he raised a good point: Celery workers were previously not a hard requirement for installing Superset, but rather an optional feature. If screenshot generation now always requires Celery workers, that could constitute a breaking change. Let me know your thoughts.

geido commented 1 day ago

Thanks @michael-s-molina. We are currently discussing next steps for keeping Celery optional and for the default cache configuration.