NYCPlanning / data-engineering

Primary repository for NYC DCP's Data Engineering team

Data Download Component for QA App #382

Closed fvankrieken closed 9 months ago

fvankrieken commented 10 months ago

All (or at least most) data product pages in the QA app use the same component to select a ProductKey for QA. While on the QA page, it'd be convenient to be able to quickly download outputs for the currently viewed ProductKey.

There are a couple of questions on implementation, and I'd welcome input from @caseysmithpgh and @jackrosacker here, primarily:

  1. How often are you downloading outputs of our builds manually from S3, GitHub links, etc. during the QA process, as opposed to either using the QA app itself or pulling data programmatically?
  2. For downloading, how many different files per product do you generally download? The zipped output that some folders have? A specific QA file (qaqc_bbldiffs, for example)? What sort of functionality would be most convenient here in the app? I'd imagine we could either have
    • a dropdown that lists files present, so you could select one for download.
    • a button to download a zip of the whole output
    • something else?

Downloading a zip is convenient, though it would require downloading the files to the QA server and then zipping them, or something along those lines. That seems like it might not be the most performant. It also might involve doing a lot in memory, since the Streamlit download button seems to rely on having the file in memory. Having "entities" (individual output files) defined in Python (as JSON or something similar) and then choosing which ones to download is nice in that we can just use URLs rather than dealing with download logic.
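To make the tradeoff concrete, here is a rough sketch of both approaches using only the standard library. Everything named here is an assumption for illustration: `BASE_URL`, the `Entity` class, the `ENTITIES` registry, the `example_product` key, the `product/version/filename` path layout, and the `.csv` extension on `qaqc_bbldiffs` are all hypothetical and not taken from the actual repo or bucket.

```python
import io
import zipfile
from dataclasses import dataclass
from urllib.parse import quote

# Hypothetical base URL; the real bucket/host is not stated in this thread.
BASE_URL = "https://example-bucket.example.com"


@dataclass(frozen=True)
class Entity:
    """One downloadable output file of a build (hypothetical structure)."""

    label: str
    filename: str


# Hypothetical registry of entities per product. File names other than
# "qaqc_bbldiffs" are made up, and the ".csv" extension is an assumption.
ENTITIES = {
    "example_product": [
        Entity("Zipped output", "output.zip"),
        Entity("BBL diffs", "qaqc_bbldiffs.csv"),
    ]
}


def entity_urls(product: str, version: str) -> dict[str, str]:
    """Option A: map each entity label to a plain URL, no download logic.

    Assumes outputs live at BASE_URL/<product>/<version>/<filename>.
    """
    prefix = f"{BASE_URL}/{quote(product)}/{quote(version)}"
    return {e.label: f"{prefix}/{quote(e.filename)}" for e in ENTITIES[product]}


def zip_in_memory(files: dict[str, bytes]) -> bytes:
    """Option B: build a zip archive entirely in memory.

    The whole archive (plus every input file) is held in RAM at once,
    which is the performance concern raised above.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()
```

With option A, the app would only need to render the URLs from `entity_urls(...)` as links or as choices in a dropdown. With option B, the bytes returned by `zip_in_memory(...)` are what would be handed to the Streamlit download button, after first fetching every file to the server.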

Would love thoughts from @alexrichey/@damonmcc/@sf-dcp on what would be nice from the user side of things as well as thoughts on code implementation

sf-dcp commented 9 months ago

@fvankrieken Regarding the "something else" functionality: a third option is to provide a link to the S3 directory containing the build files.

damonmcc commented 9 months ago

I'd be in favor of expecting all builds to zip their outputs and a button on each QA page just downloads that zip. That seems like it'd be relatively simple and standardized compared to the alternative for a first pass at this:

I'd wonder why we'd want to prioritize giving links to "power users" like GIS to navigate folders when they can already do that by going to DO or using Cyberduck.

update: this is a longer-term idea that shouldn't block adding a useful link for power users