Open chhotii-alex opened 1 year ago
The current functionality is what was implied by the grant application: there's a button in the corner, and if you click it, you get a download of the entire dataset (all positive results with viral loads, as annotated as it can be while being deidentified).
I'm really not sure I agree with the idea of changing this.
If the download is just of the selected groups... I can see this causing mess/confusion at the user's end, if they are downloading to do their own analysis in R or whatever. They may download at different times and get slightly different subsets of the data. (For example, if you view the early/delta/omicron split, it's only fetching data for those date ranges that fall unambiguously within a period when one variant was dominant. If you select option(s) under "Pregnancy Status", you're only getting females.) Someone might download more than once, after different futzing with the webapp, and then be confused as to why they get clashing results when running their analyses on datasets downloaded from the same site. Whereas, if they are doing their own analysis in R or SAS or Pandas or whatever, they probably know enough to apply their own masks, conditionals, etc.
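To illustrate the "apply their own masks" point, here is a rough sketch of what a user might do with the full download in Pandas. The column names, the sample rows, and the date cutoff are all made up for illustration; they are not the actual export schema or the site's definition of any time period.

```python
import io
import pandas as pd

# Hypothetical stand-in for the downloaded file; columns and values are assumptions.
csv_text = """collection_date,sex,pregnancy_status,viral_load_log10
2021-02-01,F,pregnant,5.1
2021-08-15,M,,6.3
2022-01-20,F,not_pregnant,4.8
2022-02-03,F,pregnant,7.0
"""
full = pd.read_csv(io.StringIO(csv_text), parse_dates=["collection_date"])

# The user builds whatever subsets they want from the one complete file.
pregnant = full[full["pregnancy_status"] == "pregnant"]

# Arbitrary illustrative cutoff, NOT the webapp's actual period boundary.
cutoff = pd.Timestamp("2022-01-01")
late = full[full["collection_date"] >= cutoff]
```

Because every subset is derived from the same file, the user can always see exactly which rows a given mask kept or dropped.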
We get into a lot of UX questions that would be avoided by just one option to download it all. If there were more groups selected than we allow to be queried, and they request a download, do we then just give the results of the first 8 or whatever number of queries?
Should we split the results into different files for different groups, and combine multiple files in a zip? This means I do more work, so that the user does more work.
And database questions. If we're returning the results all in one file, do we try to combine the queries into one big query? WHERE (this) OR (this) OR (this). Making our database groan when the end result might not be different from having no WHERE clause at all.
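A minimal sketch of that concern, using an in-memory SQLite table. The table, the columns, and the `combined_where` helper are hypothetical, not our actual schema or query code; the conditions are trusted literal strings only to keep the sketch short, and real code would use parameterized queries.

```python
import sqlite3

def combined_where(conditions):
    """OR together per-group conditions; an empty list yields no WHERE clause."""
    if not conditions:
        return ""
    return "WHERE " + " OR ".join(f"({c})" for c in conditions)

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the results data.
conn.execute("CREATE TABLE results (sex TEXT, age INTEGER, viral_load REAL)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [("F", 30, 5.1), ("M", 45, 6.3), ("F", 70, 4.8)],
)

# Two selected groups that together cover every row in the table.
groups = ["sex = 'F'", "sex = 'M'"]
where = combined_where(groups)

filtered = conn.execute(f"SELECT COUNT(*) FROM results {where}").fetchone()[0]
everything = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
```

In this toy case `filtered` and `everything` are the same count, so the database evaluates the combined condition on every row and returns what an unfiltered query would have returned anyway.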
It seems like this would be doing a lot of work to make it seem like the user is getting more when in fact they are getting less.
This should live near the histogram, and download the data for the queried groups, not the whole dataset.