SACGF / variantgrid

VariantGrid public repo

Generate Annotated VCF/CSVs for download #1171

Closed: davmlaw closed this 1 month ago

davmlaw commented 1 month ago

Downloading a 1-2M record multi-sample VCF with annotations can take so long that the request times out.

I used a few tricks in SACGF/variantgrid_com#86, but we should probably generate the file first and then stream it as a static file.

Suggest:

Current behavior:

view_vcf - cohort_grid_export (csv/vcf)
view_sample - sample_grid_export

We can probably keep those URLs.
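
A rough sketch of the generate-then-stream idea, stdlib only. This is not the VariantGrid implementation: `generate_export_file` and `stream_file` are hypothetical names, and the row source is assumed to be any iterable of string sequences. The export is written to disk in a background job, and the web request only reads the finished file back in chunks.

```python
import csv
import gzip
import os
import tempfile
from typing import Iterable, Iterator, Sequence


def generate_export_file(
    rows: Iterable[Sequence[str]],
    header: Sequence[str],
    out_dir: str,
    name: str,
) -> str:
    """Write all rows to a gzipped CSV on disk and return its path.

    Hypothetical helper: intended to run in a background worker,
    not inside the web request.
    """
    final_path = os.path.join(out_dir, name)
    # Write under a temporary name so a half-written file is never served.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".partial")
    os.close(fd)
    with gzip.open(tmp_path, "wt", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    # Rename into place once complete (atomic on the same filesystem).
    os.replace(tmp_path, final_path)
    return final_path


def stream_file(path: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the finished file in fixed-size chunks."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk
```

In a Django view, the chunk generator could be handed to `StreamingHttpResponse`, or the open file to `FileResponse`, so the existing URLs would only need to check whether the file is ready.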

davmlaw commented 1 month ago

Working in branch feature/issue_1171_generate_file_for_download

Mostly done, just need to handle browser bits I think.

We could possibly raise an issue about having analyses handle this as well.

Maybe also have some kind of regular purge of old files?
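
For the regular purge, something like the following might be enough as a starting point. It is only a sketch: `purge_old_files`, the flat directory layout, and the age threshold are all assumptions, and a real version would presumably also need to remove the matching database records.

```python
import os
import time
from typing import List, Optional


def purge_old_files(
    directory: str,
    max_age_days: float,
    now: Optional[float] = None,
) -> List[str]:
    """Delete regular files in `directory` last modified more than
    `max_age_days` ago, and return the paths that were removed.

    Hypothetical helper: does not recurse into subdirectories.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day

    # Snapshot the listing first so we do not delete while iterating.
    with os.scandir(directory) as it:
        entries = list(it)

    removed = []
    for entry in entries:
        if not entry.is_file(follow_symlinks=False):
            continue
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.path)
    return removed
```

This could be called from a scheduled job (cron or a periodic task), with `max_age_days` chosen by how long users are expected to come back for a download.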

davmlaw commented 1 month ago

Took a while as I had to do a few things:

Will probably have to do a few extra things to clean up CachedGeneratedFiles etc. over time now, as these are still quite large.

davmlaw commented 1 month ago

Hmmm, this has taken 2 hours to export 16% - so 8% per hour = 12.5 hours... I think celery is going to time out here...
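
The estimate above is a straight linear extrapolation, which assumes the export rate stays constant for the rest of the run. As a small sketch (the function name is made up for illustration):

```python
def estimate_total_hours(elapsed_hours: float, percent_done: float) -> float:
    """Linearly extrapolate total run time from progress so far.

    Assumes a constant export rate, which may not hold in practice.
    """
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return elapsed_hours * 100 / percent_done


elapsed = 2
total = estimate_total_hours(elapsed, 16)  # 2 * 100 / 16 = 12.5 hours
remaining = total - elapsed                # 10.5 hours still to go
```

If the worker enforces a hard limit (Celery tasks can set `time_limit`, for example), comparing `total` against it early would show whether the job can finish at all.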

davmlaw commented 1 month ago

Spun the polish / scope creep off into SACGF/variantgrid#1173 - will do later when less busy

This is VG only, and has been tested and deployed