cBioPortal / icebox

very low priority issues

Download data of the selected samples in study view #152

Open jjgao opened 3 years ago

jjgao commented 3 years ago

Background: cBioPortal is a visualization platform for analyzing genomics data. The public cBioPortal data files are organized per study and available for download on datahub for each study. However, for selected samples (possibly across studies), there is currently no option to download the files.

Goal: Provide an option to download data files for selected samples in the study view. All clinical and genomics data should be included in a zip/gz file. The file format should follow the staging files format, i.e. the same as the files in datahub.

Note: this may create a lot of load for the backend server. How can we handle that?

Approach: Develop an API to return all genomic and clinical data in a zip/gz file based on a study or virtual study ID.
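A minimal sketch of the packaging step only, assuming the per-file contents for the cohort have already been assembled elsewhere. The function name `package_files` and the example file names are hypothetical and are not part of any existing cBioPortal API.

```python
import io
import zipfile


def package_files(files: dict[str, str]) -> bytes:
    """Pack {archive path: text content} into an in-memory zip archive.

    `files` is a hypothetical mapping such as
    {"data_clinical_sample.txt": "...", "data_mutations.txt": "..."}.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Sort paths so the archive layout is deterministic across requests.
        for path, content in sorted(files.items()):
            archive.writestr(path, content)
    return buffer.getvalue()
```

Building the archive in memory keeps the sketch short, but it would probably not scale to large cohorts; streaming the zip to the client or writing it to temporary storage may be a better fit given the load concern noted above.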

jjgao commented 3 years ago

related to https://github.com/cBioPortal/cbioportal/issues/6804 and https://github.com/cBioPortal/cbioportal/issues/4866

jjgao commented 3 years ago

Comments from @sheridancbio:

To avoid database bottlenecks during such requests, this functionality could be handled by a separate system: the cBioPortal frontend supports the construction of a cohort specification, and that specification is then sent to a specialized external server which extracts and packages the requested cohort into a downloadable resource (a zip file?). This service could also run in a job/batch mode, where the user does not wait for the download itself but instead gets a link to a job page that is updated as the job progresses and the packaged study files become available. Potentially, the system that delivers the packaged files could do so without using the production cBioPortal database. It could use its own database ... or maybe a faster approach would be to simply filter/merge the input file sets directly. The pipelines team already has scripts for subsetting and merging cancer study file sets, and they could be adapted. I think this mode of operation should at least be considered before we build support for this feature on top of the production database / backend API.
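To illustrate the "filter the input files directly" idea, here is a rough sketch that subsets one tab-delimited staging file by sample ID without touching the database. It is not the pipelines team's script. The function name `subset_by_sample` is hypothetical, and the sketch assumes that metadata lines start with `#` and that the first non-comment line is the column header. The default column name `SAMPLE_ID` is also an assumption; other file types may use a different column (for example `Tumor_Sample_Barcode` in mutation files).

```python
def subset_by_sample(text: str, sample_ids: set[str], id_column: str = "SAMPLE_ID") -> str:
    """Keep comment lines, the header row, and rows whose sample ID is selected."""
    kept = []
    column_index = None
    for line in text.splitlines():
        if line.startswith("#"):
            # Metadata lines are copied through unchanged.
            kept.append(line)
            continue
        fields = line.split("\t")
        if column_index is None:
            # First non-comment line is assumed to be the column header.
            column_index = fields.index(id_column)
            kept.append(line)
            continue
        if len(fields) > column_index and fields[column_index] in sample_ids:
            kept.append(line)
    return "\n".join(kept) + "\n"
```

A cohort spanning several studies would additionally need a merge step across the per-study files, which this sketch does not attempt.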

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

jjgao commented 3 years ago

Requests for this feature keep coming to us...

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.