cBioPortal / cbioportal

cBioPortal for Cancer Genomics
https://cbioportal.org
GNU Affero General Public License v3.0

Multiple clinical data files #10740

Open: j-hudecek opened this issue 3 months ago

j-hudecek commented 3 months ago

For studies with a large number of attributes, it would be nice to be able to split the data files: provide several files, each covering a subset of patients and a different subset of columns. The importer would then fill the complete sample-attribute matrix in memory and leave the unfilled cells with a missing value (NA would do, I guess).

Currently, the data file for samples (or patients) must contain all the data: every row and every column. It would be useful to be able to feed the importer multiple data files, each containing a subset of the rows and columns.
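A minimal sketch of the in-memory merge described above, in Python. The helper name `merge_partial_tables`, the dict-of-dicts representation, and the policy of rejecting two different values for the same sample-attribute pair are assumptions for illustration; none of them come from the cBioPortal importer.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Missing-value placeholder for cells that no file provides (the issue suggests NA).
MISSING = "NA"

# Hypothetical in-memory representation: sample id -> {attribute: value}.
Table = Dict[str, Dict[str, str]]


def merge_partial_tables(tables: Iterable[Table]) -> Tuple[List[str], List[str], Table]:
    """Merge partial tables into one complete sample-attribute matrix.

    Samples and attributes keep the order in which they are first seen.
    Two files giving different values for the same sample/attribute pair
    raise ValueError (an assumed policy; the issue leaves this case open).
    """
    samples: List[str] = []
    attributes: List[str] = []
    seen_attributes: Set[str] = set()
    filled: Table = {}

    for table in tables:
        for sample, row in table.items():
            if sample not in filled:
                filled[sample] = {}
                samples.append(sample)
            for attribute, value in row.items():
                if attribute not in seen_attributes:
                    seen_attributes.add(attribute)
                    attributes.append(attribute)
                previous = filled[sample].get(attribute)
                if previous is not None and previous != value:
                    raise ValueError(
                        f"conflicting values for {sample}/{attribute}: "
                        f"{previous!r} vs {value!r}"
                    )
                filled[sample][attribute] = value

    # Expand to the full matrix, filling every gap with the missing-value marker.
    matrix: Table = {
        sample: {a: filled[sample].get(a, MISSING) for a in attributes}
        for sample in samples
    }
    return samples, attributes, matrix
```

For instance, merging `{"sample1": {"attribute1": "A"}}`, `{"sample1": {"attribute2": "B"}}` and `{"sample2": {"attribute1": "C"}}` gives `sample2` the value `"NA"` for `attribute2`. The full matrix holds one cell per sample and attribute, which matches the issue's proposal to fill the matrix in memory.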

Instead of having one file:

id attribute1 attribute2
sample1 A B
sample2 C NA

We could have three files:

File 1:
id attribute1
sample1 A

File 2:
id attribute2
sample1 B

File 3:
id attribute1
sample2 C
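The split-file example can be checked end to end with a short, self-contained Python sketch: it parses each partial file, merges them, and writes the single-file layout back out. It assumes whitespace-separated columns exactly as shown in this issue (real cBioPortal clinical data files are tab-separated), and it simply lets a later file overwrite an earlier value for the same cell; a real importer would probably want to flag that case. The function names `parse_partial` and `combine` are hypothetical.

```python
from typing import Dict, List

MISSING = "NA"  # value written for cells that no partial file provides


def parse_partial(text: str) -> Dict[str, Dict[str, str]]:
    """Parse one partial file: an 'id' header line, then one row per sample.

    Splits on any whitespace, as in the example in this issue; values
    containing spaces would need tab-separated input and splitting on tabs.
    """
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return {row[0]: dict(zip(header[1:], row[1:])) for row in rows}


def combine(texts: List[str]) -> str:
    """Merge partial files and render them in the single-file layout."""
    samples: List[str] = []
    attributes: List[str] = []
    cells: Dict[str, Dict[str, str]] = {}
    for text in texts:
        for sample, row in parse_partial(text).items():
            if sample not in cells:
                cells[sample] = {}
                samples.append(sample)
            for attr, value in row.items():
                if attr not in attributes:
                    attributes.append(attr)
                cells[sample][attr] = value  # later files overwrite earlier values
    out = [" ".join(["id"] + attributes)]
    for sample in samples:
        out.append(" ".join([sample] + [cells[sample].get(a, MISSING) for a in attributes]))
    return "\n".join(out)


files = [
    "id attribute1\nsample1 A",
    "id attribute2\nsample1 B",
    "id attribute1\nsample2 C",
]
print(combine(files))
```

Running it prints the same three-line table as the single-file version, with NA for `sample2`'s `attribute2`.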

Advantages:

Disadvantages: