Multiple clinical data files

For studies with a large number of attributes, it would be nice if we could split the data files: provide a few files, each covering a subset of patients and a different subset of columns. The importer would then fill in the complete sample-attribute matrix in memory and leave the unfilled cells as missing values (NA would probably do).

Currently, the data file for samples (or patients) needs to contain all the data: all the rows and all the columns. It would be useful to feed the importer multiple data files, each containing a subset of the rows and columns.

Instead of having one file:

id	attribute1	attribute2
sample1	A	B
sample2	C	NA

We could have three files:

id	attribute1
sample1	A

id	attribute2
sample1	B

id	attribute1
sample2	C
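The in-memory merge described at the start of this issue could look roughly like the following. This is a minimal Python sketch under stated assumptions, not the cBioPortal importer: the function `merge_clinical_files`, the `id` key column (taken from the example, not from real cBioPortal file formats), and the decision to reject a cell that two files set to different values are all hypothetical. Each file is still checked on its own for ragged rows, so the existing column-count check is kept per file.

```python
import csv
import io

MISSING = "NA"  # assumed placeholder for cells that no file provides


def merge_clinical_files(files):
    """Merge tab-separated clinical tables into one sample x attribute matrix.

    `files` is a list of text streams. Each must have a hypothetical "id" key
    column plus any subset of attribute columns. Returns (samples, attributes,
    matrix) with rows and columns in order of first appearance; cells that no
    file provides are filled with MISSING.
    """
    values = {}      # sample id -> {attribute: value}
    samples = []     # sample ids in order of first appearance
    attributes = []  # attribute names in order of first appearance
    for index, handle in enumerate(files):
        reader = csv.DictReader(handle, delimiter="\t")
        for name in reader.fieldnames or []:
            if name != "id" and name not in attributes:
                attributes.append(name)
        for row in reader:
            # Per-file ragged-row check: DictReader stores surplus fields under
            # the key None and fills absent fields with the value None.
            if None in row or None in row.values():
                raise ValueError(
                    f"file {index}, line {reader.line_num}: wrong number of columns")
            sample = row["id"]
            if sample not in values:
                values[sample] = {}
                samples.append(sample)
            for attribute, value in row.items():
                if attribute == "id":
                    continue
                previous = values[sample].get(attribute)
                # Two files disagreeing about the same cell is treated as an error.
                if previous is not None and previous != value:
                    raise ValueError(
                        f"file {index}: {sample}/{attribute} is {value!r}, "
                        f"but an earlier file set it to {previous!r}")
                values[sample][attribute] = value
    matrix = [[values[s].get(a, MISSING) for a in attributes] for s in samples]
    return samples, attributes, matrix


if __name__ == "__main__":
    # The three example files, as in-memory text streams.
    files = [
        io.StringIO("id\tattribute1\nsample1\tA\n"),
        io.StringIO("id\tattribute2\nsample1\tB\n"),
        io.StringIO("id\tattribute1\nsample2\tC\n"),
    ]
    samples, attributes, matrix = merge_clinical_files(files)
    print("\t".join(["id"] + attributes))
    for sample, row in zip(samples, matrix):
        print("\t".join([sample] + row))
```

Run on the three example files, this prints the same table as the single-file version, with `NA` for `sample2`/`attribute2`. Whether conflicting values should be an error, a warning, or "last file wins" is a design choice the importer would have to make explicitly.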

Advantages:

- Working with files with hundreds of columns is cumbersome:
  - diffs are nearly useless, so committing data files to git for version control is not much better than committing a binary blob
  - viewing/editing them by hand is tricky, and large files can crash less robust tools
- Pipelines could produce files covering different attributes that could be imported directly
- Samples coming from different sources could stay in different files (useful for GENIE, too)
- It would be easy to have separate studies for subcohorts/subsets of data that use exactly the same data files

Disadvantages:

- The importer gets more complicated, and so do validation error messages
- A typo in an attribute name means there are now two attributes and a bunch of missing values
- Errors that are currently caught with "not all lines have the same number of columns" might pass validation and result in a bunch of missing values instead
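One possible way to soften the attribute-name typo problem is for the validator to warn when two attribute names collected from different file headers look almost identical. The sketch below assumes Python's standard `difflib` similarity ratio is an acceptable heuristic; the function name `similar_attribute_names` and the 0.85 cutoff are hypothetical illustrations, not part of any existing cBioPortal validator.

```python
import difflib


def similar_attribute_names(attributes, cutoff=0.85):
    """Return pairs of attribute names similar enough to be possible typos.

    Uses difflib's SequenceMatcher-based similarity ratio; the default cutoff is
    an arbitrary illustrative threshold, not a tuned value.
    """
    pairs = []
    for i, name in enumerate(attributes):
        # Compare only against later names so each pair is reported once.
        for other in difflib.get_close_matches(name, attributes[i + 1:], n=3, cutoff=cutoff):
            pairs.append((name, other))
    return pairs


if __name__ == "__main__":
    # Attribute names as collected from the headers of all input files.
    names = ["attribute1", "atribute1", "AGE"]
    for first, second in similar_attribute_names(names):
        print(f"warning: '{first}' and '{second}' look alike; is one a typo?")
```

Legitimately distinct names can also score above the cutoff (`attribute1` and `attribute2` do, for example), so this fits better as a validation warning than as a hard error.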

cBioPortal / cbioportal, issue #10740