cognoma / cancer-data

TCGA data acquisition and processing for Project Cognoma

Export complete data #28

Closed dhimmel closed 7 years ago

dhimmel commented 7 years ago

@gwaygenomics has previously raised the need to export our processed datasets for all observations, in https://github.com/cognoma/cancer-data/pull/20#issuecomment-242408331:

As a general comment, while I think it is definitely good for the ML group to have a single dataset that everyone is working on, restricting it like this may not be the optimal solution. Eventually the data will need to be more fluid and subset on the fly depending on different rules which we will need to define later. (e.g. Unsupervised feature construction should not remove gene expression samples that don't have mutation status)

The motivation for this pull request is that Cognoma is producing the most user-friendly data, so we should export the complete datasets to enable many applications. I'm not currently planning to upload the complete data to figshare, although we could (especially once we continuously integrate). Either way, users will be able to generate it for themselves.
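To illustrate the subsetting concern quoted above, here is a minimal sketch of the difference between a restricted and a complete export. It is not the repository's code: the sample IDs, gene names, values, and the `subset` helper are all hypothetical, and plain dictionaries stand in for the real expression and mutation tables.

```python
# Hypothetical toy data: sample IDs, gene names, and values are made up.
expression = {
    "sample_a": {"GENE1": 5.2, "GENE2": 0.3},
    "sample_b": {"GENE1": 4.8, "GENE2": 1.1},
    "sample_c": {"GENE1": 6.0, "GENE2": 0.7},  # no mutation calls for this sample
}
mutation = {
    "sample_a": {"GENE1": 0, "GENE2": 1},
    "sample_b": {"GENE1": 1, "GENE2": 0},
}


def subset(table, sample_ids):
    """Return the rows of `table` whose sample ID is in `sample_ids`, sorted by ID."""
    return {s: table[s] for s in sorted(sample_ids) if s in table}


# Restricted export: keep only samples present in both tables,
# which is what a supervised classifier needs.
shared = expression.keys() & mutation.keys()
supervised_expression = subset(expression, shared)
supervised_mutation = subset(mutation, shared)

# Complete export: keep every expression sample, with or without
# mutation status, as unsupervised feature construction would want.
unsupervised_expression = subset(expression, expression.keys())
```

With the complete tables exported, either subset can be derived downstream, whereas a restricted export cannot recover the dropped samples.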