cognoma / cancer-data

TCGA data acquisition and processing for Project Cognoma

Recurrence and Distant Metastasis #37

Open binaypanda opened 7 years ago

binaypanda commented 7 years ago

Which column in clinical_data should I use to tell whether the tumor has recurred?

Does _RFS_IND=1 mean the tumor has definitely recurred?

How do I know if the tumor is a primary or a second primary?

How do I know if the tumor has recurred locally or distantly?

Does clinical_M=M1 or pathologic_M=M1 mean the tumor has definitely metastasized?

dhimmel commented 7 years ago

@binaypanda, just to make sure you know: we're not the creators of the TCGA datasets. We use data from the Xena Browser. I'll try to answer your questions, but the Xena mailing list may be a more appropriate place for them.

However, we've also had some of the same questions. See https://github.com/cognoma/cancer-data/issues/14#issuecomment-238642439.

Also see 2.TCGA-process.ipynb where we process the TCGA sample attributes. We rename several of the Xena column names using the following mapping:

import collections

# Mapping to rename and filter columns (Xena column name -> our column name)
renamer = collections.OrderedDict([
    ('sampleID', 'sample_id'),
    ('_PATIENT', 'patient_id'),
    ('sample_type', 'sample_type'),
    ('_primary_disease', 'disease'),
    ('acronym', 'acronym'),
    ('_primary_site', 'organ_of_origin'),
    ('gender', 'gender'),
    ('age_at_initial_pathologic_diagnosis', 'age_diagnosed'),
    ('_OS_IND', 'dead'),
    ('_OS', 'days_survived'),
    ('_RFS_IND', 'recurred'),
    ('_RFS', 'days_recurrence_free'),
])
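
A minimal sketch of how a mapping like this could be applied, using only the standard library. The notebook itself may do this differently (for instance with pandas); the `rename_and_filter` helper and the sample row below are hypothetical and only illustrate the rename-and-filter idea.

```python
import collections

# Subset of the mapping above, repeated so the example is self-contained
renamer = collections.OrderedDict([
    ('sampleID', 'sample_id'),
    ('_RFS_IND', 'recurred'),
    ('_RFS', 'days_recurrence_free'),
])


def rename_and_filter(row, mapping):
    """Keep only the keys listed in mapping, renamed and in mapping order.

    Hypothetical helper: keys missing from the row become None.
    """
    return collections.OrderedDict(
        (new, row.get(old)) for old, new in mapping.items()
    )


# Hypothetical Xena-style row; the values are made up for illustration
xena_row = {
    'sampleID': 'TCGA-XX-XXXX-01',
    '_RFS_IND': '1',
    '_RFS': '420',
    'unused_column': 'dropped',
}

cognoma_row = rename_and_filter(xena_row, renamer)
print(dict(cognoma_row))
```

Columns that are not keys of the mapping are dropped, which is why the mapping both renames and filters.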

Which column in clinical_data should I use to tell whether the tumor has recurred?

_RFS_IND
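
As a rough sketch, rows could be split on this indicator as shown here. It assumes the column holds `1`, `0`, or an empty value, and it treats `1` as a recorded recurrence event. Whether `1` means the tumor definitely recurred is not confirmed in this thread. The helper names and the example rows are hypothetical.

```python
import math


def parse_indicator(raw):
    """Return the indicator as a float, or None if absent or unparseable."""
    if raw is None:
        return None
    try:
        value = float(str(raw).strip())
    except ValueError:
        return None
    return None if math.isnan(value) else value


def split_by_indicator(rows, column='_RFS_IND'):
    """Group rows into event (1), no_event (0), and unknown (anything else)."""
    groups = {'event': [], 'no_event': [], 'unknown': []}
    for row in rows:
        value = parse_indicator(row.get(column))
        if value == 1:
            groups['event'].append(row)
        elif value == 0:
            groups['no_event'].append(row)
        else:
            groups['unknown'].append(row)
    return groups


# Hypothetical rows; the values are made up for illustration
rows = [
    {'sampleID': 'sample-a', '_RFS_IND': '1'},
    {'sampleID': 'sample-b', '_RFS_IND': '0'},
    {'sampleID': 'sample-c', '_RFS_IND': ''},
]
groups = split_by_indicator(rows)
print({name: len(members) for name, members in groups.items()})
```

Keeping an explicit `unknown` group avoids silently counting samples with no recorded value as recurrence-free.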

How do I know if the tumor is a primary or a second primary?

The sample_type column, I believe.
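
One way to check what `sample_type` actually distinguishes is to tabulate its values. This is a minimal sketch, assuming the sample attributes are available as a list of dicts. The labels in the example rows are illustrative and not taken from the dataset.

```python
import collections

# Hypothetical rows; replace with the real sample attributes
rows = [
    {'sampleID': 'sample-a', 'sample_type': 'Primary Tumor'},
    {'sampleID': 'sample-b', 'sample_type': 'Primary Tumor'},
    {'sampleID': 'sample-c', 'sample_type': 'Recurrent Tumor'},
]

# Count how many samples carry each sample_type label
sample_type_counts = collections.Counter(row['sample_type'] for row in rows)
for sample_type, count in sample_type_counts.most_common():
    print(sample_type, count)
```

Listing the distinct labels first shows whether the column separates primary tumors from other sample categories before any filtering relies on it.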

I left the other questions unanswered because I don't know the answers. @jingchunzhu and @maryjgoldman will know, but they may prefer that you use the mailing list.

Finally, depending on your needs, the data we generate for Project Cognoma may work for you. See https://doi.org/10.6084/m9.figshare.3487685. We've done some additional processing on top of the Xena data to make it more user-friendly for our use case.