cognoma / cancer-data

TCGA data acquisition and processing for Project Cognoma

Modernize repo: readme updates & LFS #42

Closed: dhimmel closed this pull request 6 years ago

dhimmel commented 6 years ago

Follows up on #41.

@gwaygenomics I reran the pipeline and hit a failure in 3.explore-mutations.ipynb. It looks like the disease column has been removed from samples.tsv. I expect this to cause many downstream issues. Why was this removed?

Did you run 4.covariates.ipynb in #41?
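A dropped column like this only surfaces when a downstream notebook tries to use it. One way to make such a change fail loudly and early is a small column check right after samples.tsv is loaded. The sketch below is illustrative only: `check_required_columns` is a hypothetical helper, and the toy DataFrame stands in for samples.tsv rather than reproducing its real contents.

```python
import pandas as pd


def check_required_columns(df, required, name="samples.tsv"):
    """Raise a ValueError naming any expected column that is missing from df."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"{name} is missing required column(s): {missing}")


# Toy stand-in for samples.tsv after the disease column was dropped (illustrative data only)
samples_df = pd.DataFrame({
    "sample_id": ["TCGA-XX-0001-01", "TCGA-XX-0002-01"],
    "sample_type": ["Primary Solid Tumor", "Primary Solid Tumor"],
})

try:
    check_required_columns(samples_df, ["sample_id", "disease"])
except ValueError as error:
    print(error)
```

Raising immediately, with the missing column named, points at the producing step instead of at whichever downstream notebook happens to touch the column first.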

gwaybio commented 6 years ago

> Did you run 4.covariates.ipynb in #41?

Nope, I only ran scripts 0, 1, and 2.
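Running only part of a numbered pipeline is an easy way for a later notebook to fall out of sync with upstream outputs. A minimal sketch of rerunning everything in order follows. It assumes the numbered notebooks (such as 3.explore-mutations.ipynb and 4.covariates.ipynb) live in one directory and that executing them in place with `jupyter nbconvert --execute` is acceptable. The helper names are hypothetical, not part of the repository.

```python
import re
import subprocess
from pathlib import Path

# Matches names with a numeric prefix, e.g. '3.explore-mutations.ipynb'
NUMBERED = re.compile(r"^(\d+)\.")


def ordered_notebooks(names):
    """Return numbered .ipynb names sorted by their integer prefix; others are skipped."""
    numbered = [n for n in names if n.endswith(".ipynb") and NUMBERED.match(n)]
    return sorted(numbered, key=lambda n: int(NUMBERED.match(n).group(1)))


def run_pipeline(directory="."):
    """Execute every numbered notebook in place, stopping at the first failure."""
    names = [path.name for path in Path(directory).iterdir()]
    for name in ordered_notebooks(names):
        subprocess.run(
            ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", name],
            cwd=directory,
            check=True,  # raise CalledProcessError so later notebooks never run on stale inputs
        )
```

Sorting on the integer prefix rather than the raw string keeps a notebook numbered 10 after one numbered 4.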

> I reran the pipeline and hit a failure in 3.explore-mutations.ipynb. It looks like the disease column has been removed from samples.tsv. I expect this to cause many downstream issues. Why was this removed?

That is great to know: the updated clinical data stores this information (sort of) in the histological_type variable.

However, I think the disease column can be added back relatively easily, using logic similar to cell 7:

# Extract sample-type with the code dictionary
clinmat_df = clinmat_df.assign(sample_type=clinmat_df.sample_id.str[-2:])
clinmat_df['sample_type'] = clinmat_df['sample_type'].replace(sampletype_codes_dict)

The only difference is that you would map the acronym to the disease instead.
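A sketch of what the suggested mapping could look like, following the same `assign` + `replace` pattern as cell 7. Assumptions: the clinical table has an `acronym` column, and `disease_codes_dict` is a hypothetical acronym-to-name lookup. The sample IDs are toy values, and the tiny `sampletype_codes_dict` here stands in for the notebook's full code table.

```python
import pandas as pd

# Toy stand-in for clinmat_df; the 'acronym' column name is an assumption
clinmat_df = pd.DataFrame({
    "sample_id": ["TCGA-XX-0001-01", "TCGA-XX-0002-01", "TCGA-XX-0002-11"],
    "acronym": ["GBM", "BRCA", "BRCA"],
})

# Small illustrative lookup tables (disease_codes_dict is a hypothetical name)
sampletype_codes_dict = {"01": "Primary Solid Tumor", "11": "Solid Tissue Normal"}
disease_codes_dict = {
    "BRCA": "breast invasive carcinoma",
    "GBM": "glioblastoma multiforme",
}

# Cell 7 logic: the last two characters of sample_id are the sample-type code
clinmat_df = clinmat_df.assign(sample_type=clinmat_df["sample_id"].str[-2:])
clinmat_df["sample_type"] = clinmat_df["sample_type"].replace(sampletype_codes_dict)

# Same pattern for disease: map each acronym to its full disease name
clinmat_df = clinmat_df.assign(disease=clinmat_df["acronym"].replace(disease_codes_dict))

print(clinmat_df[["sample_id", "acronym", "disease", "sample_type"]])
```

`Series.replace` leaves any code missing from the dictionary unchanged. `Series.map` would turn it into NaN instead, which can be preferable if unmapped acronyms should be flagged rather than passed through.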

gwaybio commented 6 years ago

> the updated clinical data stores this information (sort of) in the histological_type variable.

There is more granular detail in this variable now: it looks like it includes some subtype and treatment-status characterization.
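One quick way to see how much finer histological_type is than a per-disease label is to count its distinct values within each acronym. The rows below are illustrative examples of the kinds of labels described (subtypes, treatment status), not an extract of the actual data. The `acronym` column name is an assumption.

```python
import pandas as pd

# Illustrative rows only; not drawn from the real clinical file
clinical_df = pd.DataFrame({
    "acronym": ["BRCA", "BRCA", "GBM", "GBM", "GBM"],
    "histological_type": [
        "Infiltrating Ductal Carcinoma",
        "Infiltrating Lobular Carcinoma",
        "Untreated primary (de novo) GBM",
        "Treated primary GBM",
        "Untreated primary (de novo) GBM",
    ],
})

# Number of distinct histological_type labels within each disease acronym
granularity = clinical_df.groupby("acronym")["histological_type"].nunique()
print(granularity)

# Cross-tabulation showing which labels fall under which acronym
print(pd.crosstab(clinical_df["histological_type"], clinical_df["acronym"]))
```

Any acronym with more than one distinct label means histological_type cannot stand in for disease directly without collapsing those labels first.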

gwaybio commented 6 years ago

Looks great @dhimmel, thanks for the thorough update!