Sage-Bionetworks / GENIE-Sponsored-Projects

This repository contains processing code for GENIE sponsored projects used for creating cBioPortal files
MIT License

Add functionality to skip non-applicable timelines #45

Open hhunterzinck opened 2 years ago

hhunterzinck commented 2 years ago

Some cohorts do not collect data for certain timelines. For example, the BLADDER cohort does not (yet?) collect the tumor marker instruments used to create the data_timeline_labtest.txt file. Currently, some timelines are skipped by a conditional statement inside the processing code that checks the cohort. For example: https://github.com/Sage-Bionetworks/GENIE-Sponsored-Projects/blob/ea33ed8a587a3e8f53c4d504842606da83cb6f7b/geniesp/bpc_redcap_export_mapping.py#L907

It would be more robust to make this generic, so that it handles new cohorts and changes to existing cohorts without further edits to the processing code. Could the class definition of each cohort include a flag for each timeline type, checked during processing before that timeline is generated? Or could the ability to create a given timeline be determined programmatically from the mapping and the variables included in the underlying dataset?
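To make the two options concrete, here is a minimal sketch that combines them: a per-cohort flag for each timeline file, plus a check that the dataset contains the variables that timeline needs. Everything in it is an assumption for illustration (the class names, `timeline_flags`, `REQUIRED_VARS`, and the variable names such as `tm_type`); none of it is taken from the repository.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical mapping: timeline file -> dataset variables it needs.
# The variable names are placeholders, not real BPC variable names.
REQUIRED_VARS: Dict[str, Set[str]] = {
    "data_timeline_treatment.txt": {"drug_name"},
    "data_timeline_labtest.txt": {"tm_type", "tm_result"},
}


class Cohort:
    """Hypothetical base class: every timeline is enabled by default."""

    name = "GENERIC"
    timeline_flags: Dict[str, bool] = dict.fromkeys(REQUIRED_VARS, True)

    def timelines_to_run(self, dataset_columns: Iterable[str]) -> List[str]:
        """Return timeline files that are both flagged on and buildable."""
        available = set(dataset_columns)
        return [
            filename
            for filename, needed in REQUIRED_VARS.items()
            # Option 1: explicit per-cohort flag.
            # Option 2: all required variables exist in the dataset.
            if self.timeline_flags.get(filename, False) and needed <= available
        ]


class Bladder(Cohort):
    """Hypothetical cohort that opts out of the labtest timeline."""

    name = "BLADDER"
    timeline_flags = {**Cohort.timeline_flags, "data_timeline_labtest.txt": False}
```

With this shape, adding a cohort means overriding one dictionary, and a timeline whose variables are missing from the dataset is skipped even when nobody remembered to set its flag.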

thomasyu888 commented 2 years ago

@hhunterzinck.

Actually, after revisiting this some more, the particular chunk of code highlighted is quite problematic. The reason is that it doesn't create a file of its own; it creates a section of the data_timeline_treatment.txt file. Is there a reason why BrCa/CRC/NSCLC don't have the "Radiation Therapy" treatment type?

A good example of a file that is skipped is "data_timeline_labtest.txt". We can achieve this with the self._exclude_files we just added.
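As a rough illustration of skipping a whole file this way, the sketch below assumes `_exclude_files` is a list of file names consulted before each file is written. The actual type and usage of `self._exclude_files` in the repository may differ, and `Exporter`, `BladderExporter`, and `write_files` are hypothetical names.

```python
from typing import Callable, Dict, List


class Exporter:
    """Hypothetical exporter; by default no file is excluded."""

    _exclude_files: List[str] = []

    def write_files(self, writers: Dict[str, Callable[[], None]]) -> List[str]:
        """Call each writer unless its file name is excluded; return names written."""
        written = []
        for filename, writer in writers.items():
            if filename in self._exclude_files:
                continue  # cohort does not collect data for this file
            writer()
            written.append(filename)
        return written


class BladderExporter(Exporter):
    """Hypothetical cohort exporter that skips the labtest timeline."""

    _exclude_files = ["data_timeline_labtest.txt"]
```

Note that this only covers files skipped as a whole. It does not address the case above, where a cohort check controls one section inside data_timeline_treatment.txt.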