Sage-Bionetworks / Genie

Validation and processing of GENIE files
https://genie.synapse.org/
MIT License

[GEN-562] Cross-validate maf with clinical sample files #522

Closed: rxu17 closed this pull request 1 year ago

rxu17 commented 1 year ago

Purpose: This PR extends cross-validation so that the IDs in MAF files are checked against the clinical sample files.

It uses the newly introduced ancillary_files attribute on the individual filetypeformat classes to perform the cross-validation.
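A minimal sketch of the core ID check follows. It assumes the MAF's sample IDs are in the standard `Tumor_Sample_Barcode` column and the clinical sample file's are in `SAMPLE_ID`. For simplicity it represents rows as plain dicts, not the dataframes Genie reads. The names `find_missing_sample_ids` and `cross_validation_error`, along with the message text, are hypothetical and do not come from the PR.

```python
from typing import Iterable, Mapping


def find_missing_sample_ids(
    maf_rows: Iterable[Mapping[str, str]],
    clinical_sample_rows: Iterable[Mapping[str, str]],
    maf_id_col: str = "Tumor_Sample_Barcode",  # assumed MAF sample ID column
    clinical_id_col: str = "SAMPLE_ID",  # assumed clinical sample ID column
) -> list[str]:
    """Return the sorted MAF sample IDs with no matching clinical sample row."""
    clinical_ids = {row[clinical_id_col] for row in clinical_sample_rows}
    maf_ids = {row[maf_id_col] for row in maf_rows}  # de-duplicates repeated samples
    return sorted(maf_ids - clinical_ids)


def cross_validation_error(missing: list[str]) -> str:
    """Format a (hypothetical) error message; an empty string means the check passed."""
    if not missing:
        return ""
    return (
        "MAF: Found samples in the MAF that are not in the clinical sample file: "
        + ", ".join(missing)
        + "\n"
    )
```

The check only runs in one direction: a clinical sample without variants is not flagged. Whether the reverse direction should also be checked is a separate design decision.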

thomasyu888 commented 1 year ago

This is a wonderful start; let's discuss more soon.

rxu17 commented 1 year ago

@thomasyu888 I updated the structure, so this is roughly what cross-validation will look like in the other cross-validation tickets. Let me know what you think (I can also move cross_validate_ids_between_two_files to the validate class), and I'll move forward with this.

When we change the structure of get_center_files or separate out the clinical files, we should only need to modify each filetypeformat class's _cross_validate function. We would also remove the places that loop through the list of lists and replace the get_dataframe function we copied into process_functions with the class-specific one. Everything else should stay the same.
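The per-class hook described above might look roughly like the sketch below. Only two pieces come from the PR: the idea of an `ancillary_files` attribute and a class-specific `_cross_validate` method. Everything else is a hypothetical stand-in, not Genie's actual `FileTypeFormat` API. This includes the class names `FileTypeFormatSketch` and `MafSketch`, the `validate` wrapper, the `(errors, warnings)` string convention, the `"clinical_sample"` filetype label, and the `(filetype, rows)` pairs used to mimic the current list-of-lists structure.

```python
from typing import Dict, List, Optional, Tuple

Rows = List[Dict[str, str]]
# Hypothetical shape: a list of lists of (filetype, parsed rows) pairs,
# standing in for the nested structure get_center_files currently produces.
AncillaryFiles = List[List[Tuple[str, Rows]]]


class FileTypeFormatSketch:
    """Hypothetical base class: single-file validation plus a cross-validation hook."""

    def __init__(self, ancillary_files: Optional[AncillaryFiles] = None):
        self.ancillary_files = ancillary_files or []

    def _validate(self, rows: Rows) -> Tuple[str, str]:
        # Single-file checks would go here; this sketch has none.
        return "", ""

    def _cross_validate(self, rows: Rows) -> Tuple[str, str]:
        # Default: file types without cross-file rules pass trivially.
        return "", ""

    def validate(self, rows: Rows) -> Tuple[str, str]:
        errors, warnings = self._validate(rows)
        cross_errors, cross_warnings = self._cross_validate(rows)
        return errors + cross_errors, warnings + cross_warnings


class MafSketch(FileTypeFormatSketch):
    """Hypothetical MAF subclass checking sample IDs against clinical sample files."""

    def _cross_validate(self, rows: Rows) -> Tuple[str, str]:
        # Collect SAMPLE_IDs (assumed column) from every clinical sample file.
        # This nested loop is the part expected to go away once
        # get_center_files no longer returns a list of lists.
        clinical_ids = set()
        for file_group in self.ancillary_files:
            for filetype, clinical_rows in file_group:
                if filetype == "clinical_sample":
                    clinical_ids.update(r["SAMPLE_ID"] for r in clinical_rows)
        maf_ids = {r["Tumor_Sample_Barcode"] for r in rows}  # assumed MAF column
        missing = sorted(maf_ids - clinical_ids)
        if missing:
            return f"MAF: samples not in clinical file: {', '.join(missing)}\n", ""
        return "", ""
```

Because the shared `validate` method only calls the hook, changing how the ancillary files are gathered would touch each subclass's `_cross_validate` and nothing else.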

I might also move the section of code that loops through the ancillary_files into its own function so that it's easier to remove in the future.
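Pulling that loop into its own function could look like the sketch below. Callers request the rows of one ancillary file type and never touch the nested structure, so a later change to get_center_files would mean rewriting only this helper. The name `iter_ancillary_rows`, the `(filetype, rows)` pair shape, and the `"clinical_sample"` label are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Rows = List[Dict[str, str]]


def iter_ancillary_rows(
    ancillary_files: List[List[Tuple[str, Rows]]], filetype: str
) -> Iterator[Dict[str, str]]:
    """Yield every row from ancillary files of the requested (hypothetical) type.

    This is the only place that knows ancillary_files is a list of lists,
    so the flattening logic lives here and nowhere else.
    """
    for file_group in ancillary_files:
        for current_type, rows in file_group:
            if current_type == filetype:
                yield from rows
```

A caller such as a MAF `_cross_validate` would then reduce to something like `{row["SAMPLE_ID"] for row in iter_ancillary_rows(ancillary_files, "clinical_sample")}`.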