Closed rxu17 closed 1 year ago
This is a wonderful start, let's discuss more soon
@thomasyu888 I updated the structure so this is what cross-validation will generally look like for the other cross-validation tickets. Let me know what you think (I can also move `cross_validate_ids_between_two_files` to the `validate` class), and I'll move forward with this.
- Copy `get_dataframe` from the individual filetypeformat class to `process_functions` and rename it so it is specific to the filetypeformat class it belongs to.
- The `_cross_validate` function will loop through `ancillary_files` (which is currently just the result of `extract.get_center_files`) and get the file we are trying to cross-validate against so we can read it in.

When we do change the structure of `get_center_files` or separate out the clinical files, we should be able to just modify each filetypeformat class's `_cross_validate` function, remove the places that loop through the list of lists, and replace the `get_dataframe` function we copied to `process_functions` with the class-specific one. Everything else should stay the same.
I might pull the section of code that loops through `ancillary_files` into its own function so it is easier to remove in the future.
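A minimal sketch of what that extracted lookup could look like, assuming `ancillary_files` is a list of lists of file objects that each expose a name. `FileEntry`, `find_ancillary_file`, and the example file names are hypothetical stand-ins for illustration, not the actual objects or naming used in this PR:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileEntry:
    """Hypothetical stand-in for whatever object each ancillary file is."""
    name: str
    path: str


def find_ancillary_file(
    ancillary_files: List[List[FileEntry]], name_prefix: str
) -> Optional[FileEntry]:
    """Return the first entry whose name starts with name_prefix, else None."""
    for file_group in ancillary_files:  # outer list: one group of files
        for entry in file_group:  # inner list: the files in that group
            if entry.name.startswith(name_prefix):
                return entry
    return None


# Example data (assumed file names, for illustration only)
ancillary_files = [
    [FileEntry("data_mutations_extended_TEST.txt", "/tmp/maf.txt")],
    [FileEntry("data_clinical_supp_sample_TEST.txt", "/tmp/sample.txt")],
]
match = find_ancillary_file(ancillary_files, "data_clinical_supp_sample")
```

Keeping the nested loop in one helper means that, once the list of lists goes away, only this function needs to be deleted or replaced.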
Purpose: This PR adds further cross-validation functionality by adding the ability to cross-validate ids in the maf files with the clinical sample files.
This uses the newly introduced `ancillary_files` attribute in the specific filetypeformat classes to cross-validate.
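As a rough illustration of the ID check itself, here is a sketch that reports maf sample IDs absent from the clinical sample file. It is not the implementation in this PR: the function name, the row-as-dict representation, and the default column names `Tumor_Sample_Barcode` and `SAMPLE_ID` are assumptions made for the example.

```python
from typing import Dict, List


def find_ids_missing_from_clinical(
    maf_rows: List[Dict[str, str]],
    clinical_rows: List[Dict[str, str]],
    maf_id_col: str = "Tumor_Sample_Barcode",  # assumed maf ID column
    clinical_id_col: str = "SAMPLE_ID",  # assumed clinical sample ID column
) -> List[str]:
    """Return the sorted maf IDs that do not appear in the clinical rows."""
    clinical_ids = {row[clinical_id_col] for row in clinical_rows}
    maf_ids = {row[maf_id_col] for row in maf_rows}
    # Set difference keeps only IDs present in the maf but not the clinical file
    return sorted(maf_ids - clinical_ids)


# Example data (made up for illustration)
maf_rows = [
    {"Tumor_Sample_Barcode": "S1"},
    {"Tumor_Sample_Barcode": "S2"},
    {"Tumor_Sample_Barcode": "S3"},
]
clinical_rows = [{"SAMPLE_ID": "S1"}, {"SAMPLE_ID": "S2"}]

missing_ids = find_ids_missing_from_clinical(maf_rows, clinical_rows)
error_message = (
    f"maf IDs not found in clinical sample file: {', '.join(missing_ids)}"
    if missing_ids
    else ""
)
```

Returning the list of missing IDs, rather than only a pass/fail flag, lets the validation message name the exact samples that need fixing.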