Open ethanumn opened 3 years ago
Per my current understanding, developed a set of software tools to automate aggregation and the production of .ssm and .params.json files.
Was able to use the generated inputs for the MATS08 test data to run pairtree successfully. Unsure whether the results make sense, but pairtree was able to utilize my inputs.
Wrote a set of basic test cases (encapsulated in a unittest) for .ssm files
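For reference, a minimal sketch of what such a unittest could look like. It assumes the tab-separated Pairtree-style layout (`id`, `name`, `var_reads`, `total_reads`, `var_read_prob`, with comma-separated per-sample values); the `read_ssm` helper, the class name, and the inlined sample rows are hypothetical and are not the tests from this repository.

```python
import csv
import io
import unittest

# Assumed column layout of a Pairtree-style .ssm file (tab-separated).
EXPECTED_HEADER = ["id", "name", "var_reads", "total_reads", "var_read_prob"]

# Hypothetical two-sample example, inlined so the sketch is self-contained.
SAMPLE_SSM = (
    "id\tname\tvar_reads\ttotal_reads\tvar_read_prob\n"
    "s0\tchr1_12345\t10,0\t100,80\t0.5,0.5\n"
    "s1\tchr2_67890\t3,7\t50,60\t0.5,0.5\n"
)


def read_ssm(handle):
    """Parse an open .ssm file into (header, list of row dicts)."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    return reader.fieldnames, rows


class SsmFileTest(unittest.TestCase):
    def setUp(self):
        self.header, self.rows = read_ssm(io.StringIO(SAMPLE_SSM))

    def test_header(self):
        self.assertEqual(self.header, EXPECTED_HEADER)

    def test_ids_are_unique(self):
        ids = [row["id"] for row in self.rows]
        self.assertEqual(len(ids), len(set(ids)))

    def test_per_sample_columns_have_equal_length(self):
        # Every per-sample column must list the same number of samples.
        for row in self.rows:
            lengths = {len(row[col].split(",")) for col in EXPECTED_HEADER[2:]}
            self.assertEqual(len(lengths), 1, row["id"])

    def test_var_reads_do_not_exceed_total_reads(self):
        for row in self.rows:
            var = [int(v) for v in row["var_reads"].split(",")]
            total = [int(t) for t in row["total_reads"].split(",")]
            for v, t in zip(var, total):
                self.assertTrue(0 <= v <= t, row["id"])
```

Saved as a module, this can be run with `python -m unittest`. To check a generated file, `setUp` would open that file instead of the inlined string.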
Wrote a set of basic printouts when aggregating .xlsx files to make sure the number of aggregated variants, zero reads, etc. make sense
Made changes per request. Verified all changes using data in "example/" (comparing calls, master, aggregated, .ssm, params.json, etc.). Passed ssm tests.
Created a class to generate a PDF of metrics. Not the prettiest PDF, but it will suffice (used matplotlib and PdfPages).
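A rough sketch of how a report class along these lines might be put together with matplotlib's `PdfPages`. The `MetricsReport` name, its methods, and the numbers are all hypothetical and only illustrate the approach; the actual class may be organized differently.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages


class MetricsReport:
    """Collects text lines and bar charts, then writes them to one PDF."""

    def __init__(self, path):
        self.path = path
        self.lines = []
        self.charts = []  # list of (title, {label: value})

    def add_line(self, text):
        self.lines.append(text)

    def add_bar_chart(self, title, counts):
        self.charts.append((title, dict(counts)))

    def write(self):
        with PdfPages(self.path) as pdf:
            # Page 1: plain-text summary of the collected lines.
            fig = plt.figure(figsize=(8.5, 11))
            fig.text(0.1, 0.9, "\n".join(self.lines), va="top", family="monospace")
            pdf.savefig(fig)
            plt.close(fig)
            # One additional page per chart.
            for title, counts in self.charts:
                fig, ax = plt.subplots()
                ax.bar(list(counts.keys()), list(counts.values()))
                ax.set_title(title)
                pdf.savefig(fig)
                plt.close(fig)
        return self.path


# Hypothetical numbers, for illustration only.
report = MetricsReport(os.path.join(tempfile.mkdtemp(), "metrics.pdf"))
report.add_line("aggregated variants: 3")
report.add_line("variants with zero alt reads: 1")
report.add_bar_chart("Variants per sample", {"sample_a": 2, "sample_b": 3})
pdf_path = report.write()
```

Collecting everything first and writing once keeps the aggregation code free of plotting calls; it only needs `add_line` and `add_bar_chart`.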
Added a workaround to sort rows in the dataframe/xls by chromosome number. Double-checked all of the pandas merge calls. Added some more statements to be printed to the output PDF.
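One possible shape for such a workaround, as a sketch only: chromosome labels are usually strings, so a plain sort puts "10" before "2". The `chrom_rank` helper below is hypothetical and maps each label to an integer that can be used as a sort key.

```python
def chrom_rank(chrom):
    """Map a chromosome label to an integer rank for sorting.

    Accepts labels with or without a "chr" prefix. Numbered chromosomes sort
    numerically; X, Y and M/MT follow; anything else goes last.
    """
    label = str(chrom).strip()
    if label.lower().startswith("chr"):
        label = label[3:]
    label = label.upper()
    if label.isdigit():
        return int(label)
    special = {"X": 1000, "Y": 1001, "M": 1002, "MT": 1002}
    return special.get(label, 1003)


chroms = ["10", "2", "X", "1", "chr21", "Y", "3"]
ordered = sorted(chroms, key=chrom_rank)
print(ordered)  # ['1', '2', '3', '10', 'chr21', 'X', 'Y']
```

With pandas 1.1 or later, the same helper should work as a sort key on a dataframe, e.g. `df.sort_values("CHROM", key=lambda s: s.map(chrom_rank))`, where `"CHROM"` is an assumed column name.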
The issue here is that the VAF from calls_df overwrites the VAF in aggregated_df, so the stale value shows up down the line even though ALT_DEPTH has been set to zero. The solution is to drop the VAF column from calls_df.
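A small sketch of the fix, under stated assumptions: the frames are combined with `DataFrame.merge` on `CHROM` and `POS`. The exact mechanism of the overwrite is not described here, and every column name other than `VAF` and `ALT_DEPTH` is made up for illustration.

```python
import pandas as pd

# Hypothetical frames; only VAF and ALT_DEPTH are names taken from the notes.
aggregated_df = pd.DataFrame({
    "CHROM": ["1", "2"],
    "POS": [100, 200],
    "ALT_DEPTH": [0, 12],   # first variant was zeroed out
    "VAF": [0.0, 0.30],
})
calls_df = pd.DataFrame({
    "CHROM": ["1", "2"],
    "POS": [100, 200],
    "GENE": ["A", "B"],
    "VAF": [0.25, 0.30],    # stale value for the zeroed variant
})

keys = ["CHROM", "POS"]

# Without the drop, merge keeps both columns as VAF_x / VAF_y, and later
# code can easily pick up the stale one.
naive = aggregated_df.merge(calls_df, on=keys, how="left")

# With the drop, aggregated_df is the only source of VAF.
fixed = aggregated_df.merge(calls_df.drop(columns=["VAF"]), on=keys, how="left")

# Consistency check: no variant with zero alt depth may carry a nonzero VAF.
inconsistent = fixed[(fixed["ALT_DEPTH"] == 0) & (fixed["VAF"] != 0)]
```

The final check is cheap enough to keep as a permanent assertion after the merge, so a regression of this kind would surface immediately.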
TODO