ethanumn / mpn-aml-pairtree


Parse data into .ssm file #6

Open ethanumn opened 3 years ago

ethanumn commented 3 years ago

TODO

ethanumn commented 3 years ago

Per my current understanding of the requirements, I developed a set of tools that automate aggregating the input data and producing the .ssm and .params.json files.

ethanumn commented 3 years ago

Ran pairtree successfully on the inputs generated from the MATS08 test data. I'm not yet sure whether the results make sense, but pairtree accepted my generated inputs.

ethanumn commented 3 years ago

Wrote a set of basic test cases (encapsulated in a unittest) for .ssm files
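For readers who want a picture of what such checks might look like, here is a minimal sketch. It is not the test suite from this repo. It assumes the tab-separated .ssm layout with the header `id`, `name`, `var_reads`, `total_reads`, `var_read_prob` and comma-separated per-sample values. The `check_ssm` helper and the sample row are made up for illustration.

```python
import unittest

SSM_HEADER = ["id", "name", "var_reads", "total_reads", "var_read_prob"]


def check_ssm(text):
    """Return a list of problems found in .ssm text (empty list means OK)."""
    lines = text.strip().splitlines()
    if not lines or lines[0].split("\t") != SSM_HEADER:
        return ["bad header"]
    problems = []
    n_samples = None
    for row in lines[1:]:
        fields = row.split("\t")
        if len(fields) != len(SSM_HEADER):
            problems.append(f"{row!r}: expected {len(SSM_HEADER)} fields")
            continue
        vid, _name, var, total, prob = fields
        var = [int(v) for v in var.split(",")]
        total = [int(t) for t in total.split(",")]
        prob = [float(p) for p in prob.split(",")]
        # Every variant must report the same number of samples in all columns.
        if not (len(var) == len(total) == len(prob)):
            problems.append(f"{vid}: per-sample lists differ in length")
            continue
        if n_samples is None:
            n_samples = len(var)
        elif len(var) != n_samples:
            problems.append(f"{vid}: sample count differs from first variant")
        if any(v < 0 or v > t for v, t in zip(var, total)):
            problems.append(f"{vid}: var_reads outside [0, total_reads]")
        if any(not 0 < p <= 1 for p in prob):
            problems.append(f"{vid}: var_read_prob outside (0, 1]")
    return problems


class SsmFileTest(unittest.TestCase):
    # Made-up example row with two samples.
    GOOD = "\t".join(SSM_HEADER) + "\ns0\tvariant_a\t10,0\t100,80\t0.5,0.5\n"

    def test_valid_file_has_no_problems(self):
        self.assertEqual(check_ssm(self.GOOD), [])

    def test_var_reads_cannot_exceed_total_reads(self):
        bad = self.GOOD.replace("10,0", "101,0")
        self.assertTrue(check_ssm(bad))
```

Saved as a module, this can be run with `python -m unittest <module>`.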

ethanumn commented 3 years ago

Wrote a set of basic printouts when aggregating .xlsx files to make sure the number of aggregated variants, zero reads, etc. make sense

ethanumn commented 3 years ago

Made changes per request. Verified all changes using data in "example/" (comparing calls, master, aggregated, .ssm, params.json, etc.). Passed ssm tests.

Created a class to generate a PDF of metrics. Not the prettiest PDF, but it will suffice (used matplotlib and PdfPages).

example.metrics.pdf
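A rough sketch of the general approach, for anyone unfamiliar with `PdfPages`: one text-only figure per page, saved into a single PDF. This is not the class from the repo. The function name `write_metrics_pdf` and the page contents are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, no display needed

import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages


def write_metrics_pdf(path, pages):
    """Write one PDF page per entry in `pages` (title -> list of text lines)."""
    with PdfPages(path) as pdf:
        for title, lines in pages.items():
            fig = plt.figure(figsize=(8.5, 11))
            fig.suptitle(title)
            # Plain monospace text, anchored near the top-left of the page.
            fig.text(0.1, 0.9, "\n".join(lines), va="top", family="monospace")
            pdf.savefig(fig)
            plt.close(fig)
```

Example call, with made-up numbers: `write_metrics_pdf("metrics.pdf", {"Aggregation": ["variants: 12", "zero reads: 3"]})`.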

ethanumn commented 3 years ago

Added workaround to sort rows in dataframe/xls by chromosome number. Double checked all of the pandas merge calls. Added some more statements to be printed to output pdf.
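One way such a chromosome sort could be done, shown as a sketch rather than the workaround actually committed: map each chromosome label to an integer order, sort on that helper column, then drop it. The column names `CHR` and `POS`, the `chrom_key` helper, and the ordering chosen for X, Y and MT are assumptions.

```python
import pandas as pd


def chrom_key(value):
    """Map a chromosome label ('1', 'chr10', 'X', ...) to a sortable integer."""
    name = str(value).strip()
    if name.lower().startswith("chr"):
        name = name[3:]
    name = name.upper()
    if name.isdigit():
        return int(name)
    # Non-numeric chromosomes go after the autosomes; unknown labels go last.
    return {"X": 23, "Y": 24, "M": 25, "MT": 25}.get(name, 99)


def sort_by_chromosome(df, chrom_col="CHR", pos_col="POS"):
    """Return a copy of df sorted by chromosome number, then position."""
    return (
        df.assign(_chrom_order=df[chrom_col].map(chrom_key))
        .sort_values(["_chrom_order", pos_col])
        .drop(columns="_chrom_order")
        .reset_index(drop=True)
    )
```

Sorting the raw strings would put "10" before "2", which is why the numeric helper column is needed.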

ethanumn commented 3 years ago

https://github.com/ethanumn/mpn-aml-pairtree/blob/b604a8fd846a963b4766e937fa608f3955df7d5d/utils/xls_file/xls_aggregators/mpn_aml_aggregator.py#L183

The issue here is that calls_df overwrites the VAF in aggregated_df, so the stale VAF still shows up down the line even though ALT_DEPTH has been set to zero. The solution is to drop the VAF column from calls_df.
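A small reproduction of this kind of problem, as a sketch only. The linked code is not reproduced here, so the merge call, the key columns `CHR`/`POS`, the `GENE` and `DEPTH` columns, and all values are assumptions. The point is that when both frames carry `VAF`, the copy from calls_df can end up being the one used downstream.

```python
import pandas as pd

KEYS = ["CHR", "POS"]  # hypothetical key columns

# Aggregated data: the first variant has had ALT_DEPTH (and VAF) zeroed out.
aggregated_df = pd.DataFrame({
    "CHR": ["1", "2"], "POS": [100, 200],
    "ALT_DEPTH": [0, 12], "DEPTH": [50, 40], "VAF": [0.0, 0.3],
})

# Calls data: still carries the original, now stale, VAF for the first variant.
calls_df = pd.DataFrame({
    "CHR": ["1", "2"], "POS": [100, 200],
    "GENE": ["gene_a", "gene_b"], "VAF": [0.25, 0.3],
})

# Naive merge: both VAF columns survive as VAF_x / VAF_y, and the stale
# value (0.25) travels along with the row whose ALT_DEPTH is zero.
naive = aggregated_df.merge(calls_df, on=KEYS, how="left")

# Fix: drop VAF from calls_df first, so only the aggregated VAF remains.
fixed = aggregated_df.merge(calls_df.drop(columns=["VAF"]), on=KEYS, how="left")

# Sanity check: a variant with zero ALT_DEPTH must have zero VAF.
zero_alt = fixed["ALT_DEPTH"] == 0
assert (fixed.loc[zero_alt, "VAF"] == 0).all()
```

A check like the final assertion could also live in the .ssm tests, so the inconsistency is caught if it is reintroduced.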