broadinstitute / CODECsuite

analysis pipeline for CODEC data

Discordance in output.mutation_metrics.txt #19

Open wclee47 opened 4 months ago

wclee47 commented 4 months ago

Hi Ruolin,

I found that the numbers in the output.mutation_metrics.txt don't seem to be consistent with each other. As seen in the image below, n_A_eval + n_C_eval + n_G_eval + n_T_eval is not equal to n_bases_eval. Can you please let me know if I'm missing anything or misunderstanding these numbers?

image
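The consistency check described above can be sketched in a few lines. This is only an illustration: it assumes the metrics file is tab-separated with one header row and one or more value rows, which the thread does not confirm. The column names come from the report above, and the numbers in the example are made up.

```python
import csv
import io


def check_base_counts(metrics_text):
    """Compare n_A_eval + n_C_eval + n_G_eval + n_T_eval with n_bases_eval.

    Assumes a tab-separated table with a header row (an assumption about
    the file layout). Returns one (per_base_sum, n_bases_eval, difference)
    tuple per data row.
    """
    reader = csv.DictReader(io.StringIO(metrics_text), delimiter="\t")
    results = []
    for row in reader:
        per_base_sum = sum(int(row[f"n_{base}_eval"]) for base in "ACGT")
        n_bases_eval = int(row["n_bases_eval"])
        results.append((per_base_sum, n_bases_eval, n_bases_eval - per_base_sum))
    return results


# Made-up values for illustration only; not taken from a real CODEC run.
example = (
    "n_bases_eval\tn_A_eval\tn_C_eval\tn_G_eval\tn_T_eval\n"
    "1000\t250\t240\t245\t255\n"
)

for per_base_sum, n_bases_eval, diff in check_base_counts(example):
    print(f"sum of per-base counts={per_base_sum}, n_bases_eval={n_bases_eval}, diff={diff}")
```

A non-zero difference on a real file would reproduce the discrepancy reported here.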

Also, "n_snv" in output.mutation_metrics.txt always differs slightly from the number of rows with "SNV" type in output.variants_called.txt.

Could you please clarify these discrepancies for me?

Thank you!

Won-Chul

ruolin commented 4 months ago

Hi Won-Chul, thanks for pointing out these issues. I believe there might be a bug behind n_bases_eval != A+C+G+T. As for the difference between the number of rows with "SNV" in variants_called.txt and n_snv in metrics.txt, it should be because some of the SNVs are actually doublet mutations, such as CC>TT, that are reported on a single line. If you break them up into single mutations, the numbers should add up.
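Breaking doublets into single mutations can be sketched as follows. The `(type, ref, alt)` tuple layout and the helper name are hypothetical, chosen only for illustration; the actual columns of variants_called.txt are not shown in this thread, and this is not CODECsuite's own counting code. The sketch counts each differing position of an equal-length substitution as one single-base mutation.

```python
def count_single_base_substitutions(variants):
    """Count SNVs after splitting multi-base substitutions.

    `variants` is an iterable of (type, ref, alt) tuples; this layout is
    a hypothetical stand-in for rows of the variants file.
    A doublet such as CC>TT contributes 2, a plain C>T contributes 1.
    """
    total = 0
    for variant_type, ref, alt in variants:
        if variant_type != "SNV":
            continue  # skip indels and any other variant types
        if len(ref) != len(alt):
            continue  # not a pure substitution; left out of this sketch
        total += sum(1 for r, a in zip(ref, alt) if r != a)
    return total


# Made-up rows for illustration only.
rows = [
    ("SNV", "C", "T"),
    ("SNV", "CC", "TT"),   # doublet reported on a single line
    ("INDEL", "A", "AT"),
    ("SNV", "G", "A"),
]

snv_rows = sum(1 for variant_type, _, _ in rows if variant_type == "SNV")
print("rows with SNV type:", snv_rows)
print("single-base substitutions:", count_single_base_substitutions(rows))
```

In this toy input the row count and the split count differ by one, which is the kind of small gap described above.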

wclee47 commented 4 months ago

Hi Ruolin, thank you as always for the quick help. Yes, I confirmed that the counts add up to n_snv once the doublet mutations are separated. Thanks for letting me know. The n_bases_eval discrepancy is still open, and I hope you can debug it. Thank you!

ruolin commented 3 months ago

Hi Won-Chul, I have fixed the bug in this release: https://github.com/broadinstitute/CODECsuite/releases/tag/v1.1.3

Thank you again for reporting this issue!

wclee47 commented 3 months ago

Thank you!