freeseek / mocha

MOsaic CHromosomal Alterations (MoChA) caller
MIT License

Questions about how to filter callset #33

Closed: uqzqiao closed this issue 2 years ago

uqzqiao commented 2 years ago

Hi! I have two more questions about the filtering strategies for the mCA callset:

(1) According to README.md, mLOX events are not required to be classified as a loss by MoChA. I just want to confirm whether the same applies to mLOY events. P.S.: I assume that mLOY inference is independent of the event type in the output file, given that mLOY is inferred from median LRR statistics (as MoChA also does when inferring event types).

(2) When generating the list of samples with mCAs, one of the mCA phenotypes generated is the `$pfx.cll.lines` file, created with:

```
cat $pfx.{{3,4,6,8,11,17,18}p_loss,{1,6,11,13,14,16,17,22}q_loss,13q_cnloh,{2,3,4,5,12,17,18,19}_gain}.lines | \
  awk -F"\t" -v OFS="\t" 'NR==FNR {x[$1]++} NR>FNR && $1 in x {print $1}' - $pfx.stats.tsv > $pfx.cll.lines
```

Since not all p-arm and q-arm events are included, I am wondering whether this list only covers the events enriched in samples with CLL (chronic lymphocytic leukaemia).

Thank you again for taking the time to answer my questions!!

freeseek commented 2 years ago

LRR is usually not reliable on the sex chromosomes, so yes: mLOX and mLOY events are identified regardless of how MoChA classified them based on LRR.

The .cll.lines file is an example of how to enrich for mCAs typical of CLL, but it is not meant to be the final word on the topic. It is more of a suggestion.
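To see what the `.cll.lines` pipeline quoted in the question actually does, here is a small self-contained toy run. It assumes, as the awk filter implies, that the first column of `$pfx.stats.tsv` holds the sample ID and that each `.lines` file lists one sample ID per line. The prefix `demo`, the sample IDs, and the `call_rate` column are made up for illustration and are not real MoChA output. The nested brace expansion produces the 24 per-event file names, `cat` concatenates them, and awk first loads those IDs into `x` (`NR==FNR`) and then prints each sample from `stats.tsv` found in `x`. As a result, a sample carrying several of these events is reported only once, and IDs missing from `stats.tsv` are dropped.

```shell
#!/usr/bin/env bash
# Toy reproduction of the .cll.lines pipeline (hypothetical prefix and sample IDs).
set -euo pipefail
cd "$(mktemp -d)"
pfx=demo

# Toy stats file: first column is the sample ID ("call_rate" is a made-up column)
printf 'sample_id\tcall_rate\nS1\t0.99\nS2\t0.98\nS3\t0.97\nS4\t0.99\n' > "$pfx.stats.tsv"

# Same nested brace expansion as in the question: 7 + 8 + 1 + 8 = 24 file names
files=( "$pfx".{{3,4,6,8,11,17,18}p_loss,{1,6,11,13,14,16,17,22}q_loss,13q_cnloh,{2,3,4,5,12,17,18,19}_gain}.lines )
echo "${#files[@]}"   # prints 24

# Create every per-event file (empty = no carriers), then add a few toy carriers
touch "${files[@]}"
printf 'S3\n' > "$pfx.13q_loss.lines"
printf 'S1\nS3\n' > "$pfx.12_gain.lines"   # S3 carries two listed events
printf 'S5\n' > "$pfx.17p_loss.lines"      # S5 is not in stats.tsv, so it is dropped

# Union of carriers, reported once each, in stats.tsv order
cat "${files[@]}" | \
  awk -F"\t" -v OFS="\t" 'NR==FNR {x[$1]++} NR>FNR && $1 in x {print $1}' - "$pfx.stats.tsv" > "$pfx.cll.lines"

cat "$pfx.cll.lines"   # prints S1 and S3, one per line
```

To enrich for a different set of mCAs, only the brace expression that lists the event files needs to change.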

uqzqiao commented 2 years ago

Oh, that's very helpful! Thank you so much!