abyzovlab / CNVpytor

a python extension of CNVnator -- a tool for CNV analysis from depth-of-coverage by mapped reads
MIT License

The normalized read depth values #198

Closed by zainabae 10 months ago

zainabae commented 10 months ago

I have a question about normalized read depth values. Some of these values are very close to the neutral copy number of 2. Is it appropriate to apply filtering in this context?

I read a paper that used a filtering approach, categorizing regions with RD > 1.8 as duplications and regions with RD < 0.4 as deletions.

https://onlinelibrary.wiley.com/doi/10.1111/jeb.14214

Is there a rationale for filtering on these values? Could we treat values < 1.2 as deletions and values > 2.8 as duplications, based on the color intensity in the Excel files? Under that scheme, intense green would mark values above 2.8, intense red would mark values below 1.2, and values in between would appear nearly white.

arpanda commented 10 months ago

I wouldn't recommend those filtering criteria, because the RD value also reflects the cell frequency of the event. To clarify, values near 2 indicate a low-frequency event, whereas values farther from 2 correspond to high-frequency events. If your intention is to keep only high-frequency events, then this approach may suffice.

Alternatively, I recommend considering the B-allele frequency (BAF) likelihood value in conjunction with the read depth (RD) value for a more comprehensive analysis.
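
To illustrate why a fixed RD cutoff behaves like a cell-frequency filter, here is a minimal sketch. It assumes a simple two-population mixture (a fraction of cells carry the event, the rest are diploid) and uses the 1.2 / 2.8 cutoffs from the question. The function names `expected_rd` and `classify_by_rd` are hypothetical and are not part of CNVpytor.

```python
def expected_rd(cell_fraction, event_copy_number):
    """Expected normalized RD (diploid = 2) for a mixed cell population.

    Assumption: a fraction `cell_fraction` of cells carries the event with
    `event_copy_number` copies; all remaining cells have 2 copies.
    """
    return 2.0 + cell_fraction * (event_copy_number - 2.0)


def classify_by_rd(rd, del_cutoff=1.2, dup_cutoff=2.8):
    """Label a region using fixed RD cutoffs only (the scheme from the question)."""
    if rd < del_cutoff:
        return "deletion"
    if rd > dup_cutoff:
        return "duplication"
    return "neutral"


# A single-copy gain present in 30% of cells gives RD of about 2.3,
# so the fixed cutoffs label it "neutral" even though the event is real.
low_freq = classify_by_rd(expected_rd(0.3, 3))

# The same gain present in 90% of cells gives RD of about 2.9 and passes the cutoff.
high_freq = classify_by_rd(expected_rd(0.9, 3))

print(low_freq, high_freq)
```

Under these assumptions, the cutoffs keep only events present in roughly 80% or more of cells for single-copy changes, which is why they suit high-frequency events only.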

zainabae commented 10 months ago

Thank you for your response. Can you clarify what you mean by cell frequency?

arpanda commented 10 months ago

Cell frequency (or cell fraction) refers to the prevalence of the event within the cell population. Basically, it is reflected in how far the RD value deviates from 2, and the exact relationship depends on the model.

For the combined caller, it is the cf_1 column (source: https://github.com/abyzovlab/CNVpytor/blob/master/GettingStarted.md#predicting-cnv-regions-using-joint-caller-prototype ).
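
As a rough illustration of "deviation from 2", the sketch below inverts a simple mixture model to get a cell fraction from a normalized RD value. This is an assumption made for illustration and not necessarily how CNVpytor computes cf_1: it supposes that affected cells all share one copy number and that every other cell is diploid. The name `estimate_cell_fraction` is hypothetical.

```python
def estimate_cell_fraction(rd, event_copy_number):
    """Estimate the fraction of cells carrying an event from normalized RD.

    Assumed model (not taken from CNVpytor):
        rd = 2 + f * (event_copy_number - 2)
    so  f  = (rd - 2) / (event_copy_number - 2).
    """
    if event_copy_number == 2:
        raise ValueError("a copy number of 2 is neutral; the fraction is undefined")
    return (rd - 2.0) / (event_copy_number - 2.0)


# RD of 2.3 under a single-copy gain model (3 copies): about 30% of cells.
gain_fraction = estimate_cell_fraction(2.3, 3)

# RD of 1.2 under a heterozygous deletion model (1 copy): about 80% of cells.
loss_fraction = estimate_cell_fraction(1.2, 1)

print(round(gain_fraction, 2), round(loss_fraction, 2))
```

The same RD value maps to a different fraction under a different assumed copy number (for example, RD 1.2 under a homozygous deletion model gives about 40%), which is one way to read "depends on the model".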