abyzovlab / CNVpytor

A Python extension of CNVnator -- a tool for CNV analysis from depth-of-coverage by mapped reads
MIT License
178 stars, 26 forks

CNVpytor: some questions about the output #207

Closed: ZYongQi closed this issue 7 months ago

ZYongQi commented 7 months ago

Hello, this is ZY. I used CNVpytor to call CNVs and got output like this:

deletion NC_048218.1:25001-43000 18000 0.1187 5.675432e-08 6.736101e-32 8.508699e-07 2.201970e-27 0.0000 0.3657 0
deletion NC_048218.1:67001-81000 14000 0.1966 4.874976e-04 4.808891e-14 3.551201e-04 1.063119e-37 0.0000 0.2209 0
deletion NC_048218.1:626001-631000 5000 0.1390 5.310495e+00 5.298629e-36 5.775993e+02 4.146784e-18 0.0000 0.0070 402100
deletion NC_048218.1:731001-733000 2000 0.0000 1.277388e+03 1.541493e-19 1.000000e+00 1.000000e+00 -1.0000 0.0370 300100
deletion NC_048218.1:739001-745000 6000 0.1438 4.711761e+00 3.170543e-34 1.012075e+02 1.564757e-26 0.0000 0.0003 288100
deletion NC_048218.1:781001-787000 6000 0.0132 1.682901e-04 2.824720e-68 6.113292e-03 8.276560e-48 0.0000 0.0000 246100
deletion NC_048218.1:937001-941000 4000 0.1289 5.788912e+02 1.255817e-14 1.993186e+04 8.242601e-14 0.0000 0.0005 92100
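Each row above is one whitespace-separated call record. A minimal sketch for reading such rows into named fields, assuming the 11-column order described in CNVpytor's GettingStarted guide (type, region, size, level, e-val1 to e-val4, q0, pN, dG). The `CNVCall` class and `parse_call` helper are hypothetical names for illustration, not part of CNVpytor's API.

```python
from dataclasses import dataclass

NUM_FIELDS = 11  # assumed CNVpytor call-line layout (see lead-in)


@dataclass
class CNVCall:
    cnv_type: str   # "deletion" or "duplication"
    chrom: str
    start: int
    end: int
    size: int       # bp
    level: float    # read depth normalized to 1
    e_val1: float
    e_val2: float
    e_val3: float
    e_val4: float
    q0: float       # fraction of reads with mapping quality 0; -1 means no reads
    pN: float       # fraction of reference Ns in the region
    dG: float       # distance to the closest large gap in the reference


def parse_call(line: str) -> CNVCall:
    """Parse one whitespace-separated call line into a CNVCall."""
    f = line.split()
    if len(f) != NUM_FIELDS:
        raise ValueError(f"expected {NUM_FIELDS} fields, got {len(f)}")
    chrom, span = f[1].rsplit(":", 1)          # e.g. "NC_048218.1:25001-43000"
    start, end = (int(x) for x in span.split("-"))
    return CNVCall(f[0], chrom, start, end, int(f[2]), float(f[3]),
                   float(f[4]), float(f[5]), float(f[6]), float(f[7]),
                   float(f[8]), float(f[9]), float(f[10]))


row = ("deletion NC_048218.1:731001-733000 2000 0.0000 1.277388e+03 "
       "1.541493e-19 1.000000e+00 1.000000e+00 -1.0000 0.0370 300100")
call = parse_call(row)
print(call.chrom, call.start, call.end, call.q0)  # NC_048218.1 731001 733000 -1.0
```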

I have some questions:

  1. Why are the values in the q0 column almost all 0, with only a few -1? Is there something wrong with my previous steps?
  2. I chose a bin size of 1000 bp. Can you suggest filter parameters, such as q0, pN, and dG?

Looking forward to your reply. Best wishes!

arpanda commented 7 months ago

A value of -1 in the q0 column indicates that there are no reads in that region. Otherwise, q0 is the fraction of reads in the region mapped with zero mapping quality, so values close to 0 are preferable; a high q0 value indicates low mapping quality for that region.

For the second question, the following filtering criteria can be used for q0, pN, and dG:

Q0_range -1 0.5
p_N 0 0.5
dG_range 100000 inf
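The three ranges above can also be applied outside CNVpytor with a short script. A minimal sketch, assuming q0, pN and dG sit at fields 9, 10 and 11 of each call line and that every range is inclusive at both ends; the bound handling and the `passes_filters` name are assumptions, not CNVpytor's own filter code.

```python
import math

# Ranges from the thread, treated here as inclusive (low, high) bounds.
FILTERS = {
    "q0": (-1.0, 0.5),            # Q0_range -1 0.5
    "pN": (0.0, 0.5),             # p_N 0 0.5
    "dG": (100000.0, math.inf),   # dG_range 100000 inf
}

# Assumed 0-based positions of q0, pN and dG in a whitespace-separated call line.
FIELD_INDEX = {"q0": 8, "pN": 9, "dG": 10}


def passes_filters(line: str) -> bool:
    """Return True if a call line satisfies every range in FILTERS."""
    fields = line.split()
    for name, (low, high) in FILTERS.items():
        value = float(fields[FIELD_INDEX[name]])
        if not (low <= value <= high):
            return False
    return True


calls = [
    "deletion NC_048218.1:626001-631000 5000 0.1390 5.310495e+00 5.298629e-36 "
    "5.775993e+02 4.146784e-18 0.0000 0.0070 402100",
    "deletion NC_048218.1:937001-941000 4000 0.1289 5.788912e+02 1.255817e-14 "
    "1.993186e+04 8.242601e-14 0.0000 0.0005 92100",
]
kept = [c for c in calls if passes_filters(c)]
print(len(kept))  # 1 (the second call is closer than 100 kbp to a gap)
```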

Ref: https://github.com/abyzovlab/CNVpytor/blob/master/GettingStarted.md#predicting-cnv-regions

Thank you, Arijit

ZYongQi commented 7 months ago


Hi, this is ZY. Thank you for your suggestions. Because I don't have the genome mask file, I did the filtering step with a Python script I wrote myself rather than with CNVpytor. But I still have some questions:

  1. The output file contains only e-values. If I need a p-value, what should I do?

  2. I used the final result to plot the distribution of CNV types (deletions and duplications). The number of deletions detected is far greater than the number of duplications (almost 1000:1). Is this normal? Why are there so few duplications?
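CNVpytor's GettingStarted guide defines e-val1 and e-val2 as a p-value multiplied by genome size divided by bin size, i.e. scaled by roughly the number of bins tested. Inverting that gives p = e × bin size / genome size. A minimal sketch, assuming you supply the genome size yourself (the value below is a hypothetical placeholder, not the size of the NC_048218.1 assembly) and capping the result at 1.

```python
def evalue_to_pvalue(e_value: float, bin_size: int, genome_size: int) -> float:
    """Invert e = p * genome_size / bin_size, capping the result at 1."""
    if bin_size <= 0 or genome_size <= 0:
        raise ValueError("bin_size and genome_size must be positive")
    return min(1.0, e_value * bin_size / genome_size)


# Hypothetical genome size: replace with the total length of your reference.
GENOME_SIZE = 2_400_000_000
BIN_SIZE = 1000

for e in (5.675432e-08, 1.277388e+03):
    p = evalue_to_pvalue(e, BIN_SIZE, GENOME_SIZE)
    print(f"e-value {e:.3e} -> p-value {p:.3e}")
```

Because the e-value already folds in the number of bins, it is arguably the more useful quantity for filtering; the recovered p-value is uncorrected for multiple testing.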

arpanda commented 7 months ago

I'm not sure a p-value is necessary, but if you want one, here is how the e-values are calculated; they can be converted back to p-values.

  • e-val1 -- e-value (p-value multiplied by genome size divided by bin size) calculated using a t-test between the RD statistics in the region and the global RD statistics,
  • e-val2 -- e-value (p-value multiplied by genome size divided by bin size) from the probability that RD values within the region fall in the tails of a Gaussian distribution of binned RD.

The occurrence of CNV events depends on the sample. Consider the following steps:

  • Visually inspect the results using a Manhattan plot to get an overview of the read depth.
  • Increase the bin size: this either eliminates small CNV events or merges them, reducing the overall number of events.

-Arijit

ZYongQi commented 7 months ago

Thank you for your valuable advice. I'll try it right away. Best wishes to you.
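To compare the deletion-to-duplication ratio between runs (for example, before and after increasing the bin size as suggested above), a small tally over the call lines is enough. A minimal sketch, assuming the CNV type is the first whitespace-separated field of each line; `count_cnv_types` and `del_dup_ratio` are hypothetical helpers, and in practice `lines` would be an open file handle on the call output rather than the in-memory list used here.

```python
from collections import Counter
from typing import Iterable


def count_cnv_types(lines: Iterable[str]) -> Counter:
    """Count CNV types (first column) in call lines, skipping blank lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    return counts


def del_dup_ratio(counts: Counter) -> float:
    """Deletions per duplication; inf if no duplications were called."""
    dups = counts.get("duplication", 0)
    return counts.get("deletion", 0) / dups if dups else float("inf")


sample = [
    "deletion NC_048218.1:25001-43000 18000",
    "deletion NC_048218.1:67001-81000 14000",
    "duplication NC_048218.1:100001-120000 20000",
]
counts = count_cnv_types(sample)
print(dict(counts), del_dup_ratio(counts))  # {'deletion': 2, 'duplication': 1} 2.0
```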