etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

Targets: 3948 (1.4789%) bins failed filters (log2 < -5.0, log2 > 5.0, spread > 1.0) #768

Closed WortJohn closed 1 year ago

WortJohn commented 1 year ago

I analyzed one sample with a 2p24.3 amplification; the gene was verified positive by FISH, which showed a cluster signal. The WES data had very high coverage (>60000, mean coverage 300). The file tumor_sample.targetcoverage.cnn had high log2 values (listed below), but these bins were missing from tumor_sample.cnr, that is, before segmentation. I would like to know why bins are filtered on "log2 < -5.0, log2 > 5.0, spread > 1.0" and how to set the parameters to avoid this issue. Thanks.

chromosome  start     end       gene    depth    log2
chr2        15731982  15732102  DDX1    31051.4  14.9224
chr2        15735256  15735376  DDX1    21621.1  14.4002
chr2        15735561  15735714  DDX1    26259.4  14.6805
chr2        15736782  15736962  DDX1    20773.3  14.3424
chr2        15737456  15737636  DDX1    21578.6  14.3973
chr2        15739754  15739874  DDX1    21211.7  14.3726
chr2        15742656  15742776  DDX1    23282.9  14.507
chr2        15743256  15743412  DDX1    23994.4  14.5504
chr2        15743885  15744065  DDX1    21946.3  14.4217
chr2        15744508  15744657  DDX1    21671.1  14.4035
chr2        15746023  15746203  DDX1    31078    14.9236
chr2        15746240  15746420  DDX1    34106.9  15.0578
chr2        15747273  15747462  DDX1    23974    14.5492
chr2        15753296  15753476  DDX1    24072.8  14.5551
chr2        15757326  15757517  DDX1    24974    14.6081
chr2        15758255  15758435  DDX1    21133.1  14.3672
chr2        15760320  15760531  DDX1    26251.9  14.6801
chr2        15761124  15761304  DDX1    22192    14.4378
chr2        15763533  15763728  DDX1    25220.2  14.6223
chr2        15767130  15767310  DDX1    27403.8  14.7421
chr2        15768528  15768778  DDX1    31916.1  14.962
chr2        15768778  15769028  DDX1    35817.1  15.1284
chr2        15769686  15769866  DDX1    27532.7  14.7489
chr2        15770079  15770259  DDX1    26442.1  14.6905
chr2        15770876  15771056  DDX1    31465    14.9415
chr2        16059438  16059558  MYCN    21044    14.3611
chr2        16080022  16080222  MYCNOS  44797.9  15.4511
chr2        16080222  16080423  MYCNOS  52053.1  15.6677
chr2        16080606  16080936  MYCN    45571.6  15.4758
chr2        16082017  16082271  MYCN    51890.5  15.6632
chr2        16082271  16082526  MYCN    52848.2  15.6896
chr2        16082526  16082781  MYCN    31597.6  14.9475
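A quick arithmetic check on the listing above suggests that the log2 column is simply the base-2 logarithm of the depth column. The sketch below tests that on three rows copied from the listing; the helper name max_log2_deviation is made up for this illustration and is not part of CNVkit.

```python
import math

# (depth, log2) pairs copied from three rows of the listing above.
rows = [
    (31051.4, 14.9224),
    (21621.1, 14.4002),
    (52848.2, 15.6896),
]


def max_log2_deviation(pairs):
    """Return the largest absolute gap between log2(depth) and the reported log2."""
    return max(abs(math.log2(depth) - reported) for depth, reported in pairs)


# A value close to zero means the reported log2 matches log2(depth).
print(max_log2_deviation(rows))
```

If that reading is right, the values in this file are log2 depths rather than log2 ratios against a reference, which may matter when comparing them with the ±5.0 thresholds quoted in the issue title.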

MicrobioSee commented 1 year ago

Did you get an answer? I am facing the same problem.

28rietd commented 1 year ago

Unfortunately the parameters used for the filtering are currently hard-coded in this file (L3-L5): https://github.com/etal/cnvkit/blob/0f827c0a360cbcac5e26eed830727abf1a066722/cnvlib/params.py#L1
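To make the effect of those thresholds concrete, here is a minimal sketch of a filter that applies the three conditions from the issue title (log2 < -5.0, log2 > 5.0, spread > 1.0) with the cutoffs exposed as arguments. It is not CNVkit's implementation: the constant names, function names, and the three example bins are all invented for illustration.

```python
# Cutoffs taken from the log message in the issue title.
# The names are illustrative and not necessarily those used in CNVkit.
MIN_LOG2 = -5.0
MAX_LOG2 = 5.0
MAX_SPREAD = 1.0


def fails_filters(log2, spread, min_log2=MIN_LOG2, max_log2=MAX_LOG2,
                  max_spread=MAX_SPREAD):
    """Return True if a bin violates any of the three conditions."""
    return log2 < min_log2 or log2 > max_log2 or spread > max_spread


def split_bins(bins, **thresholds):
    """Partition bins into (kept, masked) lists using fails_filters."""
    kept, masked = [], []
    for b in bins:
        if fails_filters(b["log2"], b["spread"], **thresholds):
            masked.append(b)
        else:
            kept.append(b)
    return kept, masked


# Made-up example bins, not taken from the sample in this issue.
bins = [
    {"name": "binA", "log2": 0.1, "spread": 0.2},
    {"name": "binB", "log2": 6.3, "spread": 0.3},   # log2 above the upper cutoff
    {"name": "binC", "log2": -0.4, "spread": 1.7},  # spread above the cutoff
]

kept, masked = split_bins(bins)
# Relaxing the upper log2 cutoff keeps binB as well.
relaxed_kept, relaxed_masked = split_bins(bins, max_log2=8.0)
```

With the cutoffs passed as keyword arguments, a caller could relax a single threshold for a sample with a known high-level amplification without editing any source file.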

I would also be in favour of being able to change the default parameter settings.

In addition, would it also be possible to write the masked_bins to an output file instead of stdout, as now no output is returned if the list exceeds 500 bins?
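One way the request above could look in practice is sketched below: the filtered bins are written as a tab-separated table to any file-like handle, however many there are. The function write_masked_bins and its column list are hypothetical and do not exist in CNVkit; the single example row is copied from the listing in the original post.

```python
import csv
import io

# Hypothetical column layout, mirroring the columns shown in the original post.
COLUMNS = ("chromosome", "start", "end", "gene", "log2")


def write_masked_bins(bins, handle, columns=COLUMNS):
    """Write every masked bin as one tab-separated row; return the row count."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for b in bins:
        writer.writerow([b[c] for c in columns])
    return len(bins)


# One bin taken from the listing in the original post.
masked = [
    {"chromosome": "chr2", "start": 15731982, "end": 15732102,
     "gene": "DDX1", "log2": 14.9224},
]

# An in-memory buffer stands in for a real file opened with open(path, "w").
buffer = io.StringIO()
n_written = write_masked_bins(masked, buffer)
```

Writing to a handle rather than a fixed path keeps the sketch usable with a regular file, a compressed stream, or standard output.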

WortJohn commented 1 year ago

Thank you very much. Saving the filtered bins is a good idea, because a cluster signal can itself have a very high log2 ratio, and it may be a true amplification or deletion signal. If users could change the default parameter settings, CNVkit would be even better.