etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

cnvkit seg export noise filtered by GISTIC2 #450

Open t-neumann opened 5 years ago

t-neumann commented 5 years ago

Hi,

I don't know whether this is a problem with the cnvkit seg export function or with GISTIC2, but since I couldn't find an issue tracker for GISTIC2, I decided to post it here.

I wanted to run GISTIC2 on cnvkit results, following this thread:

https://www.biostars.org/p/272370/

For this, I combined all produced cnvkit.cns into a seg file like this:

cnvkit.py export seg cnvkit/processed/*cnvkit.cns -o GISTIC/ABC.gistic

This works fine and produces a reasonable looking seg file.

head GISTIC/ABC.gistic
ABC_11-cnvkit   1       10001   53989   165     0.433825
ABC_11-cnvkit   1       53990   91846   142     -0.575479
ABC_11-cnvkit   1       91847   113707  82      0.515724
ABC_11-cnvkit   1       113708  139834  98      0.165566
ABC_11-cnvkit   1       139835  141967  8       -0.666416

However, when I run GISTIC2 on this file, something seems to be off with the data, because it reports that all samples were removed by noise filtering:

/usr/local/bin/gp_gistic2_from_seg -b ABC_GISTIC -seg GISTIC/ABC.gistic -refgene /groups/obenauf/Software/GISTIC/GISTIC_2_0_23/refgenefiles/hg38.UCSC.add_miR.160920.refgene.mat
Opening log file:  /tmp/java.log.9633
GISTIC version 2.0.23

GISTIC 2.0 input error detected:
All samples were removed by noise filtering.

I am not sure whether this is a problem with the cnvkit results, the file format, or GISTIC2 itself. Has anyone encountered this?

etal commented 5 years ago

I haven't encountered this myself. Not sure what's happening, but the "All samples were removed" message could be spurious, e.g. there's some other error reading the input and GISTIC2 handles all input errors by throwing up into its jeans and reporting that it didn't load any samples. Anyone else have ideas?

This part of CNVkit could use some help from someone more experienced with GISTIC, and/or a reimplementation of GISTIC or equivalent so that we don't keep getting tripped up by these incompatibilities.

VicBioDev commented 1 year ago

I encountered this error too. After some digging in the GISTIC2 source, I found the cause: the samples in my input file had too many segments, more than the default maximum of 2500. To avoid the error, set a higher maximum with the command-line argument "-maxseg".

For example: ./gistic2 -b base_dir -seg seg_file -refgene refgene -maxseg 5000
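
To check whether a seg file is likely to hit this limit before running GISTIC2, you could count the segments per sample first. The following is only a rough sketch, not part of CNVkit or GISTIC2: the function names are made up for illustration, the 2500 default is taken from the comment above, and it assumes six whitespace-separated columns (sample, chromosome, start, end, number of markers, segment mean) with no whitespace inside sample names.

```python
import sys
from collections import Counter

# Default per-sample segment cap, as reported in the comment above (assumption).
DEFAULT_MAX_SEGMENTS = 2500


def count_segments(seg_lines):
    """Count segments per sample from an iterable of SEG-format lines."""
    counts = Counter()
    for line in seg_lines:
        fields = line.split()
        # Skip blank lines and anything that is not a 6-column row.
        if len(fields) != 6:
            continue
        # Skip a header row, if present: its start coordinate is not an integer.
        if not fields[2].isdigit():
            continue
        counts[fields[0]] += 1
    return counts


def samples_over_limit(counts, limit=DEFAULT_MAX_SEGMENTS):
    """Return {sample: segment_count} for samples exceeding the limit."""
    return {sample: n for sample, n in counts.items() if n > limit}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python count_segments.py GISTIC/ABC.gistic
    with open(sys.argv[1]) as handle:
        per_sample = count_segments(handle)
    for sample, n in sorted(samples_over_limit(per_sample).items()):
        print(f"{sample}\t{n}")
```

If this prints any samples, the largest count gives a lower bound for the value to pass to the maximum-segments option mentioned above. Whether heavily segmented samples should instead be re-segmented or excluded is a separate question that this sketch does not address.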