DecodeGenetics / svimmer

Structural variant merging tool

Error when Merging VCFs #6

Open malmarri opened 3 years ago

malmarri commented 3 years ago

Hi,

I am trying to merge a list of VCFs of SVs using svimmer, but receive the following error:

Traceback (most recent call last):
  File "../../../svimmer/svimmer", line 140, in <module>
    header = read_header(vcf_f) # Read the header of the first VCF file
  File "../../../svimmer/svimmer", line 32, in read_header
    line = vcf_f.readline().decode("utf-8")
AttributeError: 'str' object has no attribute 'decode'

Thanks

hannespetur commented 3 years ago

Hi, which version of Python are you using? svimmer requires Python 3.4 or newer. You can print the Python version with

$ python --version

Best, Hannes
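The same version requirement can also be checked from inside Python. The following is only a minimal sketch, not part of svimmer: the helper name `python_is_recent_enough` is hypothetical, and the 3.4 minimum is taken from the comment above.

```python
import sys

# Minimum version stated in the thread; the helper name is hypothetical.
MIN_PYTHON = (3, 4)


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_is_recent_enough():
        sys.exit("Python %d.%d or newer is required" % MIN_PYTHON)
    print("Running Python %d.%d.%d" % tuple(sys.version_info[:3]))
```

Comparing only the major and minor components keeps the check independent of patch releases.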

malmarri commented 3 years ago

Hi,

I'm using Python 3.7.

I am trying to merge VCFs called from NUCMER alignments. Does svimmer only accept Manta-derived VCFs?

Thanks.

hannespetur commented 3 years ago

Hi, svimmer should also handle most other SV VCFs, but I don't know what NUCMER alignments are, so I have probably never tested it on those. I have added test data to the repo; could you pull the latest changes and try running svimmer on that data?

Best, Hannes

hannespetur commented 3 years ago

Ah, is it possible that you have an uncompressed VCF? svimmer expects bgzipped and indexed VCFs, like the ones Manta outputs. If so, just compress the file with bgzip and index it with tabix. I will need to issue a better error message for this...

Best, Hannes
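The traceback shows `readline()` returning a `str`, which has no `decode` method in Python 3. That is what one would expect if the input were read as plain text rather than as compressed bytes. The sketch below shows one way a script could detect uncompressed input early and report a clearer error. It is an assumption-laden illustration, not svimmer's actual code: the function names `is_gzip_compressed` and `read_first_line` are hypothetical, and the check relies only on the fact that bgzip files begin with the standard gzip magic bytes.

```python
import gzip

# First two bytes of any gzip file; bgzip output is gzip-compatible.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip_compressed(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def read_first_line(path):
    """Read the first line as text, failing early on uncompressed input."""
    if not is_gzip_compressed(path):
        raise ValueError(
            "%s does not look compressed; compress it with bgzip "
            "and index it with tabix" % path
        )
    with gzip.open(path, "rb") as handle:
        # Binary mode yields bytes, so decode() is valid here.
        return handle.readline().decode("utf-8")
```

Note that this check cannot tell bgzip from ordinary gzip, and it does not verify that a tabix index exists; it only catches the plain-text case discussed above.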

jingydz commented 1 year ago

svimmer seems to work well only for merging Manta output. I ran it on CNVnator output (CNV calls converted to VCF with cnvnator2VCF.pl, then compressed with bgzip and indexed with tabix), and svimmer's output was poor.

Command

$ cat zjj.vcf.list
0006A.delALL.vcf.gz
0032A.delALL.vcf.gz
086596D.delALL.vcf.gz
086598D.delALL.vcf.gz
086599D.delALL.vcf.gz
086600D.delALL.vcf.gz
086601D.delALL.vcf.gz
086604D.delALL.vcf.gz
086606D.delALL.vcf.gz
086620D.delALL.vcf.gz
086627D.delALL.vcf.gz
086639D.delALL.vcf.gz
086643D.delALL.vcf.gz
$ svimmer --threads 2 zjj.vcf.list chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY chrM >test.vcf

Output

$ less test.vcf
...

INFO=

INFO=

#CHROM POS ID REF ALT QUAL FILTER INFO

chr1 1 . N 0 . END=10000;SVTYPE=DEL;SVLEN=-10000;IMPRECISE;natorRD=0;natorP1=1.59373e-11;natorP2=6.0566e-44;natorP3=1.99216e-11;natorP4=2.07678e-33;natorQ0=-1;NUM_MERGED_SVS=11;STDDEV_POS=0.00,0.00

Looks like it's missing the sample information.

hannespetur commented 1 year ago

It's normal that the sample information is missing. svimmer only merges SV sites; genotypes are ignored.

Best, Hannes
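To illustrate what a sites-only record looks like, the sketch below trims a tab-separated VCF line to its eight fixed columns, dropping FORMAT and the per-sample genotype columns. It only illustrates the concept and is not how svimmer is implemented; the function name `to_sites_only` and the sample record are hypothetical.

```python
# The eight fixed VCF columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO.
FIXED_COLUMNS = 8


def to_sites_only(line):
    """Drop FORMAT and per-sample columns from one tab-separated VCF line."""
    if line.startswith("##"):
        return line  # meta-information lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    return "\t".join(fields[:FIXED_COLUMNS]) + "\n"


if __name__ == "__main__":
    # Hypothetical record with one sample column.
    record = "chr1\t1\t.\tN\t<DEL>\t0\t.\tEND=10000;SVTYPE=DEL\tGT\t0/1\n"
    print(to_sites_only(record), end="")
```

The `#CHROM` header line is trimmed the same way, so the output header matches the sites-only records.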