fasterius / VarClust

A Python package for clustering of single nucleotide variants from high-throughput sequencing data.

[StopIteration Error] PyVCF Package #1

Closed saisomesh2594 closed 4 years ago

saisomesh2594 commented 4 years ago

Hey,

I have been trying to run VarClust on the VCF files for my samples. These VCF files were called using GATK and contain headers. However, running varclust_create_profiles gives a StopIteration error, which essentially means the end of the file has been reached. I checked the lines that cause the error, and it seems to originate from the Reader class in the PyVCF Python package. So I was wondering whether there is a specific format of VCF file that you used (headerless or something else)?

Any help would be highly appreciated!

Thanks

fasterius commented 4 years ago

Hi,

The VCF files analysed by VarClust do have headers, and they have also been generated using GATK (HaplotypeCaller), so I am not sure why you are getting your error. Could you send me a subset of a VCF that gives you the error, so that I may do some troubleshooting?

saisomesh2594 commented 4 years ago

Hi!

Thanks for your reply.

You can take a look at the vcf files I am using from here

fasterius commented 4 years ago

You are getting the error because your VCFs are malformed: your samples are specified as 0, rather than as 060_S1_snv and 123_S6_snv. The samples in VCF files are specified in the last column, and you will need to change those to correspond to the filename.
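For anyone hitting the same problem, here is a minimal sketch of one way to do the renaming in plain Python. It is not part of VarClust: the function names `rename_sample_in_header` and `fix_vcf` are hypothetical, and it assumes an uncompressed, single-sample VCF whose expected sample name is the filename without its `.vcf` extension (for example `060_S1_snv.vcf` gives `060_S1_snv`).

```python
from pathlib import Path


def rename_sample_in_header(lines, new_sample):
    """Return the VCF lines with the single sample column renamed.

    Only the '#CHROM' header line is changed; metadata and records are
    passed through untouched. (Hypothetical helper, not VarClust code.)
    """
    fixed = []
    for line in lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # Fields 0-8 are the fixed columns (CHROM ... FORMAT);
            # field 9 is the sample name in a single-sample VCF.
            if len(fields) != 10:
                raise ValueError("expected exactly one sample column")
            fields[9] = new_sample
            line = "\t".join(fields) + "\n"
        fixed.append(line)
    return fixed


def fix_vcf(path):
    """Write a copy of the VCF whose sample name matches the filename."""
    path = Path(path)
    sample = path.stem  # assumption: '060_S1_snv.vcf' -> '060_S1_snv'
    with path.open() as handle:
        lines = handle.readlines()
    out_path = path.with_name(sample + ".renamed.vcf")
    with out_path.open("w") as handle:
        handle.writelines(rename_sample_in_header(lines, sample))
    return out_path


# Small in-memory example with the sample wrongly named "0".
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t0\n",
    "1\t12345\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\n",
]
print(rename_sample_in_header(example, "060_S1_snv")[1])
```

Note that `fix_vcf` writes to a new file with a `.renamed.vcf` suffix rather than overwriting the input, so the output would still have to be moved back to the original filename before the sample name and filename match.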

Additionally, since your VCFs are not annotated with snpEff, you will also need to use the flag --method position_only.

fasterius commented 4 years ago

I have now clarified this naming scheme in the documentation, which will hopefully help future users. Thanks for bringing this to my attention!