dereneaton / ipyrad

Interactive assembly and analysis of RAD-seq data sets
http://ipyrad.readthedocs.io
GNU General Public License v3.0

ipa.vcf_to_hdf5 ValueError: invalid literal for long() with base 10: 'T' #394

Closed TomaszSuchan closed 3 years ago

TomaszSuchan commented 4 years ago

I'm getting this error with ipyrad-0.9.33 when running the vcf_to_hdf5 converter:

In [1]: import ipyrad.analysis as ipa
   ...: import pandas as pd
   ...:

In [2]: converter = ipa.vcf_to_hdf5(name="Cardui", data="DP3g50maf05_filtered.FIL.haplotypes.vcfbreakmulti-vcfsnps-vcfbiallelic.nobadloci.vcf.gz")

In [3]: converter.run()
Indexing VCF to HDF5 database file
VCF: 13264 SNPs; 5014 scaffolds
[                    ]   0% 0:00:00 | converting VCF to HDF5
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-e1d471e9c2ab> in <module>()
----> 1 converter.run()

/Users/tomasz/miniconda3/envs/ipyrad/lib/python2.7/site-packages/ipyrad/analysis/vcf_to_hdf5.pyc in run(self)
     73
     74         # fill snps matrix
---> 75         self.build_chunked_matrix()
     76
     77         # report on new database

/Users/tomasz/miniconda3/envs/ipyrad/lib/python2.7/site-packages/ipyrad/analysis/vcf_to_hdf5.pyc in build_chunked_matrix(self)
    194
    195             # get sub arrays
--> 196             genos, snps = chunk_to_arrs(chunkdf, self.nsamples)
    197
    198             # get sub snpsmap

/Users/tomasz/miniconda3/envs/ipyrad/lib/python2.7/site-packages/ipyrad/analysis/vcf_to_hdf5.pyc in chunk_to_arrs(chunkdf, nsamples)
    519     alts3 = np.zeros(alts.size, dtype=np.int8)
    520     lens = np.array([len(i) for i in sas])
--> 521     alts1[lens == 1] = [i[0] for i in sas[lens == 1]]
    522     alts2[lens == 2] = [i[1] for i in sas[lens == 2]]
    523     alts3[lens == 3] = [i[2] for i in sas[lens == 3]]

ValueError: invalid literal for long() with base 10: 'T'

eaton-lab commented 4 years ago

Hi @TomaszSuchan ,

Did you run the recommended pre-filtering steps in the docs? https://ipyrad.readthedocs.io/en/latest/API-analysis/cookbook-vcf2hdf5.html

TomaszSuchan commented 4 years ago

Hi, I tried filtering my file with bcftools as described, but I still get the error. The file is from freebayes (dDocent pipeline), with complex variants broken up and only biallelic SNPs retained.

After the filtering steps it looks clean:

[...]
##contig=<ID=dDocent_Contig_9235>
##contig=<ID=dDocent_Contig_9242>
##contig=<ID=dDocent_Contig_9245>
##contig=<ID=dDocent_Contig_9246>
dDocent_Contig_5        191     .       G       T       5238.62 .       .       GT      0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/1     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     1/1     0/0     0/0     1/1     0/0     0/0     0/0
     0/1     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/1     0/0     0/0     0/0     0/0     0/1     0/0     0/0     0/0     0/0     0/0     0/0
     0/1     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/1     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0
     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/1     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0
     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/1     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0
     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0
dDocent_Contig_6        181     .       T       C       6171.06 .       .       GT      0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/1     0/1     0/0     0/0     0/1     0/1     0/0     0/0     0/1
     0/0     0/0     0/0     0/1     0/0     0/1     0/0     0/0     0/1     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/1     0/0     0/0     0/0
     0/1     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/1     0/0     0/0     0/0     0/0     0/0     0/0
     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/1     0/0     0/0     0/0     0/0     0/0     0/0     0/0
     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/1
     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/1
dDocent_Contig_12       27      .       T       C       8543.26 .       .       GT      0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/1     0/0     0/0     0/0     0/0     0/0     0/1     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0
     0/0     0/1     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0
     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/1     0/0     0/0     0/0     0/0     0/0     0/1     0/0     0/0     0/0     0/0
     0/0     0/0     0/1     0/0     0/0     0/1     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/1     ./.     0/0
     0/0     0/0     0/0     0/0     0/1     0/0     0/0     0/0     0/0     0/1     0/0     0/0     0/0     0/0     0/0     0/1     0/0     0/0     0/0     0/0
     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/1     0/0     0/0
[...]
dDocent_Contig_1682     196     .       A       G       6173.82 .       .       GT      0/1     0/0     0/0     0/0     0/1     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/1     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/1
     ./.     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/1     0/0     0/0     0/1     0/0     0/0
     0/0     ./.     0/0     0/0     ./.     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     1/1     0/0     0/0
     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     ./.
     0/0     0/1     0/0     0/1     0/0     0/1     0/0     0/0     0/0     0/1     0/0     0/0     0/1     0/0     0/0     0/0     0/1     0/1     0/0     0/0
     ./.     0/0     0/0     0/0     0/0     0/0     1/1     1/1     0/0     0/0
[...]
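
For anyone who wants to confirm that a filtered file really contains only single-base, biallelic SNPs rather than eyeballing it, here is a minimal sketch in plain Python. The function name `find_non_biallelic_snps` is hypothetical and not part of ipyrad; it assumes a standard tab-delimited VCF, optionally gzip/bgzip compressed, and only inspects the REF and ALT columns.

```python
import gzip


def find_non_biallelic_snps(path):
    """Return (chrom, pos, ref, alt) for records whose REF or ALT
    is not a single A/C/G/T base.

    Hypothetical helper for illustration; not an ipyrad function.
    """
    bases = {"A", "C", "G", "T"}
    # bgzip output is valid multi-member gzip, so gzip.open can read it
    opener = gzip.open if path.endswith(".gz") else open
    offenders = []
    with opener(path, "rt") as handle:
        for line in handle:
            # skip meta-information lines, the #CHROM header, and blanks
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                raise ValueError("malformed VCF record: %r" % line)
            chrom, pos, ref, alt = fields[0], fields[1], fields[3], fields[4]
            # multi-allelic sites ("G,C"), indels ("AT"), and symbolic
            # alleles all fail this single-base test
            if ref.upper() not in bases or alt.upper() not in bases:
                offenders.append((chrom, int(pos), ref, alt))
    return offenders
```

Calling `find_non_biallelic_snps("DP3g50maf05_filtered.FIL.haplotypes.vcfbreakmulti-vcfsnps-vcfbiallelic.nobadloci.vcf.gz")` and getting back an empty list would suggest that the REF/ALT columns are not what trips the converter.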

eaton-lab commented 4 years ago

Thanks, I'll look into it, I'm guessing maybe a Py2/3 bug...

TomaszSuchan commented 4 years ago

Yes, good guess. That was in a Conda environment with Python 2.7 that was left over after updating ipyrad. A fresh Conda environment, now with Python 3.6.7, solved the problem.

isaacovercast commented 3 years ago

+1. Python 2.7 is officially EOL, so we won't worry about fixing this for the 2.7 version.
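
The thread never states the root cause, so the following is only a guess at the mechanism, assuming the alleles in `sas` are held as byte strings. Indexing a byte string gives an integer on Python 3 but a one-character string on Python 2, and that string cannot be parsed as a number when it is assigned into an integer array. The helper `first_allele_code` is hypothetical and only illustrates the difference; it is not ipyrad code.

```python
def first_allele_code(allele):
    """Return an integer code for the first base of an allele.

    Works whether indexing yields an int (Python 3 bytes) or a
    one-character string (Python 3 str, or Python 2 str).
    Hypothetical helper for illustration; not an ipyrad function.
    """
    first = allele[0]
    return first if isinstance(first, int) else ord(first)


# On Python 3, indexing bytes already yields an integer code
print(b"T"[0])

# A one-character string cannot be parsed as a base-10 integer,
# which matches the shape of the error reported above
try:
    int("T")
except ValueError as err:
    print(err)

# The helper gives the same code for both representations
print(first_allele_code(b"T"), first_allele_code("T"))
```

If this guess is right, running the converter under Python 3, as was done here, sidesteps the problem without any change to the VCF file.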