bmvdgeijn / WASP

WASP: allele-specific pipeline for unbiased read mapping and molecular QTL discovery
Apache License 2.0

snp2h5 gives ERROR: snptab.c:126: failed to write record to SNP table #65

Closed mce1 closed 7 years ago

mce1 commented 7 years ago

Hi there,

using WASP/snp2h5 I have just encountered the following error, which I can't make sense of:


[mce@dzne-go-cn03 test]$ /home/mce/WASP/snp2h5/snp2h5 --chrom /home/mce/genomes/mouse/mm10_chromInfo.txt --format vcf --haplotype h5_files/haplotypes.h5 --snp_index h5_files/snp_index.h5 --snp_tab h5_files/snp_tab.h5 chromosome_vcf_files/C57BL_6JxCAST_EiJ_chr*.vcf.gz
long alleles will be truncated to 100bp
writing haplotypes to: h5_files/haplotypes.h5
writing SNP index to: h5_files/snp_index.h5
writing SNP table to: h5_files/snp_tab.h5
chromosome: chr10, length: 130694993bp
reading from file chromosome_vcf_files/C57BL_6JxCAST_EiJ_chr10.vcf.gz
counting lines in file
  total lines: 1269364
reading VCF header
  VCF header lines: 63
  number of samples: 1
initializing HDF5 matrix with dimension: (1269301, 2)
parsing file and writing to HDF5 files
..........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................HDF5-DIAG: Error detected in HDF5 (1.8.13) thread 140597336254272:
  #000: H5Tcompound.c line 254 in H5Tget_member_type(): unable register datatype atom
    major: Datatype
    minor: Unable to register new atom
  #001: H5I.c line 895 in H5I_register(): can't insert ID node into skip list
    major: Object atom
    minor: Unable to insert object
  #002: H5SL.c line 995 in H5SL_insert(): can't create new skip list node
    major: Skip Lists
    minor: Unable to insert object
  #003: H5SL.c line 687 in H5SL_insert_common(): can't insert duplicate key
    major: Skip Lists
    minor: Unable to insert object
ERROR: snptab.c:126: failed to write record to SNP table

Here is the beginning of the vcf file:

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=StrandBias,Description="Min P-value for strand bias (INFO/PV4) [0.0001]">
##FILTER=<ID=EndDistBias,Description="Min P-value for end distance bias (INFO/PV4) [0.0001]">
##FILTER=<ID=MaxDP,Description="Maximum read depth (INFO/DP or INFO/DP4) [250]">
##FILTER=<ID=BaseQualBias,Description="Min P-value for baseQ bias (INFO/PV4) [0]">
##FILTER=<ID=MinMQ,Description="Minimum RMS mapping quality for SNPs (INFO/MQ) [20]">
##FILTER=<ID=MinAB,Description="Minimum number of alternate bases (INFO/DP4) [5]">
##FILTER=<ID=Qual,Description="Minimum value of the QUAL field [10]">
##FILTER=<ID=VDB,Description="Minimum Variant Distance Bias (INFO/VDB) [0]">
##FILTER=<ID=GapWin,Description="Window size for filtering adjacent gaps [3]">
##FILTER=<ID=MapQualBias,Description="Min P-value for mapQ bias (INFO/PV4) [0]">
##FILTER=<ID=SnpGap,Description="SNP within INT bp around a gap to be filtered [2]">
##FILTER=<ID=RefN,Description="Reference base is N []">
##FILTER=<ID=MinDP,Description="Minimum read depth (INFO/DP or INFO/DP4) [5]">
##FILTER=<ID=Het,Description="Genotype call is heterozygous (low quality) []">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Phred-scaled Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases">
##FORMAT=<ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)">
##FORMAT=<ID=GP,Number=G,Type=Float,Description="Phred-scaled genotype posterior probabilities">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##FORMAT=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##FORMAT=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality">
##FORMAT=<ID=DV,Number=1,Type=Integer,Description="Number of high-quality non-reference bases">
##FORMAT=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-fwd, ref-reverse, alt-fwd and alt-reverse bases">
##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
##FORMAT=<ID=SGB,Number=1,Type=Float,Description="Segregation based metric.">
##FORMAT=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
##FORMAT=<ID=FI,Number=1,Type=Integer,Description="Whether a sample was a Pass(1) or fail (0) based on FILTER values">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Total Number of high-quality ref-fwd, ref-reverse, alt-fwd and alt-reverse bases">
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type from Ensembl 78 as predicted by VEP. Format: Allele|Gene|Feature|Feature_type|Consequence|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND">
##contig=<ID=1,length=195471971>
##contig=<ID=10,length=130694993>
##contig=<ID=11,length=122082543>
##contig=<ID=12,length=120129022>
##contig=<ID=13,length=120421639>
##contig=<ID=14,length=124902244>
##contig=<ID=15,length=104043685>
##contig=<ID=16,length=98207768>
##contig=<ID=17,length=94987271>
##contig=<ID=18,length=90702639>
##contig=<ID=19,length=61431566>
##contig=<ID=2,length=182113224>
##contig=<ID=3,length=160039680>
##contig=<ID=4,length=156508116>
##contig=<ID=5,length=151834684>
##contig=<ID=6,length=149736546>
##contig=<ID=7,length=145441459>
##contig=<ID=8,length=129401213>
##contig=<ID=9,length=124595110>
##contig=<ID=MT,length=16299>
##contig=<ID=X,length=171031299>
##contig=<ID=Y,length=91744698>
##ALT=<ID=X,Description="Represents allele(s) other than observed.">
##samtoolsVersion=1.1+htslib-1.1
##bcftools_callVersion=1.1+htslib-1.1
##reference=ftp://ftp-mouse.sanger.ac.uk/ref/GRCm38_68.fa
##source_20141009.1=vcf-annotate(r953) -f +/D=400/d=5/q=20/w=2/a=5/ (BTBR_T+_Itpr3tf_J,ST_bJ)
##QUAL=<ID=QUAL,Number=1,Type=Float,Description="The highest QUAL value for a variant location from any of the samples.">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  C57BL_6JxCAST_EiJ
10      3101242 rs387754637     T       TAA     94      PASS    .       GT      0|1
10      3101362 rs33880920      G       C       228     PASS    .       GT      0|1
10      3102112 rs29321579      T       C       228     PASS    .       GT      0|1
10      3102661 rs29370426      C       G       228     PASS    .       GT      0|1
10      3103479 rs108493055     A       G       228     PASS    .       GT      0|1
10      3103831 rs33849371      T       C       228     PASS    .       GT      0|1
10      3103836 rs584544245     T       C       227     PASS    .       GT      0|1
10      3103895 rs29364631      T       C       228     PASS    .       GT      0|1
10      3103925 rs578285186     T       C       228     PASS    .       GT      0|1
10      3104351 rs387537244     A       AC      39.4943 PASS    .       GT      0|1
10      3104745 rs241787200     A       G       228     PASS    .       GT      0|1
10      3104903 rs29350251      A       G       228     PASS    .       GT      0|1
10      3104909 rs580341507     G       A       228     PASS    .       GT      0|1
10      3105426 rs108035459     T       C       228     PASS    .       GT      0|1
10      3105799 rs234511968     A       C       201     PASS    .       GT      0|1
10      3105800 rs217261475     A       T       190     PASS    .       GT      0|1
10      3105801 rs245173074     C       T       217     PASS    .       GT      0|1
10      3105856 rs579919330     A       G       228     PASS    .       GT      0|1
10      3105858 rs583575549     A       G       228     PASS    .       GT      0|1
10      3106780 rs581394925     A       G       228     PASS    .       GT      0|1
10      3106917 rs256461417     C       T       195     PASS    .       GT      0|1
10      3107123 rs223250688     A       G       228     PASS    .       GT      0|1
10      3107640 rs29345216      T       A       228     PASS    .       GT      0|1
10      3107674 rs240873984     A       G       228     PASS    .       GT      0|1
10      3107749 rs256303459     C       T       228     PASS    .       GT      0|1
10      3107904 rs246030398     A       G       228     PASS    .       GT      0|1
10      3107917 rs233488944     A       G       228     PASS    .       GT      0|1
10      3107937 rs262503241     T       C       228     PASS    .       GT      0|1
10      3108077 rs218305584     A       C       228     PASS    .       GT      0|1
10      3108112 rs29378080      T       C       228     PASS    .       GT      0|1
10      3108231 rs29346229      C       T       228     PASS    .       GT      0|1
10      3108450 rs247930443     T       C       228     PASS    .       GT      0|1
10      3108510 rs387013722     A       AACAGACACACAC   115     PASS    .       GT      0|1
10      3108855 rs107696935     G       C       228     PASS    .       GT      0|1
10      3108985 rs223682752     A       T       228     PASS    .       GT      0|1
10      3109160 rs578781673     T       C       195     PASS    .       GT      0|1
10      3109188 rs587296554     G       C       199     PASS    .       GT      0|1
10      3109189 rs583582949     A       C       201     PASS    .       GT      0|1
10      3116973 rs587319011     C       A       76      PASS    .       GT      0|1
10      3117386 rs49605811      T       C       228     PASS    .       GT      0|1
10      3117445 rs266123009     C       CCA     228     PASS    .       GT      0|1
10      3117578 rs48009659      C       T       228     PASS    .       GT      0|1
10      3118041 rs262864509     T       A       228     PASS    .       GT      0|1
10      3118135 rs387629506     G       GT      228     PASS    .       GT      0|1
10      3118156 rs29328552      G       A       228     PASS    .       GT      0|1
10      3118664 rs235884734     C       A       228     PASS    .       GT      0|1
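As a sanity check on the input, the counts in the log (1269364 total lines, 63 header lines, a 1269301-row matrix) can be reproduced independently of snp2h5. The following is a minimal sketch, not snp2h5's own logic: the helper names `open_vcf` and `count_vcf_lines` are hypothetical, and it assumes header lines are exactly those starting with `#`.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip-compressed VCF file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_vcf_lines(lines):
    """Return (header_lines, record_lines) for an iterable of VCF lines.

    Assumption: every line starting with '#' is a header line; blank
    lines are ignored; everything else is a variant record.
    """
    header = 0
    records = 0
    for line in lines:
        if line.startswith("#"):
            header += 1
        elif line.strip():
            records += 1
    return header, records


if __name__ == "__main__":
    import sys

    # Usage: python count_vcf.py some_file.vcf.gz
    with open_vcf(sys.argv[1]) as handle:
        n_header, n_records = count_vcf_lines(handle)
    print("header lines:", n_header)
    print("record lines:", n_records)
```

If the record count matches the first dimension reported in "initializing HDF5 matrix with dimension", the VCF was at least read in full, which points away from the input file as the cause.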

By the time the error is raised, 4.5 MB have been written to haplotypes.h5.

Does anyone know what that error means and what to do about it? Any help will be much appreciated.

Thanks,

mce

mce1 commented 7 years ago

This was resolved by upgrading PyTables to 3.2.0 and recompiling snp2h5.
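For anyone hitting the same error, here is a minimal Python sketch for checking whether the installed PyTables is at least the 3.2.0 mentioned above. The helpers `parse_version`, `version_at_least` and `report` are hypothetical names for illustration. The sketch only reports versions; it does not confirm that the snp2h5 binary was rebuilt against the same HDF5 library.

```python
import re


def parse_version(text):
    """Turn a version string such as '3.2.0' or '3.2.0rc1' into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def version_at_least(installed, required):
    """Return True if `installed` is the same as or newer than `required`."""
    a = parse_version(installed)
    b = parse_version(required)
    width = max(len(a), len(b))
    # Pad with zeros so that '3.2' compares equal to '3.2.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def report(required="3.2.0"):
    """Describe the installed PyTables version, if PyTables is importable."""
    try:
        import tables
    except ImportError:
        return "PyTables is not installed"
    hdf5 = getattr(tables, "hdf5_version", "unknown")
    status = "ok" if version_at_least(tables.__version__, required) else "older than " + required
    return "PyTables {} (HDF5 {}): {}".format(tables.__version__, hdf5, status)


if __name__ == "__main__":
    print(report())
```

After upgrading, snp2h5 still needs to be recompiled, as noted above.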