alachins / raisd

RAiSD: software to detect positive selection based on multiple signatures of a selective sweep and SNP vectors

ERROR: unexpected error during parser initialization #28

Open Wennie-s opened 2 years ago

Wennie-s commented 2 years ago

Hey, I just ran RAiSD and got this error: unexpected error during parser initialization. The command I ran is /data/user003/soft/RAiSD/raisd-master/bin/release/RAiSD -n Cugi_run -I /wtmp/user003/Reseq/SNP_filter/Chr_vcftools_filter/correct_site/Chr_revised/Chr1.final.vcf.gz -S Cugi_subgroup.txt -y 2 -w 100000 -f. The VCF header is:

fileformat=VCFv4.2

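ALT=<ID=NON_REF,Description="Represents any possible alternative allele not already represented at this location by REF and ALT">

FILTER=<ID=Filter,Description="DP < 5 || QD < 2.0 || MQ < 40.0 || FS > 60.0 || SOR > 3.0 || MQRankSum < - 12.5 || ReadPosRankSum < -8.0 || QUAL < 30">

FILTER=

FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">

FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">

FORMAT=

FORMAT=

FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">

FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another; will always be heterozygous and is not intended to describe called alleles">

FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">

FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">

FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phasing set (typically the position of the first variant in the set)">

FORMAT=<ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">

FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">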

GATKCommandLine=<ID=GenotypeGVCFs,CommandLine="GenotypeGVCFs --output o.vcf.gz --variant KS-7.g.vcf.gz --intervals Chr1:1-10000 --reference SplitBig.Cugi.all.chr.fasta --include-non-variant-sites false --merge-input-intervals false --input-is-somatic false --tumor-lod-to-emit 3.5 --allele-fraction-error 0.001 --keep-combined-raw-annotations false --use-posteriors-to-calculate-qual false --dont-use-dragstr-priors false --use-new-qual-calculator true --annotate-with-num-discovered-alleles false --heterozygosity 0.001 --indel-heterozygosity 1.25E-4 --heterozygosity-stdev 0.01 --standard-min-confidence-threshold-for-calling 30.0 --max-alternate-alleles 6 --max-genotype-count 1024 --sample-ploidy 2 --num-reference-samples-if-no-call 0 --genotype-assignment-method USE_PLS_TO_ASSIGN --genomicsdb-use-bcf-codec false --genomicsdb-shared-posixfs-optimizations false --only-output-calls-starting-in-intervals false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false --disable-tool-default-annotations false --enable-all-annotations false --allow-old-rms-mapping-quality-annotation-data false",Version="4.2.0.0",Date="September 21, 2021 10:17:53 AM CST">

GATKCommandLine=<ID=SelectVariants,CommandLine="SelectVariants --output /wtmp/user003/Reseq/Chr_gatk_filter_extract/Chr2_2.extract.snp.vcf.gz --exclude-filtered true --variant Chr2_2.gatkfilter.vcf.gz --invertSelect false --exclude-non-variants false --preserve-alleles false --remove-unused-alternates false --restrict-alleles-to ALL --keep-original-ac false --keep-original-dp false --mendelian-violation false --invert-mendelian-violation false --mendelian-violation-qual-threshold 0.0 --select-random-fraction 0.0 --remove-fraction-genotypes 0.0 --fully-decode false --max-indel-size 2147483647 --min-indel-size 0 --max-filtered-genotypes 2147483647 --min-filtered-genotypes 0 --max-fraction-filtered-genotypes 1.0 --min-fraction-filtered-genotypes 0.0 --max-nocall-number 2147483647 --max-nocall-fraction 1.0 --set-filtered-gt-to-nocall false --allow-nonoverlapping-command-line-samples false --suppress-reference-path false --genomicsdb-use-bcf-codec false --genomicsdb-shared-posixfs-optimizations false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false",Version="4.2.0.0",Date="October 8, 2021 9:55:48 AM CST">

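INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">

INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">

INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">

INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">

INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">

INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">

INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygosity">

INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">

INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">

INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">

INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">

INFO=

INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">

INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">

INFO=<ID=RAW_MQandDP,Number=2,Type=Integer,Description="Raw data (sum of squared MQ and total depth) for improved RMS Mapping Quality calculation. Incompatible with deprecated RAW_MQ formulation.">

INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">

INFO=<ID=SOR,Number=1,Type=Float,Description="Symmetric Odds Ratio of 2x2 contingency table to detect strand bias">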

contig=

contig=

contig=

contig=

contig= ......

I have checked my VCF header but can't find any problem. Can you help me?

alachins commented 2 years ago

I see you are providing a .gz file, so first make sure that you compiled the code with the install-RAiSD-ZLIB.sh script, because the standard RAiSD build does not support .gz files. If you already did that, the first VCF lines you provided look fine. Can you post a few more lines of the VCF file here (the header line and the first data line) so we can check that the header line is also present and correct?
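One way to pull out just those two lines is sketched below. It is a minimal sketch, assuming Python 3 with only the standard library; the function name vcf_header_and_first_record is hypothetical and not part of RAiSD. It skips the meta-information lines that start with ## and returns the #CHROM column header line plus the first data line. It only reads the file and cannot tell whether RAiSD itself was built with zlib support.

```python
import gzip
import sys


def vcf_header_and_first_record(path):
    """Return (column_header_line, first_data_line) from a plain or gzipped VCF.

    Either element is None if the corresponding line is not found.
    Hypothetical helper for illustration; not part of RAiSD.
    """
    # Choose the opener by file extension: .gz files need gzip, others plain open.
    opener = gzip.open if path.endswith(".gz") else open
    header = None
    first = None
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.strip():
                continue  # ignore blank lines
            if line.startswith("##"):
                continue  # meta-information lines (fileformat, INFO, FORMAT, ...)
            if line.startswith("#CHROM"):
                header = line.rstrip("\n")  # column header line
            else:
                first = line.rstrip("\n")  # first data (variant) line
                break
    return header, first


if __name__ == "__main__" and len(sys.argv) > 1:
    header, first = vcf_header_and_first_record(sys.argv[1])
    print("header line:    ", header if header is not None else "MISSING")
    print("first data line:", first if first is not None else "MISSING")
```

Saved as, say, check_vcf.py (a placeholder name), it can be run with the path to the VCF file as its only argument. If the header line comes back as MISSING, the file lacks the #CHROM line that names the columns and samples.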



-- Nikolaos Alachiotis