Illumina / SpliceAI

A deep learning-based tool to identify splice variants

SpliceAI not giving output values when run through VEP (Variant Effect Predictor) #133

Closed sruthisrini closed 1 year ago

sruthisrini commented 1 year ago

When generating predictions with SpliceAI through VEP, I am not getting any values. I used this command to generate the output:

/vep -i input.vcf -o output.vcf --cache --force_overwrite --tab --no_check_variants_order --force --plugin SpliceAI,snv=spliceai_scores.raw.snv.hg38.vcf.gz,indel=spliceai_scores.raw.indel.hg38.vcf.gz

Below are example entries from the output file.

Uploaded_variation Location Allele Gene Feature Feature_type Consequence cDNA_position CDS_position Protein_position Amino_acids Codons Existing_variation IMPACT DISTANCE STRAND FLAGS SpliceAI_pred

100017682 chr1:100017682 A ENSG00000117620 ENST00000370153 Transcript missense_variant,splice_region_variant 981 880 294 A/T Gca/Aca - MODERATE - -
100017682 chr1:100017682 A ENSG00000117620 ENST00000370155 Transcript splice_region_variant,3_prime_UTR_variant,NMD_transcript_variant 882 - - - - LOW - 1 - -

Is there anything wrong with the input file? Could anyone help here?

kishorejaganathan commented 1 year ago

By default, SpliceAI only annotates variants within the protein-coding genes defined here: https://github.com/Illumina/SpliceAI/tree/master/spliceai/annotations. If you want to annotate variants outside those regions, you can provide a custom annotation file (in the same format as these files). However, it looks like you're using the precomputed scores as a lookup table, so you would have to run SpliceAI from scratch with the updated annotation (I don't know whether VEP provides that option).
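To see whether a given variant falls inside one of the annotated genes, a rough check along the following lines may help. This is only a sketch: it assumes the annotation file is tab-separated with the columns #NAME, CHROM, STRAND, TX_START, TX_END, EXON_START and EXON_END, and the single row in ANNOTATION_TEXT is made up for illustration (GENE_A is a hypothetical gene). It also ignores the 0-based versus 1-based coordinate convention, so positions right at a gene boundary may be classified differently by SpliceAI itself.

```python
import csv
import io

# Hypothetical one-row annotation table, used only to make the sketch runnable.
# The column layout is an assumption about the SpliceAI annotation files.
ANNOTATION_TEXT = (
    "#NAME\tCHROM\tSTRAND\tTX_START\tTX_END\tEXON_START\tEXON_END\n"
    "GENE_A\t1\t+\t1000\t5000\t1000,3000,\t1500,5000,\n"
)


def strip_chr(chrom):
    """Drop a leading 'chr' so that '1' and 'chr1' compare equal."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def load_gene_spans(text):
    """Parse annotation text into (name, chrom, tx_start, tx_end) tuples."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        (row["#NAME"], strip_chr(row["CHROM"]), int(row["TX_START"]), int(row["TX_END"]))
        for row in reader
    ]


def genes_covering(spans, chrom, pos):
    """Return the names of genes whose transcript span contains the position."""
    chrom = strip_chr(chrom)
    return [name for name, c, start, end in spans if c == chrom and start <= pos <= end]


if __name__ == "__main__":
    spans = load_gene_spans(ANNOTATION_TEXT)
    # An empty list means no annotated gene covers the position.
    print(genes_covering(spans, "chr1", 1200))
    print(genes_covering(spans, "chr1", 9000))
```

To use it on real data, read the chosen annotation file into a string and pass it to load_gene_spans in place of ANNOTATION_TEXT.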

sruthisrini commented 1 year ago

Thank you for your reply. I have now tried using the original (non-annotated) VCF file, which looks like this:

##fileformat=VCFv4.2
##fileDate=20220709
##reference=GRCh38
#CHROM EXON_START EXON_END  REF ALT QUAL    FILTER  INFO
1   100017682   100017682   G   A   .   .   .   
1   100059878   100059878   G   T   .   .   .   
1   100133150   100133150   G   A   .   .   .   
1   100133215   100133215   A   T   .   .   .   
1   100140182   100140182   G   C   .   .   .   

This time I didn't use VEP; instead, I ran the spliceai command directly:

spliceai -I input.vcf -O output.vcf -R hg38.fa -A grch38

It gives this error:

[E::vcf_format] Invalid BCF, CONTIG id=0 not present in the header
Traceback (most recent call last):
  File "/Users/sruthisrinivasan/miniconda3/bin/spliceai", line 33, in <module>
    sys.exit(load_entry_point('spliceai==1.3.1', 'console_scripts', 'spliceai')())
  File "/Users/sruthisrinivasan/miniconda3/lib/python3.10/site-packages/spliceai-1.3.1-py3.10.egg/spliceai/__main__.py", line 75, in main
  File "pysam/libcbcf.pyx", line 4482, in pysam.libcbcf.VariantFile.write
  File "pysam/libcbcf.pyx", line 4519, in pysam.libcbcf.VariantFile.write
OSError: [Errno 22] b'Invalid argument'

Am I missing anything here?

kishorejaganathan commented 1 year ago

You need to add the contigs to the header. See the example file: https://github.com/Illumina/SpliceAI/blob/master/examples/input.vcf. That file contains the contigs for hg19, but you can replace them with the corresponding ones for hg38.
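As a minimal sketch of that fix, the helper below inserts ##contig lines just before the #CHROM line of a VCF held as a list of text lines. The function names add_contig_headers and contigs_from_fai are hypothetical, not part of SpliceAI or pysam. The demo uses the standard VCF column names (#CHROM POS ID REF ALT QUAL FILTER INFO) and the GRCh38 length of chromosome 1. Contig names should match the values in the CHROM column, and lengths can be taken from a samtools-style .fai index of the reference FASTA (first column name, second column length).

```python
def contigs_from_fai(fai_lines):
    """Read (name, length) pairs from the lines of a FASTA .fai index."""
    contigs = []
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            contigs.append((fields[0], int(fields[1])))
    return contigs


def add_contig_headers(vcf_lines, contigs):
    """Return a copy of vcf_lines with missing ##contig entries added.

    Entries are inserted directly before the #CHROM column line.
    Contigs already declared in the header are left untouched.
    """
    declared = set()
    for line in vcf_lines:
        if line.startswith("##contig=<ID="):
            body = line[len("##contig=<ID="):]
            declared.add(body.split(",")[0].rstrip(">").strip())

    out = []
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            for name, length in contigs:
                if name not in declared:
                    out.append(f"##contig=<ID={name},length={length}>")
        out.append(line)
    return out


if __name__ == "__main__":
    vcf = [
        "##fileformat=VCFv4.2",
        "##reference=GRCh38",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "1\t100017682\t.\tG\tA\t.\t.\t.",
    ]
    # Contig name "1" matches the CHROM column of this demo file.
    for line in add_contig_headers(vcf, [("1", 248956422)]):
        print(line)
```

In practice the lines would come from reading input.vcf, and the result would be written to a new file before running spliceai on it.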