Closed sruthisrini closed 1 year ago
By default, SpliceAI only annotates variants that fall within the protein-coding genes defined in the annotation files here: https://github.com/Illumina/SpliceAI/tree/master/spliceai/annotations. If you want to annotate variants outside those regions, you can provide a custom annotation file in the same format as these files. However, it looks like you're using the precomputed scores as a lookup table, so you would have to run SpliceAI from scratch with the updated annotation (I don't know whether VEP provides that option).
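To check whether a given variant is covered by an annotation file before running anything, a rough sketch like the one below may help. It assumes the annotation file is tab-separated with the columns NAME, CHROM, STRAND, TX_START, TX_END first, and that TX_START is 0-based while TX_END is 1-based inclusive; both are assumptions to verify against the actual files. The function name `genes_covering` and the toy rows are hypothetical.

```python
import csv
import io


def strip_chr(name):
    """Drop a leading 'chr' so that '1' and 'chr1' compare equal."""
    return name[3:] if name.startswith("chr") else name


def genes_covering(annotation_text, chrom, pos):
    """Return names of annotated transcripts whose span contains a 1-based position.

    Assumed layout (not confirmed here): tab-separated columns
    NAME, CHROM, STRAND, TX_START, TX_END, ... with '#'-prefixed header lines.
    """
    hits = []
    reader = csv.reader(io.StringIO(annotation_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the header
        name, row_chrom, _strand, tx_start, tx_end = row[:5]
        if strip_chr(row_chrom) != strip_chr(chrom):
            continue
        # Assumption: TX_START is 0-based, TX_END is 1-based inclusive.
        if int(tx_start) < pos <= int(tx_end):
            hits.append(name)
    return hits


# Toy rows for illustration only; these are not real gene coordinates.
toy_annotation = (
    "#NAME\tCHROM\tSTRAND\tTX_START\tTX_END\tEXON_START\tEXON_END\n"
    "GENE_A\t1\t+\t100\t200\t100,\t200,\n"
    "GENE_B\t2\t-\t100\t200\t100,\t200,\n"
)

print(genes_covering(toy_annotation, "chr1", 150))
```

An empty list for a variant would suggest it lies outside the annotated transcripts, in which case no score is expected from the default annotation.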
Thank you for your reply. I have now tried the original (non-annotated) VCF file, which looks like this:
##fileformat=VCFv4.2
##fileDate=20220709
##reference=GRCh38
#CHROM EXON_START EXON_END REF ALT QUAL FILTER INFO
1 100017682 100017682 G A . . .
1 100059878 100059878 G T . . .
1 100133150 100133150 G A . . .
1 100133215 100133215 A T . . .
1 100140182 100140182 G C . . .
This time I didn't use VEP; instead, I ran the spliceai command directly:
spliceai -I input.vcf -O output.vcf -R hg38.fa -A grch38
It gives this error:
[E::vcf_format] Invalid BCF, CONTIG id=0 not present in the header
Traceback (most recent call last):
File "/Users/sruthisrinivasan/miniconda3/bin/spliceai", line 33, in <module>
sys.exit(load_entry_point('spliceai==1.3.1', 'console_scripts', 'spliceai')())
File "/Users/sruthisrinivasan/miniconda3/lib/python3.10/site-packages/spliceai-1.3.1-py3.10.egg/spliceai/__main__.py", line 75, in main
File "pysam/libcbcf.pyx", line 4482, in pysam.libcbcf.VariantFile.write
File "pysam/libcbcf.pyx", line 4519, in pysam.libcbcf.VariantFile.write
OSError: [Errno 22] b'Invalid argument'
Am I missing anything here?
You need to add the contigs to the header. See the example file https://github.com/Illumina/SpliceAI/blob/master/examples/input.vcf That file contains the contigs for hg19, but you can replace it with the corresponding one for hg38.
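As a minimal sketch of that fix, the snippet below builds `##contig` lines from a FASTA index (the `.fai` file that `samtools faidx` writes next to the reference) and inserts them before the `#CHROM` line. The function names `parse_fai` and `add_contig_lines` are hypothetical, and the `.fai` row in the example is illustrative apart from the chr1 length. The contig IDs should match the values in the CHROM column, which is why there is an option to strip a `chr` prefix. Note also that the VCF specification names the second and third columns POS and ID, so the EXON_START/EXON_END header shown earlier may need changing as well.

```python
def parse_fai(fai_text, strip_chr=False):
    """Read (contig name, length) pairs from the text of a .fai index."""
    contigs = []
    for line in fai_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        name = fields[0]
        if strip_chr and name.startswith("chr"):
            name = name[3:]  # 'chr1' -> '1', to match VCFs that use bare names
        contigs.append((name, int(fields[1])))
    return contigs


def add_contig_lines(vcf_text, contigs):
    """Insert ##contig header lines before #CHROM, skipping IDs already declared."""
    lines = vcf_text.splitlines()
    declared = {
        line.split("ID=")[1].split(",")[0].rstrip(">")
        for line in lines
        if line.startswith("##contig=<ID=")
    }
    out = []
    inserted = False
    for line in lines:
        if line.startswith("#CHROM") and not inserted:
            for name, length in contigs:
                if name not in declared:
                    out.append("##contig=<ID={},length={}>".format(name, length))
            inserted = True
        out.append(line)
    return "\n".join(out) + "\n"


# Illustrative inputs; in practice read hg38.fa.fai and input.vcf from disk.
example_fai = "chr1\t248956422\t6\t60\t61\n"
example_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100017682\t.\tG\tA\t.\t.\t.\n"
)
fixed_vcf = add_contig_lines(example_vcf, parse_fai(example_fai, strip_chr=True))
print(fixed_vcf)
```

Running the function twice on the same text leaves it unchanged, since contigs already declared in the header are skipped.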
When generating predictions with the SpliceAI plugin from VEP, I am not getting any values. I used this command to generate the output:
/vep -i input.vcf -o output.vcf --cache --force_overwrite --tab --no_check_variants_order --force --plugin SpliceAI,snv=spliceai_scores.raw.snv.hg38.vcf.gz,indel=spliceai_scores.raw.indel.hg38.vcf.gz
Below are example entries from the output file:
Uploaded_variation Location Allele Gene Feature Feature_type Consequence cDNA_position CDS_position Protein_position Amino_acids Codons Existing_variation IMPACT DISTANCE STRAND FLAGS SpliceAI_pred
100017682 chr1:100017682 A ENSG00000117620 ENST00000370153 Transcript missense_variant,splice_region_variant 981 880 294 A/T Gca/Aca - MODERATE - -
100017682 chr1:100017682 A ENSG00000117620 ENST00000370155 Transcript splice_region_variant,3_prime_UTR_variant,NMD_transcript_variant 882 - - - - LOW - 1 - -
Is there anything wrong with the input file? Could anyone help?