MathOnco / NeoPredPipe

Neoantigens prediction pipeline for multi- or single-region vcf files using ANNOVAR and netMHCpan.
GNU Lesser General Public License v3.0

NeoRecoPo.py IndexError #35

Closed XiangweiZhai closed 9 months ago

XiangweiZhai commented 2 years ago

Hi, I want to use NeoRecoPo.py to get the Neoantigen Recognition Potential. I ran the project on the example data you provided.

Step 1:

    /usr/bin/python NeoPredPipe.py -I ./Example/input_vcfs -H ./Example/HLAtypes/hlatypes.txt -o ~/neotest/ -n TestRun -c 1 2 -E 8 9 10

Four files containing neoantigen information were obtained successfully:

    TestRun.neoantigens.Indels.summarytable.txt
    TestRun.neoantigens.Indels.txt
    TestRun.neoantigens.summarytable.txt
    TestRun.neoantigens.txt

Step 2:

    /usr/bin/python NeoPredPipe.py --preponly -I ./Example/input_vcfs -H ./Example/HLAtypes/hlatypes.txt -o ~/neopre/ -n TestRun -c 1 2 -E 8 9 10

This produced four items:

    avannotated
    avready
    fastaFiles
    tmp

Step 3:

    /usr/bin/python NeoRecoPo.py --neopred_in=$HOME/neotest/TestRun.neoantigens.txt --neoreco_out=$HOME/neoar --fastas=$HOME/neopre/fastaFiles

Error:

    INFO: Begin.
    Traceback (most recent call last):
      File "NeoRecoPo.py", line 135, in <module>
        main()
      File "NeoRecoPo.py", line 98, in main
        preds.ConstructWTFastas()
      File "/home/xiangwei/softwares/NeoPredPipe/StandardPredsClass.py", line 195, in ConstructWTFastas
        self.addToFastaFile()
      File "/home/xiangwei/softwares/NeoPredPipe/StandardPredsClass.py", line 169, in addToFastaFile
        seqID, seq = self.__extractSeq(sam, fasta_head, epitopeLength) # WT seqID and seq
      File "/home/xiangwei/softwares/NeoPredPipe/StandardPredsClass.py", line 269, in __extractSeq
        WTepiSeq = ExtractSeq(WT[1], pos, epitopeLength)
    IndexError: list index out of range

Input file (top 5 rows): TestRun.neoantigens.txt

Files in $HOME/neopre/fastaFiles:

    test1.fasta
    test1.reformat.fasta
    test2.fasta
    test2.reformat.fasta

Is anything wrong?

elakatos commented 1 year ago

Hi! Sorry for the very delayed reply. I have been away for most of the year and am only now checking up on the outstanding issues. I hope you managed to move on from this issue in the meantime, but hopefully it will be beneficial for others to know what went wrong here.

If I understand correctly, you first created the neoantigen prediction outputs without keeping any intermediate files, and then, in a separate run, created only the intermediate files. Following these steps, I managed to recreate the error. The issue is that there are small discrepancies between the fasta files and the neoantigen table. Certain genes have multiple transcripts, and which transcript ANNOVAR uses to translate a mutation into a protein sequence can vary slightly, so the fasta files differ slightly between runs. This typically doesn't change the predicted peptide sequence, but the sequence might sit at amino acid 22 in one transcript and at 42 in another, hence the indexing error we run into. To make sure this doesn't happen, always ensure that your fasta files are not "newer" than your neoantigen predictions: either change the order in which you run steps 1 and 2, or simply run step 1 with the option -d to keep intermediate files.
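
For anyone who wants to check this before running NeoRecoPo.py, the "not newer" rule can be approximated by comparing file modification times. The following is only a minimal sketch and not part of NeoPredPipe: the function name `fastas_newer_than_predictions` is hypothetical, and it assumes that modification time is a reasonable proxy for which run produced a file and that the fasta files end in `.fasta`, as in the listing in this thread.

```python
import os


def fastas_newer_than_predictions(pred_path, fasta_dir):
    """Return the names of FASTA files modified after the prediction table.

    Hypothetical helper, not part of NeoPredPipe. It only compares
    modification times; it does not inspect file contents.
    """
    pred_mtime = os.path.getmtime(pred_path)
    newer = []
    for name in sorted(os.listdir(fasta_dir)):
        path = os.path.join(fasta_dir, name)
        # Assumption: FASTA files are regular files ending in ".fasta",
        # matching names such as test1.fasta and test1.reformat.fasta.
        if os.path.isfile(path) and name.endswith(".fasta"):
            if os.path.getmtime(path) > pred_mtime:
                newer.append(name)
    return newer
```

With the paths from this thread, one would pass the expanded `$HOME/neotest/TestRun.neoantigens.txt` and `$HOME/neopre/fastaFiles`. A non-empty result suggests the fasta files come from a later run than the predictions and may not match them. An empty result does not prove the two came from the same run, so keeping the intermediate files with -d remains the safer route.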