MathOnco / NeoPredPipe

Neoantigens prediction pipeline for multi- or single-region vcf files using ANNOVAR and netMHCpan.
GNU Lesser General Public License v3.0

RecognitionPotential: invalid literal for int() with base 10 #48

Open tucano opened 2 months ago

Hello! Hope you are well.

I found another small bug in NeoRecoPo.py when using a reformatted neoantigens.Indels file.

Problem stacktrace:

INFO: Begin.
Traceback (most recent call last):
  File "/home/davide.rambaldi/src/NeoPredPipe/StandardPredsClass.py", line 260, in __extractSeq
    pos = int(seq_record.id.replace(";;", ";").split(";")[5]) - 1
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: '(position'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/davide.rambaldi/src/NeoPredPipe/NeoRecoPo.py", line 135, in <module>
    main()
  File "/home/davide.rambaldi/src/NeoPredPipe/NeoRecoPo.py", line 98, in main
    preds.ConstructWTFastas()
  File "/home/davide.rambaldi/src/NeoPredPipe/StandardPredsClass.py", line 195, in ConstructWTFastas
    self.__addToFastaFile()
  File "/home/davide.rambaldi/src/NeoPredPipe/StandardPredsClass.py", line 169, in __addToFastaFile
    seqID, seq = self.__extractSeq(sam, fasta_head, epitopeLength)  # WT seqID and seq
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/davide.rambaldi/src/NeoPredPipe/StandardPredsClass.py", line 262, in __extractSeq
    pos = int(seq_record.id.replace(";;", ";").split(";")[6]) - 1
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: '10-407'

This is caused by the following entry in fastaFiles reformat.fasta:

>line3751;NM_000537;c.27delC;p.W10Gfs*37;protein-altering;;(position;10-407;changed;from;WGLLLLLWGSCTFGLPTDTTTFKRIFLKRMPSIRESLKERGVDMARLGPEWSQPMKRLTLGNTTSSVILTNYMDTQYYGEIGIGTPPQTFKVVFDT
MDGWRRMPRGDCCCCSGAPVPLVSRQTPPPLNGSSSRECPQSEKA*

In fact, the same expression fails when I run it in a Python console:

a = 'line3751;NM_000537;c.27delC;p.W10Gfs*37;protein-altering;;(position;10-407;changed;from;WGLLLLLWGSCTFGLPTDTTTFKRIFLKRMPSIRESLKERGVDMARLGPEWSQPMKRLTLGNTTSSVILTNYMDTQYYGEIGIGTPPQTFKVVFDTGSSNVWVPSSKCSRLYTACVYHKLFDASDSSSYKHNGTELTLRYSTGTVSGFLSQDIITVGGITVTQMFGEVTEMPALPFMLAEFDGVVGMGFIEQAIGRVTPIFDNIISQGVLKEDVFSFYYNRDSENSQSLGGQIVLGGSDPQHYEGNFHYINLIKTGVWQIQMKGVSVGSSTLLCEDGCLALVDTGASYISGSTSSIEKLMEALGAKKRLFDYVVKCNEGPTLPDISFHLGGKEYTLTSADYVFQESYSSKKLCTLAIHAMDIPPPTGPTWALGATFIRKFYTEFDRRNNRIGFALAR*;to;GDCCCCSGAPVPLVSRQTPPPLNGSSSRECPQSEKA*)'

int(a.replace(";;",";").split(";")[6])

ValueError: invalid literal for int() with base 10: '10-407'
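
To make the two failures easier to see, here is a small sketch that lists the fields produced by the same replace/split expression. The header is shortened for readability: WTSEQ and MUTSEQ are placeholders I made up for the two long protein sequences, everything else is copied from the entry above.

```python
# Shortened copy of the failing header. WTSEQ and MUTSEQ are placeholders
# (assumptions for readability) standing in for the long protein sequences.
header = (
    "line3751;NM_000537;c.27delC;p.W10Gfs*37;protein-altering;;"
    "(position;10-407;changed;from;WTSEQ*;to;MUTSEQ*)"
)

# Same normalisation and split used in __extractSeq.
fields = header.replace(";;", ";").split(";")

# Print each field with its index to see what int() receives.
for index, value in enumerate(fields):
    print(index, value)
```

Index 5 holds the text "(position" and index 6 holds the range "10-407", which matches the two ValueError messages in the stack trace: neither field is a plain integer.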

If you tell me which of the two integers (10 or 407) I should use, I can open a pull request with a small change to StandardPredsClass.py.
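
As a rough sketch of what such a change could look like (not the actual code in StandardPredsClass.py): the helper names parse_position and extract_pos are hypothetical, and taking the first integer of the range by default is only an assumption until the question above is answered. The use_end flag switches to the second integer.

```python
def parse_position(field, use_end=False):
    """Hypothetical helper: turn '10' or '10-407' into a 0-based position.

    Assumption: the start of the range is used unless use_end is True.
    """
    start, _, end = field.partition("-")
    chosen = end if (use_end and end) else start
    return int(chosen) - 1


def extract_pos(seq_id, use_end=False):
    """Hypothetical helper: try field 5, then field 6, as __extractSeq does."""
    fields = seq_id.replace(";;", ";").split(";")
    for index in (5, 6):
        try:
            return parse_position(fields[index], use_end=use_end)
        except (ValueError, IndexError):
            continue  # this field is not a position, try the next one
    raise ValueError("no position found in header: " + seq_id)


# Shortened header; WTSEQ and MUTSEQ are placeholders for the real sequences.
header = (
    "line3751;NM_000537;c.27delC;p.W10Gfs*37;protein-altering;;"
    "(position;10-407;changed;from;WTSEQ*;to;MUTSEQ*)"
)
print(extract_pos(header))                 # uses the start of the range
print(extract_pos(header, use_end=True))   # uses the end of the range
```

Headers that already carry a plain integer in field 5 or 6 are parsed the same way as before, so only the range case changes.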

Best Regards

Davide