MathOnco / NeoPredPipe

Neoantigens prediction pipeline for multi- or single-region vcf files using ANNOVAR and netMHCpan.
GNU Lesser General Public License v3.0

NeoRecoPo.py error #21

Closed xicola7 closed 3 years ago

xicola7 commented 3 years ago

Hi developers, first of all thank you for putting together this resource.

I am trying to run the NeoRecoPo.py and I am getting some errors. Before detailing those errors, I just want to make sure that I am providing the right options.

--neopred_in= this is the file "neoantigens.Indels.txt" or ".neoantigens.txt"
--neoreco_out= this is the directory where to create the output
--fastas= this is the fastaFiles directory that was created when running NeoPredPipe.py

Can you please confirm?

Many thanks, Rosa

elakatos commented 3 years ago

Hi Rosa, Your options look good, yes. The only problem I can imagine with the options is that the file paths are not relative to where you are running the script, so to be on the safe side I would advise providing absolute paths; then you cannot go wrong. Also, if you are running NeoRecoPo.py on Indel predictions, besides providing the "*.neoantigens.Indels.txt" file, also use the option --indel; otherwise part of the recognition potential prediction might be incorrect or throw errors.
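For anyone following along, here is a minimal Python sketch of the advice above: it assembles the NeoRecoPo.py call with every path made absolute and adds --indel on request. The helper name build_neorecopo_command is made up for illustration; only the four options named in this thread are used.

```python
import os


def build_neorecopo_command(neopred_in, neoreco_out, fastas, indel=False):
    """Return a NeoRecoPo.py argument list with all paths made absolute.

    Hypothetical helper, not part of NeoPredPipe itself.
    """
    cmd = [
        "NeoRecoPo.py",
        "--neopred_in=" + os.path.abspath(neopred_in),
        "--neoreco_out=" + os.path.abspath(neoreco_out),
        "--fastas=" + os.path.abspath(fastas),
    ]
    if indel:
        # Needed when the input is a *.neoantigens.Indels.txt file.
        cmd.append("--indel")
    return cmd


# Print the command line that would be run for an indel table.
print(" ".join(build_neorecopo_command(
    "Clu2.neoantigens.Indels.txt", ".", "fastaFiles", indel=True)))
```

The resulting list can be passed to subprocess.call if you prefer to launch the script from Python rather than from the shell.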

Let me know if the errors still persist!

Best, Eszter

xicola7 commented 3 years ago

Hi Eszter, thanks for such a quick response. I did that and this is what I get:

(neopred)[rmx2@c16n12 NeoPred]$ NeoRecoPo.py --neopred_in=/home/rmx2/scratch60/NeoPred/Clu2.neoantigens.Indels.txt --neoreco_out=/home/rmx2/scratch60/NeoPred/ --fastas=/home/rmx2/scratch60/NeoPred/fastaFiles/ --indel
INFO: Begin.
INFO: Temporary class file found (neorecopo.p), loading previously processed data.
INFO: Found blastp results /home/rmx2/scratch60/NeoPred/NeoRecoTMPIndels/blastp_results/860909.readyForBlastp.fasta. Nothing to be done.
INFO: Found blastp results /home/rmx2/scratch60/NeoPred/NeoRecoTMPIndels/blastp_results/CE013124.readyForBlastp.fasta. Nothing to be done.
INFO: Found blastp results /home/rmx2/scratch60/NeoPred/NeoRecoTMPIndels/blastp_results/M38766.readyForBlastp.fasta. Nothing to be done.
INFO: Found blastp results /home/rmx2/scratch60/NeoPred/NeoRecoTMPIndels/blastp_results/5401.readyForBlastp.fasta. Nothing to be done.
INFO: Found blastp results /home/rmx2/scratch60/NeoPred/NeoRecoTMPIndels/blastp_results/23470H.readyForBlastp.fasta. Nothing to be done.
INFO: Found blastp results /home/rmx2/scratch60/NeoPred/NeoRecoTMPIndels/blastp_results/13115.readyForBlastp.fasta. Nothing to be done.
INFO: Found blastp results /home/rmx2/scratch60/NeoPred/NeoRecoTMPIndels/blastp_results/441.readyForBlastp.fasta. Nothing to be done.
INFO: Found blastp results /home/rmx2/scratch60/NeoPred/NeoRecoTMPIndels/blastp_results/S16-32637.readyForBlastp.fasta. Nothing to be done.
INFO: Found blastp results /home/rmx2/scratch60/NeoPred/NeoRecoTMPIndels/blastp_results/13113.readyForBlastp.fasta. Nothing to be done.
INFO: Found blastp results /home/rmx2/scratch60/NeoPred/NeoRecoTMPIndels/blastp_results/6141.readyForBlastp.fasta. Nothing to be done.
INFO: Found blastp results /home/rmx2/scratch60/NeoPred/NeoRecoTMPIndels/blastp_results/13120.readyForBlastp.fasta. Nothing to be done.
INFO: Found blastp results /home/rmx2/scratch60/NeoPred/NeoRecoTMPIndels/blastp_results/PV19.readyForBlastp.fasta. Nothing to be done.
INFO: Found blastp results /home/rmx2/scratch60/NeoPred/NeoRecoTMPIndels/blastp_results/13116.readyForBlastp.fasta. Nothing to be done.
INFO: Found blastp results /home/rmx2/scratch60/NeoPred/NeoRecoTMPIndels/blastp_results/PC1101.readyForBlastp.fasta. Nothing to be done.
INFO: Found blastp results /home/rmx2/scratch60/NeoPred/NeoRecoTMPIndels/blastp_results/13114.readyForBlastp.fasta. Nothing to be done.
INFO: Found blastp results /home/rmx2/scratch60/NeoPred/NeoRecoTMPIndels/blastp_results/13122.readyForBlastp.fasta. Nothing to be done.
INFO: Found blastp results /home/rmx2/scratch60/NeoPred/NeoRecoTMPIndels/blastp_results/860914.readyForBlastp.fasta. Nothing to be done.
Traceback (most recent call last):
  File "/gpfs/ycga/project/xicola/rmx2/conda_envs/neopred/bin/NeoRecoPo.py", line 135, in <module>
    main()
  File "/gpfs/ycga/project/xicola/rmx2/conda_envs/neopred/bin/NeoRecoPo.py", line 124, in main
    preds.PerformCalculations(tmpOut, Options)
  File "/gpfs/ycga/project/xicola/rmx2/conda_envs/neopred/NeoPredPipe/StandardPredsClass.py", line 487, in PerformCalculations
    aligner.readAllBlastAlignments(xmlpath)
  File "/gpfs/ycga/project/xicola/rmx2/conda_envs/neopred/NeoPredPipe/NeoAlign.py", line 71, in readAllBlastAlignments
    for brecord in blast_records:
  File "/gpfs/ycga/project/xicola/rmx2/conda_envs/neopred/lib/python2.7/site-packages/Bio/Blast/NCBIXML.py", line 798, in parse
    raise ValueError("Your XML file was empty")
ValueError: Your XML file was empty

elakatos commented 3 years ago

Hi Rosa, Thanks for sending this over. It seems to me that the blastp prediction files are empty and that is what is causing the issue. They might come from a previous run that did not complete properly, since this run reused the blastp files already present in the temporary directory. So first of all, can you try deleting the directory /home/rmx2/scratch60/NeoPred/NeoRecoTMPIndels/blastp_results, and also the "neorecopo.p" file in the NeoRecoTMPIndels folder (if present), and then running the predictions again? If you get the same error, my next tip would be to check whether the xml files in the blastp_results folder are empty and whether blastp is set up and running properly. If it ran okay, you would expect to see, for example, a part starting with "<Hit>" in the xml file, which holds the output hits of the algorithm.
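A quick way to run that check over the whole folder is sketched below in Python. It assumes the blastp outputs carry an .xml suffix and simply looks for the literal text "<Hit>"; the function names are invented for this example and nothing here calls into NeoPredPipe or Biopython.

```python
import os


def classify_blast_xml(path):
    """Return 'empty', 'no_hits' or 'has_hits' for one blastp XML file."""
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path) as handle:
        for line in handle:
            if "<Hit>" in line:
                return "has_hits"
    return "no_hits"


def summarise_blast_results(directory, suffix=".xml"):
    """Map each file name ending in `suffix` to its classification."""
    report = {}
    for name in sorted(os.listdir(directory)):
        if name.endswith(suffix):
            report[name] = classify_blast_xml(os.path.join(directory, name))
    return report
```

Calling summarise_blast_results on the blastp_results directory and printing the entries marked "empty" shows at a glance whether blastp produced any output at all.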

Best, Eszter

xicola7 commented 3 years ago

Hey, I reran after getting rid of the older files. In the blastp_results directory the xml files are empty, as you suggested, while the readyForBlastp fasta files are there and not empty. Does this mean that blastp might not be running properly? rosa

elakatos commented 3 years ago

Hi Rosa, Yes, that would be my guess given that the fastas are not empty but the xmls are. Eszter

xicola7 commented 3 years ago

I will look into it and let you know. Many thanks, rosa

xicola7 commented 3 years ago

Hi Eszter, the problem relates to the fasta files. For NeoRecoPo.py you need to detail "". Which are these fasta files? Are these the fasta files created after the --preponly step? Let me know. Thanks, rosa

elakatos commented 3 years ago

It uses the fasta files called *.reformat.fasta in the fastaFiles directory (automatically reads these in from the fastaFiles directory supplied as the input argument).

AC1-2020 commented 3 years ago

I have tried to perform a test run with the example files provided, following the instructions (https://github.com/MathOnco/NeoPredPipe)

cd NeoPredPipe
python ./NeoPredPipe.py --help

Run the Pipeline to only prepare the input files. Can be best to run this independently if working on a cluster.

python NeoPredPipe.py --preponly -I ./Example/input_vcfs -H ./Example/HLAtypes/hlatypes.txt -o ./ -n TestRun -c 1 2 -E 8 9 10

18 #terminal:

INFO: Annovar reference files of build hg19 were given, using this build for all analysis.
INFO: Begin.
INFO: Proper directory already exists. Continue.
INFO: Proper directory already exists. Continue.
INFO: Proper directory already exists. Continue.
INFO: Proper directory already exists. Continue.
INFO: ANNOVAR Ready files for test2 already present.
INFO: ANNOVAR Annotation files for test2 already present.
INFO: Coding change fasta files for test2 already present.
INFO: Coding change fasta files test2 has already been reformatted.
INFO: Input files prepared and completed for test2
INFO: ANNOVAR Ready files for test1 already present.
INFO: ANNOVAR Annotation files for test1 already present.
INFO: Coding change fasta files for test1 already present.
INFO: Coding change fasta files test1 has already been reformatted.
INFO: Input files prepared and completed for test1
INFO: Complete.
INFO: Preprocessed intermediary files are in avready, avannotated and fastaFiles. If you wish to perform epitope prediction, run the pipeline again without the --preponly flag, intermediary files will be automatically detected.

Run the Pipeline

python NeoPredPipe.py -I ./Example/input_vcfs -H ./Example/HLAtypes/hlatypes.txt -o ./OUTDIR -n TestRun -c 1 2 -E 8 9 10

INFO: Annovar reference files of build hg19 were given, using this build for all analysis.
INFO: Begin.
INFO: Proper directory already exists. Continue.
INFO: Proper directory already exists. Continue.
INFO: Proper directory already exists. Continue.
INFO: Proper directory already exists. Continue.
INFO: ANNOVAR Ready files for test2 already present.
INFO: ANNOVAR Annotation files for test2 already present.
INFO: Coding change fasta files for test2 already present.
INFO: Coding change fasta files test2 has already been reformatted.
INFO: Predicting neoantigens for test2
INFO: Skipping Sample! No peptides to predict for test2
INFO: Running Epitope Predictions for test2 on epitopes of length 9
Traceback (most recent call last):
  File "NeoPredPipe.py", line 524, in <module>
    main()
  File "NeoPredPipe.py", line 505, in main
    t.append(Sample(localpath, patname, patFile, hlas[patname], annPaths, netMHCpanPaths, pepmatchPaths, Options))
  File "NeoPredPipe.py", line 106, in __init__
    self.callNeoantigens(FilePath, netmhcpan, Options)
  File "NeoPredPipe.py", line 170, in callNeoantigens
    self.epcalls = predict_neoantigens(Options.OutputDir, self.patID, self.peptideFastas, self.hlasnormed , Options.epitopes, netmhcpan, Options)
  File "/data/sata_data/home/priyadarshi_p1/NeoPredPipe/predict_binding.py", line 49, in predict_neoantigens
    netMHC_run = subprocess.Popen(cmd, stdout=epitope_pred, stderr=epitope_pred)
  File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

Can you kindly help in resolving the error?

elakatos commented 3 years ago

Hi! It looks like netMHCpan is not set up properly or the path to the software is not correct in usr_paths.ini. I would suggest double-checking that and entering the absolute path in usr_paths.ini to be on the safe side. Also, to avoid any problems leaking from a previous run, delete the intermediate folders (avready, avannotated, fastaFiles, tmp) after every run that ends with an error before attempting a new run. Finally, in the second command (running predictions), you should change "-o ./OUTDIR" to wherever you want your output to be stored, for example to "-o ./" following the previous command.
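The "[Errno 2] No such file or directory" raised inside subprocess.Popen usually means the executable itself could not be found. As a rough aid, the Python 3 sketch below lists every entry in an ini file that looks like a path but does not exist on disk. It makes no assumption about the section or key names used in usr_paths.ini, it treats any value containing a path separator as a path (an assumption that may flag or miss some entries), and find_missing_paths is a made-up name.

```python
import configparser
import os


def find_missing_paths(ini_path):
    """Return (section, key, value) for path-like values that do not exist."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the key names exactly as written
    parser.read(ini_path)
    missing = []
    for section in parser.sections():
        for key, value in parser.items(section):
            value = value.strip()
            # Heuristic: only values containing a path separator are checked.
            if os.sep in value and not os.path.exists(value):
                missing.append((section, key, value))
    return missing
```

An empty result does not prove the setup is correct (a path can exist without being executable), but a non-empty one points straight at the entry to fix.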

Best, Eszter

xicola7 commented 3 years ago

Hi Eszter, does this reformat.fasta look good to you? The problem is that the readyForBlast files are empty (see below). Any idea? r

reformat.fasta

line2;NM_001367552;WILDTYPE;
MGNSHCVPQAPRRLRASFSRKPSLKGNREDSARMSAGLPGPEAARSGDAAANKLFHYIPGTDILDLENQRENLEQPFLSVFKKGRRRVPVRNLGKVVHYA
KVQLRFQHSQDVSDCYLELFPAHLYFQAHGSEGLTFQGLLPLTELSVCPLEGSREHAFQITGPLPAPLLVLCPSRAELDRWLYHLEKQTALLGGPRRCHS
APPQGSCGDELPWTLQRRLTRLRTASGHEPGGSAVCASRVKLQHLPAQEQWDRLLVLYPTSLAIFSEELDGLCFKGELPLRAVHINLEEKEKQIRSFLIE
GPLINTIRVVCASYEDYGHWLLCLRAVTHREGAPPLPGAESFPGSQVMGSGRGSLSSGGQTSWDSGCLAPPSTRTSHSLPESSVPSTVGCSSQHTPDQAN
SDRASIGRRRTELRRSGSSRSPGSKARAEGRGPVTPLHLDLTQLHRLSLESSPDAPDHTSETSHSPLYADPYTPPATSHRRVTDVRGLEEFLSAMQSARG
PTPSSPLPSVPVSVPASDPRSCSSGPAGPYLLSKKGALQSRAAQRHRGSAKDGGPQPPDAPQLVSSAREGSPEPWLPLTDGRSPRRSRDPGYDHLWDETL
SSSHQKCPQLGGPEASGGLVQWI
line2;NM_001367552;c.G254A;p.R85Q;protein-altering;;(position;85;changed;from;R;to;Q)
MGNSHCVPQAPRRLRASFSRKPSLKGNREDSARMSAGLPGPEAARSGDAAANKLFHYIPGTDILDLENQRENLEQPFLSVFKKGQRRVPVRNLGKVVHYA
KVQLRFQHSQDVSDCYLELFPAHLYFQAHGSEGLTFQGLLPLTELSVCPLEGSREHAFQITGPLPAPLLVLCPSRAELDRWLYHLEKQTALLGGPRRCHS
APPQGSCGDELPWTLQRRLTRLRTASGHEPGGSAVCASRVKLQHLPAQEQWDRLLVLYPTSLAIFSEELDGLCFKGELPLRAVHINLEEKEKQIRSFLIE
GPLINTIRVVCASYEDYGHWLLCLRAVTHREGAPPLPGAESFPGSQVMGSGRGSLSSGGQTSWDSGCLAPPSTRTSHSLPESSVPSTVGCSSQHTPDQAN
SDRASIGRRRTELRRSGSSRSPGSKARAEGRGPVTPLHLDLTQLHRLSLESSPDAPDHTSETSHSPLYADPYTPPATSHRRVTDVRGLEEFLSAMQSARG
PTPSSPLPSVPVSVPASDPRSCSSGPAGPYLLSKKGALQSRAAQRHRGSAKDGGPQPPDAPQLVSSAREGSPEPWLPLTDGRSPRRSRDPGYDHLWDETL
SSSHQKCPQLGGPEASGGLVQWI
line3;NM_004421;WILDTYPE;
MAETKIIYHMDEEETPYLVKLPVAPERVTLADFKNVLSNRPVHAYKFFFKSMDQDFGVVKEEIFDDNAKLPCFNGRVVSWLVLAEGAHSDAGSQGTDSHT
DLPPPLERTGGIGDSRPPSFHPNVASSRDGMDNETGTESMVSHRRERARRRNREEAARTNGHPRGDRRRDVGLPPDSASTALSSELESSSFVDSDEDGST
SRLSSSTEQSTSSRLIRKHKRRRRKQRLRQADRASSFSSITDSTMSLNIVTVTLNMERHHFLGISIVGQSNDRGDGGIYIGSIMKGGAVAADGRIEPGDM
LLQVNDVNFENMSNDDAVRVLREIVSQTGPISLTVAKCWDPTPRSYFTVPRADPVRPIDPAAWLSHTAALTGALPRYELEEAPLTVKSDMSAVVRVMQLP
DSGLEIRDRMWLKITIANAVIGADVVDWLYTHVEGFKERREARKYASSLLKHGFLRHTVNKITFSEQCYYVFGDLCSNLATLNLNSGSSGTSDQDTLAPL
PHPAAPWPLGQGYPYQYPGPPPCFPPAYQDPGFSYGSGSTGSQQSEGSKSSGSTRSSRRAPGREKERRAAGAGGSGSESDHTAPSGVGSSWRERPAGQLS
RGSSPRSQASATAPGLPPPHPTTKAYTVVGGPPGGPPVRELAAVPPELTGSRQSFQKAMGNPCEFFVDIM*

readyForBlast

13113|0.672|MT|12121
0
13113|0.384|MT|12122
0
13113|1.742|MT|12123
0
13113|1.54|MT|12124
0

elakatos commented 3 years ago

Hi, The reformatted fasta looks good as far as I can tell from the copy in here (there should be a > at the beginning of each header line, but that might just be a problem with the copying?). But in the readyForBlast file, even the headers of each fasta entry seem to have problems: each should contain something like line1_NMXXX. The problem could come from the neoantigen table file supplied to the NeoRecoPo step, so check whether that seems correct. Can you let me know the versions of the software (netMHCpan, annovar) you are using? Also, there should be a file called "Neoantigens.WTandMTtable.txt" which is used to generate the readyForBlastp fastas; how does that look?
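To check the headers without copy-and-paste artefacts, something like the Python sketch below can help. It reads the lines that start with ">" and flags those that do not contain a fragment of the form "line<number>_". That pattern is only my reading of the "line1_NMXXX" example above, not a documented format, and both function names are invented.

```python
import re

# Assumed shape of a healthy header fragment, e.g. "line86_NM_02412".
EXPECTED_FRAGMENT = re.compile(r"line\d+_")


def fasta_headers(path):
    """Return the header lines of a FASTA file without the leading '>'."""
    headers = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                headers.append(line[1:].strip())
    return headers


def suspicious_headers(path):
    """Return headers that lack the assumed 'line<number>_' fragment."""
    return [h for h in fasta_headers(path) if not EXPECTED_FRAGMENT.search(h)]
```

If fasta_headers returns an empty list for a non-empty file, the ">" markers really are missing and the problem is not just in the pasted copy.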

Best, Eszter

xicola7 commented 3 years ago

Each line in the reformatted fasta starts with >, that is good.

Versions:
netMHCpan: 4.1
ANNOVAR: the perl files are dated 24 Oct 2019 and June 2020, thus the latest version

The neoantigen table:

[rmx2@ruddle1 NeoPred]$ head Clu2.neoantigens.Indels.txt
13120 0 -1 line86 2 120979531 G - TMEM185B:NM_024121 4 HLA-A24:21 KYVPPLPSX KYVPPLPSX 0 0 KYVPPLPSX line86_NM_02412 0.295797 0.553 0.215521 2.782 4855.61 <= WB
13120 0 -1 line132 3 105250904 A - "ALCAM:NM_001243280,ALCAM:NM_001243281,ALCAM:NM_001627" 6 HLA-A24:21 QLKSWVTAF QLKSWVTAF 0 0 0 0 0 QLKSWVTAF line132_NM_0012 0.07805 1.449 0.214144 2.812 4928.49 <= WB
13120 0 -1 line132 3 105250904 A - "ALCAM:NM_001243280,ALCAM:NM_001243281,ALCAM:NM_001627" 13 HLA-A24:21 AFQKTVIQM AFQKTVIQM 0 0 0 0 0 AFQKTVIQM line132_NM_0012 0.189898 0.798 0.189789 3.418 6414.42 <= WB
13120 0 -1 line132 3 105250904 A - "ALCAM:NM_001243280,ALCAM:NM_001243281,ALCAM:NM_001627" 32 HLA-A24:21 CYIPLKERW CYIPLKERW 0 0 0 0 0 CYIPLKERW line132_NM_0012 0.85276 0.074 0.615412 0.139 64.15 <= SB
13120 0 -1 line179 5 114572094 CTT - PGGT1B:NM_005023 38 HLA-A24:21 LMGKLEEVF LMGKLEEVF 0 0 LMGKLEEVF line179_NM_0050 0.063067 1.627 0.296817 1.531 2014.82 <= WB
13120 0 -1 line179 5 114572094 CTT - PGGT1B:NM_005023 80 HLA-A24:21 FWVGATLKL FWVGATLKL 0 0 FWVGATLKL line179_NM_0050 0.227747 0.689 0.339242 1.137 1273.16 <= WB
13120 0 -1 line179 5 114572094 CTT - PGGT1B:NM_005023 110 HLA-A24:21 RLVGGFAKW RLVGGFAKW 0 0 RLVGGFAKW line179_NM_0050 0.222856 0.7 0.395182 0.769 695.06 <= WB
13120 0 -1 line179 5 114572094 CTT - PGGT1B:NM_005023 121 HLA-A24:21 SHPDALHAY SHPDALHAY 0 0 SHPDALHAY line179_NM_0050 0.109465 1.189 0.115398 7.285 14345.59 <= WB
13120 0 -1 line179 5 114572094 CTT - PGGT1B:NM_005023 129 HLA-A24:21 YFGICGLSL YFGICGLSL 0 0 YFGICGLSL line179_NM_0050 0.048925 1.871 0.327352 1.235 1447.95 <= WB
13120 0 -1 line179 5 114572094 CTT - PGGT1B:NM_005023 159 HLA-A24:21 RLLDLHQSW RLLDLHQSW 0 0 RLLDLHQSW line179_NM_0050 0.466094 0.34 0.404722 0.719 626.89 <= SB

this is the WTandMT.table

[rmx2@ruddle1 NeoRecoTMPIndels]$ head Neoantigens.WTandMTtable.txt ID MUTATION_ID Sample WT.PEPTIDE MT.PEPTIDE MT.ALLELE WT.SCORE MT.SCORE HLA CHOP_SCORE 1 0.553 13120 - 0 KYVPPLPSX 1000.0 2.782 "KYVPPLPSX,QLKSWVTAF,AFQKTVIQM,CYIPLKERW,LMGKLEEVF,FWVGATLKL,RLVGGFAKW,SHPDALHAY,YFGICGLSL,RLLDLHQSW,LNWKRNLLF,SFTIHCSIF,IHCSIFSLF,SIFSLFILF,IWICSPPHL,ALMPLLLSL,LLWPHKITI,TYLPNHPQA,YLPNHPQAL,SHGPGIRLF,RLFNLTSTF,RHSDYPLSL,SDYPLSLQW,DYPLSLQWL,QWLPGTAYL,LAFLSRYKF,DWMPNNHSV,ESPQKLAEF,EWGVNDPLL,LLPNYLNGF,NYLNGFECF,ASPGDSPVF,KFWDYLHEI,FWDYLHEIF,IFMKRQHLX,QLPPQALAL,PQIPPLILL,TLQLVQILW,LWSKLELAL,HTTRWPTTW,ARATTPPTW,TWRASSARW,APGPPTALF,LFLQNQKVW,AWPSCLRRM,MQRAEFAKF,EFAKFALML,AWTRDLALL,LMMALPWVW,ALPWVWLTF,KGYSNRLYF,GYSNRLYFV,RTRDETYIW,YIWEKITDF,VYKDRLIYF,RHSELQDCF,CFDVHDASW,GWHNDVHIF,IFDTKTQTW,VLGNKGYIF,GYIFGGRVL,LHYLNLDTW,YLNLDTWTW,IHNVTTNCW,HLPKTRPRL,IFQTQPYSL,FQTQPYSLL,PYSLLRSCL,IMLESQISL,QQVLKKITF,QVLKKITFW,KYQWISSNX,KLARNGVFW,ARNGVFWHL,FWHLNWKTL,GWTAPPRAL,RWTAPLRPL,SRLPGPPTL,PLPPEAPLL,PLLLTGPHW,NLPPSPSSL,VIPTRTARW,PWPSSQPVW,SSQPVWKHL,SQPVWKHLI,KHLIYTPLL,TWPPAQSTW,TWSPCSPSM,VVMWRFWRL,FWRMVSMSL,SSPENPLLM,FKEKKLQRM,GRRRPALRL,RRRPALRLL,RRPALRLLC,PRPGAAASL,QRVEGARTH,TMWDTIKMM,KKIEALTAL,KAITYIRSM,ITYIRSMSY,NRIKRWCIM,KPVDTCYSF,YSFWVGATL,LKLLKIFQY,TNFEKNRNY,WPDSHPDAL,HPDALHAYF,FLYKSKKTL,YKSKKTLNW,RGKPPPHPL,LPHSPAATL,ICSPPHLSL,LMSPITLKV,TATPTAHTV,ASKARPVLL,ARPVLLWPH,TINLSPVTY,NHPQALLAL,STFSRHSDY,LQWLPGTAY,LRRPVPPPM,RRRGYAPLL,RRGYAPLLY,RGYAPLLYL,DRDRYVREL,MRHIPVDSY,NRELPTARL,TEDPELLAF,FLSRYKFHL,SRYKFHLAL,MHLGAVPVY,VRDWMPNNH,MPNNHSVIL,KLAEFIDFL,FLDKNDEEY,KNDEEYMKY,VNDPLLPNY,YLHEIFMKR,ARTSTCPSL,ARQPSPSSA,ARVTSWWPL,QHLDRILAL,GHLEHLHLV,HRLNQEDCL,SSHPKTMPL,SHPKTMPLH,MPLHKINQL,LHKINQLAL,SLEPQIPPL,SLGQVVLQL,GAYPTPRSR,ARPRHTTAL,TRWPTTWAA,GRAPGPPTA,RAPGPPTAL,MTMGAVTTM,TRDLALLAL,LRTSGWTSY,IKLELPLLM,WRMHLMEEL,WVYKDRLIY,TRMNDLHYL,TRPRLWHTA,TRPQDVGLL,SGPPPSPSR,RPRQPSQQL,RRRGSPGSL,VRPPVTTSR,ARAPLPPTX,LPPEAPLLL,LLTGPHWTY,RRSWPTGHH,SRVPNCGAL,CRRPAAHHM,WRRGWNLNL,SRQIRARVL,GRTILLALL,SRDPAKMRR,RRSWQRRRL,RLPS
VIPTR,WRFWRLSHV,SPREGKKNG,IPSSPENPL,SPENPLLMV,SPGKYVPPL,AAGRRRPAL,RPALRLLCP,VPGVPGRLC,APGAPAVPA,APAVPAAPA,AVPAAPAAM,APAAMRAAL,GPAAERAGA,AARASSGTA,APGPRPGSP,GPRPGSPRS,RPGSPRSSA,SPRSSAQGV,APRPGAAAS,RIAGCRHSV,GAARRSQRL,LPWAGAPAT,LPHTPPTPA,TPASPSRSP,SPSRSPRAL,SPRALAPAA,RALAPAASA,LAPAASASL,APAASASLC,GPVTPSSTP,TPSSTPSAS,MAISHGTGM,IPLKERWSX,RIHLTGQCL,SISVNISTL,RPNKPVDTC,STRTSERLL,KTSGGATQL,APAPHCASS,GPWAPQMAI,APQMAISGA,RPRERAPPA,APPAAAAAV,APGSRRSSR,RAPPERSSV,APPERSSVP,RSSVPSSPL,VPSSPLSPS,SPLSPSPSA,SPSPSAVSP,SPSAVSPTG,WPGATSPQX,MPRGKPPPH,HPLGDRKPA,SPAATLPSA,LPSAATQTL,PPHLSLRCL,KVEEGKGAM,MPTATPTAH,QPASKARPV,WPHKITINL,LPNHPQALL,QALLALPSL,LALPSLAAL,SPLNNLLSH,LSHGPGIRL,GPGIRLFNL,RPVPPPMER,APLLYLQSH,VPADRDRYV,RPMHLGAVP,VPVYRGSPS,SPQKLAEFI,QPGGITNQF,SPVFEPHIA,VPTPGFGNV,EIFMKRQHL,AARQPSPSS,QPSPSSARV,SPSSARVTS,TPFGVAQVA,EPVHLASQL,ASQLPPQAL,GPVSSHPKT,EPQIPPLIL,IPPLILLAA,LAAHLAPSL,APSLGQVVL,QILWSKLEL,YPATSRRTS,RPRGRRTCW,WPATCASTV,RPAPSMSPS,APSMSPSCA,RPSGAYPTP,YPTPRSRPS,TPRSRPSCA,RPSCARPRH,CARPRHTTA,RPRHTTALA,SPCSHTTRW,WPTTWAATM,APWHRRCCC,GPASATAPT,RAEFAKFAL,SPAWTRDLA,SPPGTRGTQ,GPRRGRTPQ,LPASMSGSC,QPPTPRDKL,TPRDKLSCW,VPPQPRAAH,QPRAAHTCA,RAAHTCAVL,SPKHRSWHT,TPIADDKLF,LPKTRPRLW,RPRLWHTAC,QPYSLLRSC,LPPKLLQQV,PPKLLQQVL,APRARAGLG,APSPWGPTG,SPWGPTGWT,APPRALVPA,VPAPPGPPK,APPGPPKRP,RPPCPKPAI,CPKPAIVSA,KPAIVSASS,STRPQDVGL,RPQDVGLLP,IPCGPARSA,SARRWTAPL,APLRPLSGP,RPLSGPPPS,PPPSPSRRL,RSRLPGPPT,KPRRRRGSP,SPGSLPKSS,GSLPKSSPL,LPKSSPLPS,SPLPSSTGK,RGRSVRPPV,SVRPPVTTS,RPPVTTSRC,HPCSRRAPS,PPLPPEAPL,APLLLTGPH,RGRRTPACM,RPRRSWPTG,RPAAHHMGP,GPWRRGWNL,LPPSPSSLC,VPPPPSSSR,VGRTILLAL,LPFGYVSKL,LPASQRSSA,SSACTRSTL,LPCTHPTGL,GPAQPEIPX,APSQTRQDP,RPATPPRTT,TPPRTTRAT,TPWISDRAW,CPWPSSQPV,QPVWKHLIY,TPLLCLQKL,SPCSPSMRA,SPSMRAGLA,LLMVVFHMI,LAWDPPPGA,LLCPPVPGV,RTASISVNI,TLMSILEAX,STFCGIASL,CLMGKLEEV,KLLKIFQYT,YILSTQDRL,ILSTQDRLV,TLFFCTVNL,NLSSFTIHC,KMHGTPVPI,STLPHSPAA,TLAPSIWIC,AMVALMPLL,LLLSLMPTA,SLMPTATPT,LMPTATPTA,ALLALPSLA,LLALPSLAA,KMQQELEKI,NLLSHGPGI,YMTEKLWRP,SLKHREWGV,YLNGFECFV,RLDAEKAHA,WLQDYWQGL,SLSPGSHQA,KSFG
TPFGV,GTPFGVAQV,AQLQHLDRI,HLDRILALA,RILALALLV,HLEHLHLVL,HLHLVLATI,LVLATILEA,TILEASLEI,SQLPPQALA,ALAILEPVL,AILEPVLAV,QIPPLILLA,HLAPSLGQV,VLQLLLLGL,QLLLLGLLL,LLLGLLLNL,LLLNLTLQL,LLNLTLQLV,NLTLQLVQI,ILWSKLELA,ALFLQNQKV,ALLALVCPL,ALVCPLFAA,KIKLELPLL,KLELPLLMM,LLMMALPWV,TTCPPLQPV,HLMEELPAS,LMEELPASM,RLYFVNLRT,FQPEIKGGV,WTWSGRITI,GLSADNIPL,LSDGWIHNV,MLESQISLL,QISLLPPKL,LLQQVLKKI,HLNWKTLNA,LLLTGPHWT,LLLGQGASA,HIPDYTPLC,LIYTPLLCL,#NAME?,NWKRN-LLF,IFS-LFILF,LWPHK-ITI,LFN-LTSTF,DYP-LSLQW,GYAPL-LYL,AFLSR-YKF,RYK-FHLAL,DYMTEK-LW,IFMKR-QHL,GY-SNRLYF,IW-EKITDF,SW-EEQIFW,HYL-NLDTW,IYTPL-LCL,RRRPAL-RL,RRP-ALRLL,MWD-TIKMM,TYI-RSMSY,RRPVPPP-M,RHI-PVDSY,EYM-KYLAY,FWD-YLHEI,SHP-KTMPL,QRAE-FAKF,GYS-NRLYF,VYKDR-LIY,YKD-RLIYF,WHND-VHIF,TRMND-LHY,RRP-AAHHM,WRM-VSMSL,SPENPLL-M,RPA-LRLLC,VPGVPGR-L,VPAAP-AAM,APAAMRAA-,RPGSPRSS-,APR-PGAAA,RP-GAAASL,CPA-GGPCL,TPASPSRS-,SPSRSPRA-,SPR-ALAPA,APA-ASASL,HPD-ALHAY,EPLHPRR-L,APH-CASSL,RPRERAPP-,APGSRRSS-,AP-PERSSV,SPLSPSPS-,SPSPSAVS-,MPRGKPPP-,TPV-PISTL,MPL-LLSLM,LPN-HPQAL,HPQ-ALLAL,YPL-SLQWL,RPVPPPME-,RPM-HLGAV,SPS-VRDWM,MPN-NHSVI,SPQ-KLAEF,SPGD-SPVF,CPV-PTPGF,SPG-SHQAA,SPSSARVT-,TPF-GVAQV,LP-PQALAL,APS-LGQVV,YPATSRRT-,RPR-GRRTC,RPAPSMSP-,TPR-SRPSC,RPR-HTTAL,AP-GPPTAL,GP-PTALFL,WPSCLRR-M,SPAWTRD-L,LPL-LMMAL,GPRRGRTP-,TPRDK-LSC,VPPQPRAA-,QPR-AAHTC,LPK-TRPRL,RPR-LWHTA,APR-ARAGL,APSPWGPT-,SPW-GPTGW,WTAPPR-AL,VPAPPGPP-,RPPCPKPA-,RPQDVGL-L,RPLSGPPP-,PPSPSRR-L,RPRQPSQQ-,LPPEAPL-L,RPRRSWPT-,RPAAHHMG-,LP-PSPSSL,SPS-SLCAV,VPPPPSSS-,RPATPPRT-,QPVWKHL-I,SPS-MRAGL,SLF-ILFTL,TLAPSIW-I,NLSP-VTYL,ALL-ALPSL,KLW-RPMHL,HL-DRILAL,HL-EHLHLV,ALA-ILEPV,IL-EPVLAV,ILWSK-LEL,KL-ELPLLM,HLMEELP-A,RMND-LHYL,ML-ESQISL,RL-PGPPTL,AYVIFKEKL,KYIPLKERW,CYSWVGATL,SWVGATLKL,NYILSTQRL,KWPDSHPAL,SHPDALHYF,AYFGIGLSL,LYSKKTLNW,FFCTVNLSF,LWPKITINL,TYLPNHPQL,YLPNHPQLL,LHGPGIRLF,SYPLSLQWL,LWLPGTAYL,RYVRELMHI,RWMPNNHSV,DWMPNNHSI,KQPGITNQF,PYLNGFECF,MWLQDYWQL,KFWDYLHEF,SHTTWPTTW,RWPTTWATM,RAPPPTALF,AFLQNQKVW,LFAALRTSW,MLPWVWLTF,TYIWEKITF,WYKDRLIYF,SWEEIFWGW,WWHNDVHIF,HFDTKTQTW,IFDTKTQTF,KYIFGGR
VL,HYLNDTWTW,TWWSGRITI,HLPKTPRLW,IFQTQPYLL,VFWHLNWTL,PLPPAPLLL,CWPSSQPVW,VWKHLIYTL,ARPGAAASL,TYLPHPQAL,IRFNLTSTF,SRHDYPLSL,YRRPVPPPM,RRPPPPMER,RRRGAPLLY,RRYAPLLYL,GRPGPPTAL,ARTSGWTSY,TYIEKITDF,RRHSELQCF,KRPPPKPAI,RRWTAPLRL,RRLPGPPTL,QRPQPSQQL,SRRPAAHHM,SPREGKNGL,SPNPLLMVV,GAAGRRPAL,VPVPGRLCA,APAPAVPAA,APAPAAPAA,VPAAPAAMA,APAAERAGA,GPAAERAAA,GPRPGSPSS,RPSPRSSAQ,GPRSSAQGV,SPRSSAQVS,SPRPGAAAS,APRPGAASL,RPAAASLRI,RPWAGAPAT,APAEVGCPA,GPLDPAGPL,GPLGPGHEL,TPHTPPTPA,LPHPPTPAS,APSRSPRAL,SPRSPRALA,RPRALAPAA,SPRAAPAAS,APAASASLI,TPSSPSASI,RPNKPVDTY,KPDSHPDAL,SPDALHAYF,RPKKKTSGA,SPAPAPHCA,APAPHASSL,APHASSLGV,SPWAPQMAI,APMAISGAC,SPARPRERA,APRERAPPA,RPREAPPAA,RPPAAAAAV,APAAAAAVA,SPSPSPSAV,SPSPSAVST,SPRGKPPPH,MPRGPPPHP,TPHSPAATL,HPAATLPSA,SPAATPSAA,TPSAATQTL,WISPPHLSL,SPHLSLRCL,TPTAHTVSA,SPASKARPV,QPASARPVL,RPVLLPHKI,LPHKITINL,YPNHPQALL,RPVPPPMEA,IPVDSGKCL,RPMHLGAVV,VPYRGSPSV,WPNNHSVIL,MPNNHSVII,KPGGITNQF,QPGGITNQL,SPFEPHIAQ,EPHIAQPSM,IPNDSWKEM,GARTSTPSL,AARQPPSSA,RPSPSSARV,SPSSARTSW,TPFGAQVAL,LPVHLASQL,LASQPPQAL,LPQALALGL,GPSSHPKTM,HPKTPLHKI,TPLHKINQL,LPQIPPLIL,EPQPPLILL,LPSLGQVVL,YPASRRTSC,RPRGRRTCP,RPAPSSPSC,APMSPSCAS,SPSASRPPC,RPPSSRSSA,YPTPSRPSC,CARPHTTAL,APRHTTALA,RPRHTTLAS,RPTTWAATM,TPTWRASSA,RPGPPTALF,APGPPTALL,WPSLRRMEM,SPAWTRLAL,GPPTPRDKL,TPRDKLSCV,VPPQPRAAT,QPRAAHTAV,SPKHRSHTL,TPADDKLFL,TPRLWHTAC,RPRLWHTAL,QPYSLLRSL,LPKLLQQVL,APRAAGLGP,APSPWGPTW,SPWGPTGTA,APRALVPAP,LPAPPGPPK,APGPPKRPP,KPPCPKPAI,RPPPKPAIV,KPAIVSSST,RPQDVGLPS,RARRWTAPL,APRPLSGPP,RPSGPPPSP,GPPPPSRRL,SPSRLRSRL,RSRPGPPTL,QPRQPSQQL,SPSLPKSSP,SPKSSPLPS,LPKSPLPSS,APSTARAPL,RPRRSPTGH,RPAAHHMPW,MPWRRGWNL,SPSSLAVSX,SSRQIRAVL,NPFGYVSKL,TPASQRSSA,LPASQRSSC,RSSATRSTL,LPSIPTRTA,WPCTHPTGL,APSQTRQPP,RPAPPRTTR,TPRTTRATS,TPLLLQKLL,RPPPPRSCT,LLWDPPPGA,RLLPPVPGV,RLAPGAPAV,ALAPASASL,LLMGKLEEV,SLSPSPSAV,FLFTLFFTV,CLMSPITKV,VLMPLLLSL,SLMPATPTA,VLWPHKITI,LLAPSLAAL,LLSHGPIRL,LQWPGTAYL,KLWPMHLGA,QLAEFIDFL,FLDKNEEYM,RLDEKAHAA,LLVEGLEHL,HLLATILEA,VLATILESL,HLASQLPQA,SQLPPQLAL,QLALAILEV,ALILEPVLA,LILEPVLAV,SLEPIPPLI,LLAAHLPSL,ALAPSLG
QV,HLAPSLQVV,GLLNLTLQL,LLLNLTLQV,QLWSKLELA,ILWSKELAL,TLFLQNQKV,KLNKKILEL,KLLPLLMMA,HLMEELPSM,YIFGGVLQT,RMNDLHYLL,KLFLGGLSA,SMLESQISL,LLPPKLLQV,KLLQQVLKI,LLPPSPSSL,NLFGYVSKL,HLYTPLLCL,LLLQKLLAL" 1

elakatos commented 3 years ago

Dear Rosa,

The issue is probably due to some slight difference in the output of netMHCpan 4.0 and 4.1 - it works consistently for the step of NeoPredPipe, but for NeoRecoPo this change means that the output table is not in the format expected by the software. NeoPredPipe was developed to support 4.0, and therefore we haven't tested 4.1. We are planning to extend to this version, but I do not know when that might happen. In the meantime you could use NeoRecoPo without re-running the samples with 4.0, but for that you will need to edit the neoantigens.txt file to delete/re-arrange the columns according to the output of 4.0. You can find the example of what columns are outputted by 4.0 in our readme, and the output format of 4.1 is explained here: http://www.cbs.dtu.dk/services/NetMHCpan/output.php (but keep in mind that we use the -BA flag for including binding affinity predictions).
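As a starting point for that manual edit, here is a small Python sketch that rewrites a delimited table keeping only the chosen columns in the chosen order. It assumes the table is tab-separated, and it deliberately leaves the column order as a parameter: the right indices have to be worked out by comparing the 4.0 example in the readme with your own 4.1 output, so none are given here. The function names are invented for this example.

```python
def reorder_columns(line, order, sep="\t"):
    """Return `line` with its fields selected and ordered by `order`.

    `order` is a list of zero-based column indices; a column left out
    of the list is dropped.
    """
    fields = line.rstrip("\n").split(sep)
    return sep.join(fields[i] for i in order)


def rewrite_table(in_path, out_path, order, sep="\t"):
    """Apply reorder_columns to every non-blank line of a table file."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(reorder_columns(line, order, sep) + "\n")
```

Writing to a new file rather than editing in place keeps the original 4.1 table available in case the chosen order turns out to be wrong.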

Also, I've noticed that you're running NeoRecoPo on an "indels" file. Just wanted to emphasise what should be printed if NeoRecoPo succeeds: the recognition potential prediction for Indels is an approximate measure and should not be compared to the recognition potential of SNVs. It is because recognition potential involves the comparison of wild-type and mutated peptide binding, which for indels cannot be evaluated meaningfully. In addition, for Indels I would always advise to use the PeptideMatch functionality as well, since nonframeshift indels or non-novel peptides might be reported otherwise.

Best, Eszter

rschenck commented 3 years ago

@xicola7 agree with Eszter. I've had good success with netMHCpan 4.1 for the first part of the pipeline. NeoRecoPo is indeed where it fails. One discrepancy I would point out is that the binding affinities are present in the outputs when using the command line tool for netMHCpan 4.1, but are missing from the description of the output file online. Not sure why that is, but nonetheless it isn't in the same format as 4.0.

I would suggest doing as Eszter suggests with the re-arranging of the columns to get NeoRecoPo to work. The math itself within the script won't be affected by the changes to netMHCpan. You will run into additional trouble though when trying to call the wild type binding affinities which is handled internally within NeoRecoPo.

Will leave this issue open for now as we do plan on providing support for 4.1 fully.

xicola7 commented 3 years ago

Many thanks for all your input, I really appreciate it. Best, Rosa

xicola7 commented 3 years ago

Hi there, I re-ran the analysis with netMHCpan 4.0 and used the peptide match to only include novel peptides and frameshift mutations, as you recommended. Inside the NeoRecoTMP I only get the empty wildtype.tmp.fasta files. The error I get is below. Any idea? I ran the prediction for the missense mutations and it worked well, so it is something related to the Indels. Thanks, rosa

(neopred)[rmx2@c14n09 NeoPred]$ NeoRecoPo.py --neopred_in=/home/rmx2/project/NeoPred/Clu2.SBnov.txt --neoreco_out=/home/rmx2/project/NeoPred/ --fastas=/home/rmx2/project/NeoPred/fastaFiles/
INFO: Begin.
Traceback (most recent call last):
  File "/gpfs/ycga/project/xicola/rmx2/conda_envs/neopred/bin/NeoRecoPo.py", line 135, in <module>
    main()
  File "/gpfs/ycga/project/xicola/rmx2/conda_envs/neopred/bin/NeoRecoPo.py", line 98, in main
    preds.ConstructWTFastas()
  File "/gpfs/ycga/project/xicola/rmx2/conda_envs/neopred/NeoPredPipe/StandardPredsClass.py", line 195, in ConstructWTFastas
    self.__addToFastaFile()
  File "/gpfs/ycga/project/xicola/rmx2/conda_envs/neopred/NeoPredPipe/StandardPredsClass.py", line 169, in __addToFastaFile
    seqID, seq = self.__extractSeq(sam, fasta_head, epitopeLength) # WT seqID and seq
  File "/gpfs/ycga/project/xicola/rmx2/conda_envs/neopred/NeoPredPipe/StandardPredsClass.py", line 262, in __extractSeq
    pos = int(seq_record.id.replace(";;", ";").split(";")[6]) - 1
ValueError: invalid literal for int() with base 10: '64-685'

xicola7 commented 3 years ago

Hi Eszter, can you read my last message, pls. Thanks rosa

elakatos commented 3 years ago

Hi Rosa,

The line from the error message looks like it is an Indel mutation - is that correct? For indels (that should be all separated out into the *.neoantigens.Indels.txt file), you will need to specify the --indel option in running NeoRecoPo, since a step of the RecognitionPotential calculation (when comparing wild-type and mutated peptides) is meaningless for frameshifts and therefore they should be handled separately.

If it is a mutation that was not put into the separate neoantigens.Indels.txt file, please send me some more information on the mutation and I will investigate why it was not classified as a frameshift when it surely looks like one. For example, it would help if you could send me the corresponding header from the fasta file; you could get this by running grep '64-685' on the reformat.fasta you have.
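To see why the traceback ends in int(), the Python sketch below repeats the same split on the SNV header posted earlier in this thread: the seventh ";"-separated field is the position, "85" for that header, whereas a range such as '64-685' cannot be converted to an integer. The two function names are invented for this illustration, and the scan only mirrors the one expression shown in the traceback, not the rest of NeoRecoPo.

```python
def position_field(header):
    """Return the field that the line in the traceback passes to int()."""
    return header.replace(";;", ";").split(";")[6]


def headers_with_non_integer_position(headers):
    """Return headers whose position field is present but not a plain integer."""
    flagged = []
    for header in headers:
        fields = header.replace(";;", ";").split(";")
        # WILDTYPE headers have fewer fields and are skipped.
        if len(fields) > 6 and not fields[6].isdigit():
            flagged.append(header)
    return flagged


# SNV header copied from the reformat.fasta excerpt in this thread.
snv = ("line2;NM_001367552;c.G254A;p.R85Q;protein-altering;;"
       "(position;85;changed;from;R;to;Q)")
print(position_field(snv))  # prints 85
```

Running headers_with_non_integer_position over the headers of a reformat.fasta gives the list of entries that need the --indel route.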

xicola7 commented 3 years ago

It worked out in the end: I had forgotten the --indel option. Thanks so much for all your guidance.