Wildtype output field - Githubissues

Apologies for the slow response, writing commitments got in the way...

Unfortunately there's no option for this in the basic NeoPredPipe pipeline: we only ever evaluate the mutated peptide, so we do not record the wild-type one separately. However, you can use the intermediate files of NeoPredPipe or NeoRecoPo, depending on what exactly you need from the wild-type peptide.

1) If you run the second step of the analysis, NeoRecoPo, in that step we do evaluate the wild-type counterpart and its binding ability. This pipeline produces an intermediate file called Neoantigens.WTandMTtable.txt that contains wild-type and mutated peptide pairs (the amino acid sequence), together with their respective binding affinities. There are also samplename.wildtype.tmp.length.fasta files produced with just the sequence of the WT peptides. One thing to note is that in this step we only consider epitopes that were deemed antigenic with a binding affinity <=500 in the first neoantigen prediction step by NeoPredPipe. So a few mutated peptides might be filtered out before you'd get the WT information.

2) Alternatively, you can process the intermediate files of NeoPredPipe to retrieve the wild-type sequence: in fastaFiles, the files samplename.fasta and samplename.reformat.fasta contain one entry each for the wild-type and the mutated sequence of the whole gene product, and the header of the mutated entry contains the location of the mutation. For example:

    >line112 NM001301060 c.G1006T p.G336C protein-altering (position 336 changed from G to C)
    MAAAGEGTPSSRGPRRDPPRRPPRNGYGVYVYPNSFFRYEGEWKAGRKHGHGKLLFKDGSYYEGAFVDGEITGEGRRHWAWSGDTFSGQFVLGEPQGYGVMEYKAGGCYEGEVSHGMREGHGFLVDRDGQVYQGSFHDNKRHGPGQMLFQNGDKYDGDWVRDRRQGHGVLRCADGSTYKGQWHSDVFSGLGSMAHCSGVTYYGLWINGHPAEQATRIVILGPEVMEVAQGSPFSVNVQLLQDHGEIAKSESGRVLQISAGVRYVQLSAYSEVNFFKVDRDNQETLIQTPFGFECIPYPVSSPAAGVPGPRAAKGGAEADVPLPRGDLELHLGALHCQEDTPGGLLGSSLF

By parsing this header, you can extract the mutated position (336) and then use the wild-type sequence reported in the entry just before it to get the wild-type peptide with N flanking amino acids on each side of that position. (This is what we do for the mutated sequence in ExtractSeq in vcf_manipulate.py, which can serve as a starting point for the code.) This method is way more involved, but no epitopes are filtered out.
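A minimal Python sketch of option 2 might look like the following. It is not the code from ExtractSeq, and it assumes the mutated header always ends with the "(position N changed from X to Y)" annotation shown in the example, so it only handles single amino acid substitutions. The function names parse_mutation and wildtype_window, and the toy header and sequence, are made up for illustration.

```python
import re

# Pattern for the annotation at the end of the mutated entry's header, e.g.
# "(position 336 changed from G to C)". Only single-residue substitutions match.
HEADER_RE = re.compile(r"\(position (\d+) changed from ([A-Z]) to ([A-Z])\)")


def parse_mutation(header):
    """Return (position, wt_residue, mt_residue) from a mutated-entry header.

    The position is 1-based, as written in the header. Returns None if the
    header does not describe a single amino acid substitution.
    """
    match = HEADER_RE.search(header)
    if match is None:
        return None
    return int(match.group(1)), match.group(2), match.group(3)


def wildtype_window(wt_sequence, position, flank):
    """Return the wild-type peptide covering `position` with up to `flank`
    residues on each side, truncated at the ends of the protein."""
    index = position - 1  # convert the 1-based position to a 0-based index
    if not 0 <= index < len(wt_sequence):
        raise ValueError("position lies outside the wild-type sequence")
    start = max(0, index - flank)
    end = min(len(wt_sequence), index + flank + 1)
    return wt_sequence[start:end]


# Toy example (hypothetical header and sequence, not taken from a real run).
toy_header = ">line1 NM_TOY c.G10T p.G4C protein-altering (position 4 changed from G to C)"
toy_wildtype = "MAAGEGTPSS"

position, wt_residue, mt_residue = parse_mutation(toy_header)
# Sanity check: the wild-type sequence should carry the residue named in the header.
assert toy_wildtype[position - 1] == wt_residue
peptide = wildtype_window(toy_wildtype, position, flank=2)
print(position, wt_residue, mt_residue, peptide)
```

For epitopes of length k, a flank of k - 1 gives a window that contains every wild-type k-mer overlapping the mutated position. Checking that the wild-type residue at the parsed position matches the one in the header is a cheap way to catch a mismatched pair of entries.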

Best, Eszter

MathOnco / NeoPredPipe

Wildtype output field #31