MathOnco / NeoPredPipe

Neoantigen prediction pipeline for multi- or single-region VCF files using ANNOVAR and netMHCpan.
GNU Lesser General Public License v3.0

Is PeptideMatch necessary? #12

Closed: Tiredbird closed this issue 5 years ago

Tiredbird commented 5 years ago

Hi, PeptideMatch is only necessary if one wishes to check predicted epitopes for novelty against a reference proteome. However, you also recommend using PeptideMatch for indel predictions, to filter out non-frameshift peptides and peptides that are novel at the mutated genomic location but coincidentally exist elsewhere.

So, if I do not care about the novelty of the predicted epitopes, should I still use PeptideMatch to filter out non-frameshift peptides? Thanks!

elakatos commented 5 years ago

Hi, PeptideMatch is indeed optional, but depending on the sample (and, of course, the research question you are asking) I have found that it can make quite a difference. When processing potential frameshift (indel) mutations, our approach is to process the whole downstream protein and predict antigenicity for every length-N peptide derived from it. (So in the case of, e.g., a stop-loss mutation, a single mutation can easily mean hundreds of processed peptides.) Some short indels do not cause a frameshift (e.g. an insertion of 3 bases); these modify only a single amino acid, so the peptides further downstream are identical to the original protein. Currently, indel-type mutations are not filtered by whether they are predicted to cause a frameshift, to ensure that no peptides are ignored in the analysis; as a result, the non-frameshift indels described above will produce many "false positives" that are not actually novel. In addition, because the genome is repetitive, elongating frameshift mutations (which produce a much longer mutated protein by translating what was originally an intergenic region) can also produce peptides that are parts of other "healthy" proteins.
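To make the non-frameshift point concrete, here is a minimal Python sketch. It is not NeoPredPipe's actual code: the function names `is_frameshift`, `peptides` and `non_novel_fraction` and the toy sequences are hypothetical. It assumes VCF-style REF/ALT alleles inside a coding sequence and treats an indel as frameshifting when its net length change is not a multiple of 3; it then slides a length-N window over a mutant protein and counts how many windows already occur in the wild-type protein.

```python
def is_frameshift(ref: str, alt: str) -> bool:
    """Assumption: an indel shifts the frame when its net length change is not a multiple of 3."""
    # Python's % is non-negative for a positive divisor, so deletions (negative change) work too.
    return (len(alt) - len(ref)) % 3 != 0


def peptides(protein: str, n: int) -> list[str]:
    """All length-n peptides from a protein (sliding window, step 1)."""
    return [protein[i:i + n] for i in range(len(protein) - n + 1)]


def non_novel_fraction(mutant: str, wildtype: str, n: int) -> float:
    """Fraction of the mutant's length-n peptides that also occur in the wild-type protein."""
    wt = set(peptides(wildtype, n))
    mut = peptides(mutant, n)
    if not mut:
        return 0.0
    return sum(p in wt for p in mut) / len(mut)


# Toy example (hypothetical sequences): a 3-base in-frame insertion adds one residue (G).
wildtype = "MKTAYIAKQRQISFVKSHFSRQ"
mutant = "MKTAYGIAKQRQISFVKSHFSRQ"
print(is_frameshift("A", "ATTT"))                 # False: +3 bases, reading frame kept
print(non_novel_fraction(mutant, wildtype, 9))    # 0.6: 9 of 15 9-mers are wild-type
```

With a one-residue in-frame insertion, only the windows overlapping the inserted residue are new; the remaining windows are copies of the wild-type sequence, which is the kind of non-novel output described above.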

In summary, because of the way indels are processed and because of their genetic properties, indel-type mutations can yield many non-novel predicted antigens (in my experience, ~50% of results), which is why we recommend using PeptideMatch. If you are not concerned about this possible bias, you can skip PeptideMatch or, alternatively, apply other post-processing steps, since all predicted antigens are present in the final output table. Keep in mind, however, that without PeptideMatch the predicted numbers of indel-type neoantigens in the summary table may be hugely overestimated.
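If you skip PeptideMatch and post-process the final output table yourself, one simple option is an exact-match lookup of each predicted peptide against a reference proteome FASTA. The sketch below is hypothetical: it does not call PeptideMatch or parse NeoPredPipe's output format, the function names `read_fasta`, `reference_kmers` and `split_by_novelty` are illustrative, and extracting the peptide strings from the table is left to you.

```python
from typing import Iterable


def read_fasta(lines: Iterable[str]) -> dict[str, str]:
    """Minimal FASTA reader: header ID (first word after '>') -> concatenated sequence."""
    seqs: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        else:
            chunks.append(line)  # sequence lines before any header are ignored
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def reference_kmers(proteome: dict[str, str], lengths: Iterable[int]) -> set[str]:
    """Every substring of the given lengths across all reference proteins."""
    kmers: set[str] = set()
    for seq in proteome.values():
        for n in lengths:
            kmers.update(seq[i:i + n] for i in range(len(seq) - n + 1))
    return kmers


def split_by_novelty(peptides: list[str], proteome: dict[str, str]) -> tuple[list[str], list[str]]:
    """Split predicted peptides into (novel, found exactly in the reference proteome)."""
    ref = reference_kmers(proteome, {len(p) for p in peptides})
    novel = [p for p in peptides if p not in ref]
    known = [p for p in peptides if p in ref]
    return novel, known


# Toy reference and peptides (hypothetical):
fasta = [">sp|P1 toy protein", "MKTAYIAKQR", "QISFVKSHFS", ">sp|P2", "GGGGG"]
novel, known = split_by_novelty(["TAYIAKQRQ", "AAAAAAAAA", "KQRQISFVK"], read_fasta(fasta))
print(novel)  # ['AAAAAAAAA']
print(known)  # ['TAYIAKQRQ', 'KQRQISFVK']
```

This naive approach stores every k-mer of the proteome for each peptide length present, so memory grows with proteome size times the number of distinct lengths; for a full human proteome, a dedicated indexed search tool is likely more practical.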