MathOnco / NeoPredPipe

Neoantigens prediction pipeline for multi- or single-region vcf files using ANNOVAR and netMHCpan.
GNU Lesser General Public License v3.0

Which HLA type file is better to use? #22

Closed. beginner984 closed this issue 3 years ago

beginner984 commented 3 years ago

Hello

I have some .vcf files, but I don't have access to the raw sequencing data or to any patient-specific HLA typing.

I want to predict the neoantigen load for my patients (as a proxy for mutation burden).

In a paper I read: "We first collected all peptides defined by a 17 amino-acid region centered on the amino acid which changes upon the mutation. We identified mutant nonamers with ≤500 nM binding affinity for patient-specific class I human lymphocyte antigen (HLA) alleles, constituting potential candidate neoantigens."
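To make the quoted procedure concrete, here is a minimal Python sketch of its first half: cut the 17 amino-acid window around the mutated residue, then list every nonamer that covers the mutation. The function names and the example protein are hypothetical and not taken from the paper or from NeoPredPipe. The binding-affinity filter (≤500 nM) needs a predictor such as netMHCpan and is not shown.

```python
def mutant_window(protein, pos, alt, flank=8):
    """Return (window, mut_idx): up to 2*flank+1 residues around the
    0-based position `pos`, with the mutant residue `alt` substituted,
    and the index of the mutation inside that window."""
    mutated = protein[:pos] + alt + protein[pos + 1:]
    start = max(0, pos - flank)
    end = min(len(mutated), pos + flank + 1)
    return mutated[start:end], pos - start


def mutant_kmers(window, mut_idx, k=9):
    """Return every k-mer of `window` that contains position `mut_idx`."""
    return [
        window[i:i + k]
        for i in range(len(window) - k + 1)
        if i <= mut_idx < i + k
    ]


# Made-up 30-residue sequence, for illustration only.
protein = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"
window, mut_idx = mutant_window(protein, pos=12, alt="A")  # P -> A at index 12
nonamers = mutant_kmers(window, mut_idx)
```

With a full 17-residue window, each mutation gives nine candidate nonamers. Near the ends of a protein the window is shorter, so there are fewer.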

If I want to replicate this, which HLA file is better to use: mhcflurry_alleles.txt, netMHCII_alleles.txt, netMHCIIpan_alleles.txt, netMHC_alleles.txt or netMHCpan_alleles.txt?

I found these here: https://github.com/immune-health/antigen.garnish/blob/master/inst/extdata/all_alleles.txt

Please help me to get an intuition

Thank you so much in advance

elakatos commented 3 years ago

Hi,

First of all, there are much better forums for this question than an issue on a software page. You will find people with more expertise in the field there, and we would prefer to keep the issues here for reporting problems with our software.

I can share my personal recommendation, but please keep in mind that it is my opinion alone: I would not recommend using the lists above. They are exhaustive lists of all possible HLA types, some of which are extremely rare, so you will end up with a huge number of false positives that are not relevant to your patient. From a practical point of view, many of these alleles are not HLA-A/B/C alleles, so first make sure you include only MHC class I alleles in a class I analysis.
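As a rough illustration of that last point, a list of mixed allele names could be reduced to HLA-A/B/C entries with a small filter like the one below. This is only a sketch: the name spellings in the example list are assumptions about what such files contain, and the real files may use other conventions that the pattern would need to cover.

```python
import re

# Matches names starting with HLA-A, HLA-B or HLA-C. The lookahead rejects
# a following letter, so a longer gene name is not mistaken for A, B or C.
CLASS_I_PATTERN = re.compile(r"^HLA-[ABC](?![A-Za-z])", re.IGNORECASE)


def keep_class_one(alleles):
    """Return only the alleles whose name looks like HLA-A, -B or -C."""
    return [a.strip() for a in alleles if CLASS_I_PATTERN.match(a.strip())]


# Hypothetical mixed input: class I, class II and mouse alleles.
mixed = [
    "HLA-A*02:01",
    "HLA-B07:02",
    "HLA-DRB1*01:01",
    "H-2-Kb",
    "HLA-C*07:01",
    "DRB1_0101",
]
class_one = keep_class_one(mixed)
```

This only narrows the list to class I. It does not address the bigger concern, which is that rare alleles still produce predictions that are irrelevant to the patient.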

I have come across publications where the patient HLA type was not known, and some of them used a set of HLA alleles that are common in the population to provide a good representation. You can look up the frequency of HLA alleles and assemble the most common ones yourself (for example, if you know the patient's ethnicity). Alternatively, there are reference sets available that should cover >97% of the population: https://help.iedb.org/hc/en-us/articles/114094151851-HLA-allele-frequencies-and-reference-sets-with-maximal-population-coverage

In terms of using our software, you can list more than 6 alleles in the hlatypes.txt file (up to 20, if you want to use a larger reference set), as long as they are in the same tab-separated format as shown on the Readme page.
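For anyone assembling such a file by script, a minimal sketch of building one tab-separated row might look like this. It assumes that a row is the sample name followed by its alleles. The header line and the exact allele spelling the pipeline expects are not reproduced here and should be copied from the Readme. The function name and the placeholder allele names are hypothetical.

```python
MAX_ALLELES = 20  # upper limit mentioned in the reply above


def hla_row(sample, alleles):
    """Build one tab-separated row: sample name, then its unique alleles."""
    unique = list(dict.fromkeys(alleles))  # drop duplicates, keep order
    if len(unique) > MAX_ALLELES:
        raise ValueError(
            f"{sample}: {len(unique)} alleles given, at most {MAX_ALLELES} allowed"
        )
    return "\t".join([sample] + unique)


# Placeholder names; use the spelling shown on the Readme page instead.
reference_set = ["allele_1", "allele_2", "allele_3", "allele_2"]
row = hla_row("patient1", reference_set)

# Writing the same reference set for several samples (assumed file name):
# with open("hlatypes.txt", "w") as handle:
#     for sample in ["patient1", "patient2"]:
#         handle.write(hla_row(sample, reference_set) + "\n")
```

Reusing one reference set for every sample is the shortcut described above for when no patient-specific typing exists.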

Eszter

beginner984 commented 3 years ago

Thank you so much

I have used the class I file from your GitHub

This file

netMHCpanAlleles_classI.txt

I hope this file is appropriate, because our IT group has already run your software on my vcf files using it.