MathOnco / NeoPredPipe

Neoantigen prediction pipeline for multi- or single-region VCF files using ANNOVAR and netMHCpan.
GNU Lesser General Public License v3.0

Problems I encountered in the MHC-II results #13

Closed: Tiredbird closed this issue 5 years ago

Tiredbird commented 5 years ago

Hi all,

I have tested about 30 samples with MHC-II typing. They all ran without errors, but the results look quite different from those of MHC-I. The MHC_II neoantigens.Indels.txt output was as follows:

xq009a 1 line226 chr9 43133358 G - ANKRD20A3:NM_001012419,ANKRD20A2:NM_001012421 55 DRB1_1501 STELLYTWPVPVAMC line226;NM_0010 3 LLYTWPVPV 0.71 0.608 69.27 3 NA <=WB 1
xq009a 1 line226 chr9 43133358 G - ANKRD20A3:NM_001012419,ANKRD20A2:NM_001012421 56 DRB1_1501 TELLYTWPVPVAMCK line226;NM_0010 2 LLYTWPVPV 0.695 0.596 78.72 4 NA <=WB 1
xq009a 1 line226 chr9 43133358 G - ANKRD20A3:NM_001012419,ANKRD20A2:NM_001012421 57 DRB1_1501 ELLYTWPVPVAMCKW line226;NM_0010 1 LLYTWPVPV 0.665 0.567 107.86 6.5 NA <=WB 1
xq009a 1 line243 chrX 54011405 CTC - PHF8:NM_001184896,PHF8:NM_001184898,PHF8:NM_001184897,PHF8:NM_015107 230 DRB1_1501 GRILKIHRNGKLLL* line243;NM_0011 3 LKIHRNGKL 0.715 0.681 31.67 0.4 NA <=SB 1

① xq009a 1 line243 chrX 54011405 CTC - PHF8:NM_001184896,PHF8:NM_001184898,PHF8:NM_001184897,PHF8:NM_015107 230 DRB1_1501 GRILKIHRNGKLLL* line243;NM_0011 3 LKIHRNGKL 0.715 0.681 31.67 0.4 NA <=SB 1

What do the values 0.715, 0.681, 31.67 and 0.4 mean? And what do 230 and 3 mean?

② Why does “NA” appear?

③ Does the identity (line243;NM_0011) refer to PHF8:NM_001184896? Is the first transcript used as the identity?

Thanks

elakatos commented 5 years ago

Hi, since Type-II prediction is done with netMHCIIpan, the results look different because the two prediction tools (netMHCpan and netMHCIIpan) produce different output. I believe you used netMHCIIpan 3.2; its output is described here: http://www.cbs.dtu.dk/services/NetMHCIIpan/output.php

So the four numbers are Core_Rel, 1-log50k(aff), Affinity(nM) and %Rank. As with netMHCpan, %Rank is used to classify strong and weak binders (SB/WB), and it is the recommended value for downstream analysis.
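To illustrate how two of these columns relate, here is a minimal Python sketch. It assumes the usual netMHC convention 1-log50k(aff) = 1 - log(aff)/log(50000), and it assumes %Rank cut-offs of 2 (SB) and 10 (WB) as the netMHCIIpan 3.2 defaults. Both assumptions agree with the four rows above, but please check the output page for your version.

```python
import math

# Sketch only: assumes the 50,000 nM normalization used by netMHC tools and
# assumed default %Rank cut-offs of 2 (strong) and 10 (weak) binders.
MAX_IC50_NM = 50_000.0
STRONG_RANK = 2.0   # assumed default SB threshold
WEAK_RANK = 10.0    # assumed default WB threshold


def log50k_score(affinity_nm: float) -> float:
    """Convert an affinity (IC50 in nM) into the 1-log50k(aff) score."""
    return 1.0 - math.log(affinity_nm) / math.log(MAX_IC50_NM)


def binding_level(rank: float) -> str:
    """Classify a peptide by %Rank: 'SB', 'WB', or '' for non-binders."""
    if rank <= STRONG_RANK:
        return "SB"
    if rank <= WEAK_RANK:
        return "WB"
    return ""


if __name__ == "__main__":
    # (Affinity(nM), %Rank) pairs taken from the rows in the question.
    for affinity, rank in [(69.27, 3.0), (78.72, 4.0), (107.86, 6.5), (31.67, 0.4)]:
        print(f"{affinity:>7} nM -> {log50k_score(affinity):.3f} {binding_level(rank)}")
```

Applied to the four (Affinity, %Rank) pairs from the question, this reproduces the 1-log50k(aff) column (0.608, 0.596, 0.567, 0.681) and the WB, WB, WB, SB labels. Because 1-log50k(aff) is only a rescaled affinity, it carries no extra information beyond Affinity(nM); %Rank is the column to filter on.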

NA is the value of the "experimental binding" field. If you know the experimental binding values, you can supply them in your input file for benchmarking purposes; when they are not specified, the field is NA, so you can ignore it here.
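When loading these files, it can help to turn NA into a proper missing value. The sketch below is hypothetical: the column indices are counted from the 21-column, single-region rows quoted in this issue (not from a documented NeoPredPipe schema), and the dictionary key names are illustrative. In all four rows, Pos also appears to be the 0-based start of Core inside Peptide (for example 3 for LKIHRNGKL in GRILKIHRNGKLLL*), which the last line checks.

```python
from typing import Dict, Optional

# Hypothetical column map: indices counted from the example rows in this
# issue (single-region run); key names are illustrative, not an official schema.
NETMHCIIPAN_COLUMNS = {
    "Allele": 9, "Peptide": 10, "Identity": 11, "Pos": 12, "Core": 13,
    "Core_Rel": 14, "1-log50k(aff)": 15, "Affinity(nM)": 16,
    "%Rank": 17, "Exp_Bind": 18, "BindLevel": 19,
}


def parse_value(token: str) -> Optional[str]:
    """Map the 'NA' placeholder (no experimental value supplied) to None."""
    return None if token == "NA" else token


def parse_row(line: str) -> Dict[str, Optional[str]]:
    """Split a whitespace-separated output row into named netMHCIIpan fields."""
    fields = line.split()
    return {name: parse_value(fields[i]) for name, i in NETMHCIIPAN_COLUMNS.items()}


row = parse_row(
    "xq009a 1 line243 chrX 54011405 CTC - "
    "PHF8:NM_001184896,PHF8:NM_001184898,PHF8:NM_001184897,PHF8:NM_015107 "
    "230 DRB1_1501 GRILKIHRNGKLLL* line243;NM_0011 3 LKIHRNGKL "
    "0.715 0.681 31.67 0.4 NA <=SB 1"
)
print(row["Exp_Bind"])  # None
print(row["%Rank"], row["BindLevel"])  # 0.4 <=SB
# Pos appears to be the 0-based start of Core inside Peptide in these rows.
print(row["Peptide"].find(row["Core"]), row["Pos"])  # 3 3
```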

The Identity field is generated automatically by the binding prediction software by truncating the header line of the input FASTA file (the same is true for Type-I prediction). I suggest ignoring this field in downstream analysis and using the Gene field (the 8th column in your example) or the LineID instead.
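A small sketch of why the Identity cannot tell transcripts apart: both identities in your output ("line226;NM_0010" and "line243;NM_0011") are 15 characters long, so the truncation width is assumed to be 15 here, and the full FASTA header format ("line243;" followed by a transcript ID) is hypothetical. Three of the four PHF8 transcripts collapse to the same Identity, while the LineID prefix survives truncation.

```python
# Assumption: the Identity is the FASTA header cut to 15 characters, inferred
# from the example output. The header format below is hypothetical.
IDENTITY_WIDTH = 15


def truncated_identity(fasta_header: str) -> str:
    """Mimic the header truncation seen in the Identity column."""
    return fasta_header[:IDENTITY_WIDTH]


def line_id(identity: str) -> str:
    """Recover the LineID (text before the first ';') from an Identity."""
    return identity.split(";", 1)[0]


transcripts = ["NM_001184896", "NM_001184898", "NM_001184897"]
for tx in transcripts:
    # Every one of these hypothetical headers truncates to "line243;NM_0011".
    print(truncated_identity(f"line243;{tx}"))
print(line_id("line243;NM_0011"))  # line243
```

Because several transcripts map to one Identity, the Gene column (which lists every transcript) or the LineID is the safer key for joining results.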