MathOnco / NeoPredPipe

Neoantigens prediction pipeline for multi- or single-region vcf files using ANNOVAR and netMHCpan.
GNU Lesser General Public License v3.0

The meaning of some columns #28

Closed wt12318 closed 3 years ago

wt12318 commented 3 years ago

Hi,

I used the following command to predict neoantigens:

NeoPredPipe.py -I ./test/ -H ./TCGA_HLA_typing.txt -o ./test_results -n TCGA-02-0047-01A -c 0 -E 8 9 10 -x exp/

The test_results directory contains the following files:

 ls

TCGA-02-0047-01A.neoantigens.Indels.summarytable.txt  TCGA-02-0047-01A.neoantigens.summarytable.txt
TCGA-02-0047-01A.neoantigens.Indels.txt               TCGA-02-0047-01A.neoantigens.txt

However, the TCGA-02-0047-01A.neoantigens.txt file has some columns between Identity and Binding Affinity that I cannot find in the Readme:

image

What do these columns stand for?

Thank you.

wt12318 commented 3 years ago

TCGA-02-0047-01A.neoantigens.txt

This is my output file.

elakatos commented 3 years ago

Hi, which version of netMHCpan did you use? The pipeline was primarily developed for netMHCpan4.0, and the readme describes the output produced with that version. I believe netMHCpan4.1 includes additional columns, so my guess is that the extra columns come from that version. If so, according to the netMHCpan4.1 documentation, columns V21-V25 should be: Score_EL, %Rank_EL, Score_BA, %Rank_BA, Aff(nM).
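To make the column mapping above concrete, here is a minimal Python sketch that labels those five fields in one line of the neoantigens output. It assumes the file is tab-separated with no header row and that "V21-V25" means the 21st to 25th columns counted from 1; neither is confirmed in this thread. The function name `label_prediction_columns` is hypothetical and not part of NeoPredPipe.

```python
# Column names are the ones listed in the comment above for netMHCpan4.1.
# ASSUMPTION: "V21-V25" are 1-based positions in a header-less, tab-separated file.
NETMHCPAN41_COLUMNS = {
    21: "Score_EL",
    22: "%Rank_EL",
    23: "Score_BA",
    24: "%Rank_BA",
    25: "Aff(nM)",
}


def label_prediction_columns(line, sep="\t"):
    """Return {column name: raw string value} for columns 21-25 of one line."""
    fields = line.rstrip("\n").split(sep)
    last_needed = max(NETMHCPAN41_COLUMNS)
    if len(fields) < last_needed:
        raise ValueError(
            "expected at least %d columns, found %d" % (last_needed, len(fields))
        )
    # Convert the 1-based column positions to 0-based list indices.
    return {name: fields[pos - 1] for pos, name in NETMHCPAN41_COLUMNS.items()}
```

Calling it on each line of the `.neoantigens.txt` file gives a quick way to check whether the values look plausible for those names, for example whether the last one resembles an affinity in nM.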

I'll update the readme to clarify this difference. Also, if you want to use the NeoRecoPo side of our pipeline too, please keep in mind that you should format this output table to have the same columns (in the same order) as the netMHCpan4.0 output (shown in the readme), otherwise the downstream processing will not work.
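One way to do that reformatting is sketched below in Python: keep a chosen subset of columns, in a chosen order, for every line. This is only an illustration. The function names are hypothetical, the file is assumed to be tab-separated, and the list of positions to keep is not given in this thread, so it has to be worked out from the netMHCpan4.0 column list in the readme.

```python
def reorder_columns(line, keep_order, sep="\t"):
    """Return one line containing only the 1-based columns in keep_order, in that order."""
    fields = line.rstrip("\n").split(sep)
    return sep.join(fields[pos - 1] for pos in keep_order)


def reformat_file(in_path, out_path, keep_order, sep="\t"):
    """Apply reorder_columns to every non-empty line of in_path and write to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(reorder_columns(line, keep_order, sep) + "\n")


# Toy example with placeholder positions (NOT the real netMHCpan mapping):
print(reorder_columns("a\tb\tc\td", [1, 4, 2]))  # prints "a", "d", "b" joined by tabs
```

Writing to a separate output file keeps the original netMHCpan4.1 table intact in case the chosen positions need to be corrected.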

Best, Eszter

wt12318 commented 3 years ago

Thank you for your prompt reply. Yes, I used netMHCpan4.1.