MathOnco / NeoPredPipe

Neoantigens prediction pipeline for multi- or single-region vcf files using ANNOVAR and netMHCpan.
GNU Lesser General Public License v3.0

multi or single region #23

Closed beginner984 closed 3 years ago

beginner984 commented 3 years ago

Hello

I ran your software on my data but wrongly passed the parameter -c 1 2 instead of -c 1, even though my vcf files are single region, so I have results like this:

[Screenshot of the results table with one column highlighted]

So if I sum up the numbers in the highlighted column, that is the total neoantigen burden for this given sample, am I right?

Does the mistake in the -c parameter change the results, and should I re-run the analysis?

Thank you for any help

elakatos commented 3 years ago

Each row is a separate vcf the analysis was run on, as you can see from the sample name. If you read the Readme, you can see that the total neoantigen count per sample is reported under Total (regardless of how many regions are in the sample), so the mistake should not be a problem. Total_SB reports the total number of strong binder neoantigens. I cannot advise you on how the different vcfs relate to your sample, nor on whether you should sum them up. If they represent different HLA alleles (not derived from the patient), again, you should search for what has been done in the literature before.

beginner984 commented 3 years ago

Thank you. This spreadsheet is for one patient only. My friend advised me to put in different numbers for the same sample :(

Please advise me if this makes my results unreliable

elakatos commented 3 years ago

In that case you could sum the numbers, but keep in mind that a patient has only 6 HLA alleles, so if you evaluated more than that, your numbers might be overestimated. You've asked before how to handle the case when you don't know the patient-specific HLA types, and as I said before, I am not an expert on how to handle that problem.
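For a single patient whose rows come from separate runs, the summation mentioned above could be sketched as follows. This is only a minimal illustration, assuming a tab-separated summary table with a header row. `Total` and `Total_SB` are the columns named in this thread; the `Sample` header and all numbers are invented for the example.

```python
import csv
import io

# Hypothetical summary table: the "Sample" header and every number are
# invented for illustration; "Total" and "Total_SB" are named in the thread.
SUMMARY_TSV = (
    "Sample\tTotal\tTotal_SB\n"
    "patient1_run1\t12\t3\n"
    "patient1_run2\t7\t1\n"
    "patient1_run3\t5\t0\n"
)

def sum_columns(handle, columns=("Total", "Total_SB")):
    """Sum the requested integer columns over every row of a tab-separated table."""
    reader = csv.DictReader(handle, delimiter="\t")
    totals = {name: 0 for name in columns}
    for row in reader:
        for name in columns:
            totals[name] += int(row[name])
    return totals

totals = sum_columns(io.StringIO(SUMMARY_TSV))
print(totals)
```

With a real file, an open file handle (for example `open(path, newline="")`) would take the place of the `io.StringIO` object.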

This page is for issues with our software. I can tell you that the software seems to have run well for your sample, so there was no problem in that respect. I am happy to answer if you have questions about what information is in the output table, though please first try to find the answer on the Readme page. But as for how best to interpret the results in your case, it is not my place to advise you.

beginner984 commented 3 years ago

Thank you so much for answering me

To be honest, a bioinformatician ran your software on my vcf as a favour, because I failed to install the software on my laptop.

From the results, it looks like she did not specify whether peptides from the matched normal should be checked (-m parameter). In a paper I read:

We then quantified the peptides that displayed high affinity binding in tumour, but low or no binding in the respective matched normal and obtained total counts for each defined patients subgroups.

Here, it seems I can only count tumour peptides, regardless of whether a peptide is shared between tumour and normal. If my goal is to compare total neoantigen burden between two groups of patients, does this affect the accuracy of my results?

elakatos commented 3 years ago

I would like to emphasise once again, that it is NOT my expertise (or job) to advise you on how to interpret your data. I think you should ask for advice from people who know your patient samples better or have done similar analysis, because I don't want to mislead your research by giving false advice. I have not done neoantigen calls myself when I did not know the patient-specific HLAs.

I don't know which paper you quoted from. In case they meant comparing the mutated (tumour-specific) peptide and the wild-type (as found in normal) peptide: in the NeoPredPipe step of our pipeline, only the mutated (tumour-specific) peptide is evaluated. The -m option confirms whether the mutated peptide is novel - it does NOT compute the wild-type peptide affinity. On the other hand, in the NeoRecoPo step of the pipeline (which has to be performed separately) the wild-type affinities are also predicted (following Luksza et al., 2017), so if you wish, you can get this information from the tables produced during this step. Neoantigens are evaluated variably across the literature, so it is a valid analysis to only consider mutated peptide binding affinity (as you got it from the table above), if you don't want to take wild-type peptide affinities into account. It is definitely most important to be consistent between your patient groups and perform the exact same analysis for all of your patients.

elakatos commented 3 years ago

As I mentioned, I don't think I should advise on matters of science (rather than the software) here, so I closed this issue now. Good luck with your analysis!

beginner984 commented 3 years ago

Thank you so much

Sorry, if I run NeoRecoPo on my data, what would the output look like? I am asking so that I know what to expect before going through that step.

Thank you

elakatos commented 3 years ago

The final output of the analysis should be a file named PredictedRecognitionPotentials.txt, a table file that contains the predicted recognition potential for the neoantigens. If you want the wild-type peptide information, I suggest running the analysis with the -d option (to keep temporary directories), in which case you will also have a folder named NeoRecoTMP. Amongst the temporary files in that directory, the file Neoantigens.WTandMTtable.txt will have wild-type and mutated peptide binding affinity scores (WT.SCORE and MT.SCORE columns).
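Reading the two score columns side by side could look roughly like the sketch below. It assumes that Neoantigens.WTandMTtable.txt is tab-separated with a header row, which should be checked against the real file. Only `WT.SCORE` and `MT.SCORE` are named in this thread; the `Peptide` column and all values are invented. The ratio is meaningful only under the further assumption that both scores are affinities on the same scale where a lower value means stronger binding.

```python
import csv
import io

# Hypothetical excerpt: only WT.SCORE and MT.SCORE are named in the thread;
# the "Peptide" column and all values are invented for illustration.
TABLE_TSV = (
    "Peptide\tWT.SCORE\tMT.SCORE\n"
    "pepA\t5000.0\t50.0\n"
    "pepB\t200.0\t400.0\n"
)

def wt_to_mt_ratio(handle):
    """Return {peptide: WT.SCORE / MT.SCORE} for each row of the table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["Peptide"]: float(row["WT.SCORE"]) / float(row["MT.SCORE"])
        for row in reader
    }

ratios = wt_to_mt_ratio(io.StringIO(TABLE_TSV))
print(ratios)
```

Under the stated assumption, a ratio above 1 would indicate that the mutated peptide is predicted to bind more strongly than its wild-type counterpart.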

beginner984 commented 3 years ago

Thank you so much. I have attached the output of another software tool here, which is the kind of recognition output I would like to have.

I completely failed to run that software on my data, so I ran your software instead, and now I want to run the second step.

Could you please have a look at this file and tell me whether I can get something similar here?

neoantigenPresentation.txt

elakatos commented 3 years ago

Most of these fields will be in the Neoantigens.WTandMTtable.txt I mentioned above, yes.

beginner984 commented 3 years ago

Thank you so much. I ran your software with 9-mers. Sorry, but how much do you agree with this advice from my supervisor? "If you want to evaluate the overall neoantigen burden, then you should probably run with 17-mer length peptides too."

elakatos commented 3 years ago

Typically, MHC class I alleles bind peptides of length 8-11, with most alleles binding 9 and 10mers, which is why we advise these lengths in our documentation. Of course you can specify any other length. Just to be clear, these 9mer peptides are tested by taking 8 amino acids on BOTH sides of a mutated amino acid, so technically we ARE creating 17mers: 8aa+mutated+8aa - but we check the binding strength for peptides of length 9. Maybe that's what your supervisor meant, but if not, you can run the software with longer peptides too.
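The arithmetic behind the 17mer window can be illustrated with a short sketch. This is not the pipeline's own code, only a minimal example of the idea: with 8 flanking residues on each side, every 9mer cut from the window contains the mutated residue. The function name and the example sequence are made up.

```python
def kmers_covering_mutation(window, mut_index, k=9):
    """Return every k-mer of `window` that includes the residue at `mut_index`."""
    peptides = []
    first_start = max(0, mut_index - k + 1)        # earliest start that still covers the mutation
    last_start = min(mut_index, len(window) - k)   # latest start that fits inside the window
    for start in range(first_start, last_start + 1):
        peptides.append(window[start:start + k])
    return peptides

# Made-up 17-residue window: 8 flanking residues, the mutated residue "X"
# (a placeholder symbol, not a real amino acid code), then 8 more residues.
window = "ACDEFGHI" + "X" + "KLMNPQRS"
peptides = kmers_covering_mutation(window, mut_index=8, k=9)
for peptide in peptides:
    print(peptide)
```

For a 17-residue window and k=9 this yields nine peptides, from the one ending on the mutated residue to the one starting on it.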

beginner984 commented 3 years ago

Hello and thank you so much in advance

I have finally completed the recognition potential part of the software.

If I follow your publication correctly, filtering the PredictedRecognitionPotentials output file for each sample on the exclude (0,1) column and on recognition potential higher than zero should likely give high-fidelity data per sample.

But my question please is:

Actually my ultimate goal is to compare neoantigen burden in two groups of patients (responders and non-responders to therapy)

In this case, how do I obtain the neoantigen burden per sample? I mean, which column represents this? Does it make sense to just count, per sample, the number of peptides with recognition potential higher than zero that are absent in the WT, and compare the two groups by this (an array of the total number of filtered peptides per sample)?
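The mechanics of the filter-and-count step described in this comment could be sketched as below. It does not settle whether such a count is an appropriate measure of neoantigen burden. Every column name here (`Sample`, `Peptide`, `Excluded`, `RecognitionPotential`) and every value is hypothetical and would need to be matched to the real headers of PredictedRecognitionPotentials.txt.

```python
import csv
import io

# Hypothetical table: all column names and values are invented for
# illustration and must be adapted to the real output file.
TABLE_TSV = (
    "Sample\tPeptide\tExcluded\tRecognitionPotential\n"
    "s1\tpepA\t0\t0.8\n"
    "s1\tpepB\t1\t2.1\n"
    "s1\tpepC\t0\t0.0\n"
    "s2\tpepD\t0\t0.3\n"
    "s2\tpepE\t0\t1.5\n"
)

def count_per_sample(handle):
    """Count rows per sample that are not excluded and have a positive potential."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = {}
    for row in reader:
        counts.setdefault(row["Sample"], 0)  # keep samples with zero passing rows
        if row["Excluded"] == "0" and float(row["RecognitionPotential"]) > 0:
            counts[row["Sample"]] += 1
    return counts

counts = count_per_sample(io.StringIO(TABLE_TSV))
print(counts)
```

Whatever filter is chosen, the same one should be applied to every patient in both groups, as stressed earlier in the thread.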

Thank you so much for any thought