X-lab-3D / PANDORA

MODELLER-based, anchor restrained, Peptide-MHC Modelling pipeline
Apache License 2.0

wrapper script #235

Closed GuidoLeoni closed 6 months ago

GuidoLeoni commented 1 year ago

Dear Developers, your tool is great. Today I have a new question. I'm using the wrapper tool to model multiple peptides at the same time. The peptides are listed in datafile.tsv, and for each peptide I would like to specify the PDB template. Everything works well except that the models don't contain the HLA protein. (The same script works well without specifying the template.)

Some warnings are printed during execution. For example, for the first peptide the tool prints:

WARNING Invalid character '5' in FASTA sequence data, ignored

Do you have any tips to suggest? Thanks in advance, Guido

Here is an example of the datafile.tsv:

Peptide      MHC         ID                template
NGKLRHLSS    HLA-B08:01  CCDC89_P374SMT9   5WMQ
MQIDNRLPPK   HLA-A03:01  SLC17A1_M3IMT3    6O9B
LAKYEARHRPL  HLA-B08:01  HLF_G293RMT9      4QRP
FLIDYMFFEK   HLA-A03:01  PLSCR2_R222KMT10  2XPG

and this is the code that I used:

from PANDORA.Pandora import Pandora
from PANDORA.Database import Database
from PANDORA.Wrapper import Wrapper

# A. Load pregenerated database of all pMHC PDBs as templates
db = Database.load('PANDORA/pandora_Database.pkl')

# B. Create the wrapper object
wrap = Wrapper.Wrapper()

# C. Create all Target Objects based on peptides in the .tsv file
wrap.create_targets('datafile.tsv', db, 'I', IDs_col=2, M_chain_col=3)

# D. Perform modelling
wrap.run_pandora(num_cores=10)

DarioMarzella commented 1 year ago

Hi @GuidoLeoni , thanks for contacting us. First of all, I would need to know which version of PANDORA you are using. You can retrieve it by running:

`import PANDORA`
`print(PANDORA.__version__)`

Then, I am actually quite surprised by the fact that PANDORA can run without crashing in your case... The reason you are having this issue is that the Wrapper module does not support (yet) passing the template for each target. It should be quite trivial to add on our side though. I will keep you posted about this.

The argument you are passing, "M_chain_col", is for the target MHC alpha chain sequence (it is there in case you want to be sure about the sequence used rather than simply passing the MHC allele name), so it serves a completely different purpose. I am sorry if our documentation is not clear about this; we are working to improve it. To know exactly what every argument does, you can use the Python help function. Here is an example:

from PANDORA.Wrapper import Wrapper
wrap = Wrapper.Wrapper()
print(help(wrap.create_targets))
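As a side note on the warning reported above, here is a minimal sketch of why a PDB ID placed in a sequence column could trigger it. It assumes the value is read as a protein sequence written in the 20 standard one-letter amino acid codes. Both helper functions are hypothetical and are not part of PANDORA or MODELLER.

```python
# Hypothetical helpers, not part of PANDORA or MODELLER.
# Assumption: a sequence column should contain only the 20 standard
# one-letter amino acid codes.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def invalid_characters(value):
    """Return the characters of value that are not standard amino acid codes."""
    return [ch for ch in value.strip().upper() if ch not in STANDARD_AMINO_ACIDS]


def looks_like_protein_sequence(value):
    """Return True if value is non-empty and has no invalid characters."""
    cleaned = value.strip()
    return len(cleaned) > 0 and not invalid_characters(cleaned)


# A peptide from the datafile passes the check.
print(looks_like_protein_sequence("NGKLRHLSS"))

# A PDB ID from the template column does not: the digit is rejected,
# while W, M and Q happen to be valid amino acid letters.
print(looks_like_protein_sequence("5WMQ"))
print(invalid_characters("5WMQ"))
```

Running a check like this on the column before calling create_targets would flag the template IDs early, instead of leaving them to surface as a warning during modelling.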

To fix your issue for now, you would need to run each case individually (without the Wrapper module, using Pandora.model for every case as I suggested in issue #229), or wait until we publish a new release including the feature that you need (we are trying to do so within the next week).
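A minimal sketch of such a one-case-at-a-time loop, assuming a tab-separated file with a header row and the four columns from the example datafile above (Peptide, MHC, ID, template). The function `model_one_case` is a hypothetical placeholder: the actual PANDORA calls that build the target and run the model would go there, and they are not shown because they depend on the installed version.

```python
import csv
import io


def read_cases(handle):
    """Yield one dict per data row of a tab-separated file with a header row."""
    reader = csv.DictReader(handle, delimiter="\t", restval="")
    for row in reader:
        # Skip surplus fields (key None) and strip stray whitespace.
        yield {k.strip(): (v or "").strip() for k, v in row.items() if k is not None}


def run_cases_one_by_one(handle, model_one_case):
    """Call model_one_case once per row; collect failures instead of stopping."""
    failed = []
    for case in read_cases(handle):
        try:
            model_one_case(case)
        except Exception as error:
            failed.append((case.get("ID", ""), str(error)))
    return failed


def model_one_case(case):
    # Hypothetical placeholder: replace the print with the PANDORA calls
    # for a single target (see issue #229), passing case["template"].
    print(f"{case['ID']}: peptide={case['Peptide']} "
          f"allele={case['MHC']} template={case['template']}")


# In-memory stand-in for datafile.tsv, using two rows from the example.
example = (
    "Peptide\tMHC\tID\ttemplate\n"
    "NGKLRHLSS\tHLA-B08:01\tCCDC89_P374SMT9\t5WMQ\n"
    "MQIDNRLPPK\tHLA-A03:01\tSLC17A1_M3IMT3\t6O9B\n"
)
failed = run_cases_one_by_one(io.StringIO(example), model_one_case)
print("failed cases:", failed)
```

With a real file, the same function can be called as `run_cases_one_by_one(open("datafile.tsv", newline=""), model_one_case)`. Collecting failures rather than raising means one problematic peptide does not stop the remaining cases.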

Finally, allow me to give you a suggestion: I would still advise you to also run the models with the template selected automatically by PANDORA and check those as well. Looking back at the example you mentioned in issue #229, the different residue in position 1 might actually be different enough to justify a change in the conformation. In fact, with canonical anchoring, P1 is often very restrained, meaning a cysteine might fit there but an arginine might well not be able to, since its side chain is almost twice the size.

GuidoLeoni commented 1 year ago

Dear @DarioMarzella, thank you very much for your fast answer. I'm running version 1.0 of PANDORA. I will also try to apply your suggestion. At present I'm testing PANDORA on a dataset of validated pairs of mutated/wt neoantigens, and of course I would like to assess whether there is an evident signal (or not) related to changes in conformation in the mutated peptides vs their wt counterparts.

DarioMarzella commented 1 year ago

Thanks for the information. This is not related to this issue, but may I ask which anchors you are using? From the script you posted above, it seems you are letting PANDORA use the default anchoring, which might be incorrect, especially if you have mutations at or close to the anchor positions. I would advise you to install NetMHCpan to predict the binding cores. Its piping with PANDORA in version 1.0 is a bit clunky, but we made several improvements to it in the version 2.0 pre-releases. Still, I would suggest you wait until we drop the next update before updating your installation of PANDORA (I can guide you through it, if needed).

GuidoLeoni commented 1 year ago

OK I'll wait thank you

GuidoLeoni commented 1 year ago

Dear @DarioMarzella, I would like to test the new version of PANDORA. Could you please give me some indications about how to update it? Thanks, Guido

DarioMarzella commented 1 year ago

Dear @GuidoLeoni, you were faster than me in replying here! First, thanks a lot for "volunteering" to test. To update PANDORA I would advise you to manually uninstall the previous version and install the current one. This should not be necessary for future releases, but it is better to do so in this case, considering the number of changes in the new version.

Then, in any case, just to be sure, set your MODELLER license key again by running:

export KEY_MODELLER="XXXX"

replacing "XXXX" with your key. Finally, reinstall PANDORA:

conda install -c csb-nijmegen csb-pandora -c salilab -c bioconda

You will need to download the template database, but you can now do so by simply running this from the command line:

pandora-fetch

(All of these steps are listed in the updated installation section of the README file.)

Please let me know if any step is not clear or it does not work, I will be happy to help.