X-lab-3D / PANDORA

MODELLER-based, anchor restrained, Peptide-MHC Modelling pipeline
Apache License 2.0
17 stars 5 forks source link

User-defined MHC sequence #253

Closed bsb2014 closed 1 year ago

bsb2014 commented 1 year ago

I put the codes below in the PANDORA.py

from PANDORA import Target from PANDORA import Pandora from PANDORA import Database db = Database.load() target = Target(id = 'ara1', N_chain_seq= 'xxxxxxxxxxx', peptide = 'xxxxxxxxxxxx', use_netmhcpan = True, MHC_class='II') case = Pandora.Pandora(target, db) case.model()

and then python ./PANDORA.py but got an error

Traceback (most recent call last): File "/mnt/chsrhome/xxxx/./PANDORA.py", line 2, in from PANDORA import Target File "/mnt/chsrhome/xxxx/PANDORA.py", line 2, in from PANDORA import Target ImportError: cannot import name 'Target' from partially initialized module 'PANDORA' (most likely due to a circular import) (/mnt/chsrhome/xxxx/PANDORA.py)

Just tested pandora-run -m I -i myTestCase -a HLA-A*0201 -p LLFGYPVYV -k 2,9. It worked well. However, the command line seems not to have the argument for "N_chain_seq".

DarioMarzella commented 1 year ago

Hi @bsb2014, that is not exacty how PANDORA is supposed to be used (nor most part of other python packages installed from source).

You are not supposed to change the package itself in any way, nor you need your scripts to be in the same folder. Once you have run the installation (in your case, according to issue #250 , pip install e .), PANDORA will be available as a python package to any python (py3) scripts that runs in that same environment (in your case, I assume the conda environment set up by mamba).

So, you can create your python script (e.g. MyScript.py), and put your code in there:

from PANDORA import Target
from PANDORA import Pandora
from PANDORA import Database
db = Database.load()
target = Target(id = 'ara1',
N_chain_seq= 'xxxxxxxxxxx',
peptide = 'xxxxxxxxxxxx',
use_netmhcpan = True,
MHC_class='II')
case = Pandora.Pandora(target, db)
case.model()

This will import PANDORA, even if it's not in the same folder or script. Please remember to change back all the edits you made to PANDORA.py. That, and all the other files from the package, should stay unchanged.

On the other hand, the command-line implementation (" pandora-run -m I -i myTestCase -a HLA-A*0201 -p LLFGYPVYV -k 2,9 " ) it's designed to quickly model straightforward cases, therefore is much more flexible and open to personalization than the python interface.

Please let me know if you have any further questions!

bsb2014 commented 1 year ago

This time, I put the codes in PANDORA_MHCii.py and got a different error as follows

WARNING: Missing M chain (Alpha chain) sequence and allele name. Traceback (most recent call last): File "/mnt/chsrhome/xxxx/./PANDORA_MHCii.py", line 10, in target = Target(id = 'ara1', File "/mnt/chsrhome/xxxx/src/PANDORA/PANDORA/PMHC/PMHC.py", line 300, in init self.fill_allele_seq_info(use_templ_seq=use_templ_seq) File "/mnt/chsrhome/xxxx/src/PANDORA/PANDORA/PMHC/PMHC.py", line 538, in fill_allele_seq_info raise Exception('No MHC chain available') Exception: No MHC chain available

I would appreciate it if you could give me some suggestions.

DarioMarzella commented 1 year ago

I see. To model MHC-II cases, PANDORA needs informations about both the MHC chains, or at leats it needs to know that your beta chain is HLA-DRB1, so it can automatically use HLA-DRA*01:01 as default alpha chain. So you can either specify both the chains allele names (the beta chain allele name won't be used to replace your user-defined N_chain_seq, no need to worry about that), or specify only the beta chain name and let pandora automatically assign the alpha chain, or give a sequence for the both of them:

Specify both allele names:

from PANDORA import Target
from PANDORA import Pandora
from PANDORA import Database
db = Database.load()
target = Target(id = 'ara1',
allele_type=['HLA-DRB1*01:01', 'HLA-DRA*01:01']
N_chain_seq= 'xxxxxxxxxxx',
peptide = 'xxxxxxxxxxxx',
use_netmhcpan = True,
MHC_class='II')
case = Pandora.Pandora(target, db)
case.model()

Or specify the Alpha chain sequence as M_chain_seq (I used the HLA-DRA*01:01 that you can also find in your database at ~/PANDORA_databases/default/mhcseqs/HLA_cleaned.fasta)

from PANDORA import Target
from PANDORA import Pandora
from PANDORA import Database
db = Database.load()
target = Target(id = 'ara1',
M_chain_seq='MAISGVPVLGFFIIAVLMSAQESWAIKEEHVIIQAEFYLNPDQSGEFMFDFDGDEIFHVDMAKKETVWRLEEFGRFASFEAQGALANIAVDKANLEIMTKRSNYTPITNVPPEVTVLTNSPVELREPNVLICFIDKFTPPVVNVTWLRNGKPVTTGVSETVFLPREDHLFRKFHYLPFLPSTEDVYDCRVEHWGLDEPLLKHWEFDAPSPLPETTENVVCALGLTVGLVGIIIGTIFIIKGVRKSNAAERRGPL'
N_chain_seq= 'xxxxxxxxxxx',
peptide = 'xxxxxxxxxxxx',
use_netmhcpan = True,
MHC_class='II')
case = Pandora.Pandora(target, db)
case.model()

Let me know if it you have other doubts!

bsb2014 commented 1 year ago

Many thanks. I tried the second method (specifying the Alpha chain sequence as M_chain_seq) and it worked well. The log is below. Could you please have a look if everything worked as expected?

No MHC alpha chain allele was provided. Trying to retrieve it from reference sequences... No MHC alpha chain allele was provided. Trying to retrieve it from reference sequences... WARNING: no anchor positions provided. Pandora will predict them using netMHCIIpan. Predicted the binding core using netMHCIIpan (4.0):

    offset: 3
    core:   xxxxxxxxx
    prob:   0.98

    Predicted peptide anchor residues (assuming canonical spacing): [4, 7, 9, 12]

############################################### This is a Target structure. ID: ara1 Type: MHC class II Alleles: ['HLA-DRA01:01:03', 'HLA-DRA01:01:02', 'HLA-DRA01:01:01', 'HLA-DRB113:22:01', 'HLA-DRB113:11:01', 'HLA-DRB111:18'] Alpha chain length: 254 Beta chain length: 266 Peptide length: 15 Alpha chain: xxxx Beta chain: xxxx Peptide: xxxxxxxxxxxx Anchors: [4, 7, 9, 12]

Modelling ara1...

    Target ID: ara1
    Target MHC Class: II
    Target Allele:  ['HLA-DRA*01:01:03', 'HLA-DRA*01:01:02', 'HLA-DRA*01:01:01', 'HLA-DRB1*13:22:01', 'HLA-DRB1*13:11:01', 'HLA-DRB1*11:18']
    Target Peptide: xxxxxxxxxxxxxxxx
    Target Anchors: 4,7,9,12

    Looking for a template...
    Selected template structure (1): ['6CQL']
    Templates Allele:  [['HLA-DRA*01:01', 'HLA-DRA*01:02', 'HLA-DRB1*11:01']]
    Templates Peptide: ['RFYKTLRAEQASQ']
    Templates Anchors: [[3, 6, 8, 11]]

############### TEMPLATE: 6CQL Chain M, excess N 28, C 48 Chain N, excess N 29, C 47 Successfully created alignment file Successfully created the initital model Calculating peptide anchor residue constraints... Performing homology modelling of ara1 on 6CQL...

    Modelling was successfull and took 205.24 seconds

    Model                           Molpdf
    ara1.BL00190001         118.6845
    ara1.BL00020001         124.151
    ara1.BL00170001         126.5486
    ara1.BL00050001         127.9532
    ara1.BL00100001         129.1185
    ara1.BL00160001         130.3531
    ara1.BL00180001         133.9064
    ara1.BL00030001         135.1486
    ara1.BL00130001         144.2921
    ara1.BL00090001         145.3939
    ara1.BL00140001         147.2704
    ara1.BL00080001         230.9551
    ara1.BL00110001         264.3848
    ara1.BL00120001         339.1368
    ara1.BL00070001         401.7965
    ara1.BL00150001         402.2689
    ara1.BL00040001         463.6136
    ara1.BL00200001         497.7397
    ara1.BL00010001         504.561
    ara1.BL00060001         1424.0814
    Successfully modelled 20 models
DarioMarzella commented 1 year ago

Great, all looks good! Remember now that ara1.BL00190001.pdb is the best model, but you can also take a look at the other top-5 models, as their molpdf score seems to be quite similar. Good luck with your project!

bsb2014 commented 1 year ago

Many thanks for your help with setting up the Pandora!!

bsb2014 commented 1 year ago

Is it normal if Molpdf score is negative? thanks

DarioMarzella commented 1 year ago

Hi, yes. Molpdf is the MODELLER objective function, it's a relative score and it can easily be negative. The lower it is, the better the model, but it can be used only to compare between models of the same complex (i.e. you cannot use it to compare between models of different peptide-MHC complexes)