Open yang-arina opened 1 year ago
For monomers, related functions and command-lines do not involve any PISA queries and thus can retrieve results successfully. For multimers, however, the PISA API call is a crucial step since it is the source of inter-chain relationships and it would take me some time to write alternative code for it.
I sent a mail to the PDBe team and received their replies on Mar 8. They said that they are actively looking to improve this service and will have something soon that should be better. But it seems that it still requires weeks to months for them to handle this.
I will notify you if I finish an alternative way that substitutes for PISA API to get the inter-chain interaction info.
Please let me know if the multimer-related results are important and necessary to you.
Thank you for your reply. For our research, if the interface information from PISA cannot currently be obtained, it is crucial to skip this step and obtain the complex structure directly.
SIFTS API does not provide any chain interaction information. PISA API provides both the interacting chains and their corresponding interacting residues (i.e. interface residues).
I am looking for alternatives to PISA API for chain interaction information.
I managed to make pipe_select_ho and pipe_select_he work again, without calling the PISA API. You can try the new version: pip install pdb-profiling==0.4.1
Type | Args |
---|---|
Monomer | sifts-mapping --func pipe_select_mo (default) |
Homodimer | sifts-mapping --func pipe_select_ho |
Heterodimer | sifts-mapping --func pipe_select_he |
Protein-Ligand Interaction Pair | sifts-mapping --func pipe_select_else --kwargs '{"func": "pipe_protein_ligand_interface", "focus_assembly_ids": (0,)}' |
Protein-Nucleotide Interaction Pair | sifts-mapping --func pipe_select_else --kwargs '{"func": "pipe_protein_nucleotide_interface"}' |
Protein-Nucleotide Interaction Pair (New) | sifts-mapping --func pipe_select_else --kwargs 'func="Protein/NA"' |
PDB-Profiling 0.4.1 makes use of the RCSB Data API and the RCSB Search API to retrieve the chain interaction information, as well as the interface residues. These APIs only provide protein-protein and protein-nucleic acid interactions of biological units (no asymmetric unit). The interface residues are calculated by biojava-structure-6.0.5.
See https://data.rcsb.org/redoc/index.html#tag/Interface-Service/operation/getInterfaceById
e.g. https://data.rcsb.org/rest/v1/core/interface/{entry_id}/{assembly_id}/{interface_id}
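As a rough illustration of the endpoint above, the following Python sketch fills in the URL template and fetches the JSON record with the standard library only. The function names `interface_url` and `fetch_interface` are hypothetical helpers made up for this example; they are not part of pdb-profiling, and the structure of the returned JSON is not assumed here.

```python
import json
from urllib.request import urlopen

# URL template quoted from the message above.
RCSB_INTERFACE_URL = (
    "https://data.rcsb.org/rest/v1/core/interface/"
    "{entry_id}/{assembly_id}/{interface_id}"
)


def interface_url(entry_id: str, assembly_id: str, interface_id: str) -> str:
    """Fill the template with one entry, assembly, and interface ID."""
    return RCSB_INTERFACE_URL.format(
        entry_id=entry_id, assembly_id=assembly_id, interface_id=interface_id
    )


def fetch_interface(entry_id: str, assembly_id: str, interface_id: str,
                    timeout: float = 30.0) -> dict:
    """Download one interface record and decode it as JSON (needs network)."""
    with urlopen(interface_url(entry_id, assembly_id, interface_id),
                 timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only prints the URL; call fetch_interface(...) to actually query the API.
    print(interface_url("4HHB", "1", "1"))
```

The IDs in the usage line are placeholders; whether a given entry actually has an interface with that ID has to be checked against the API itself.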
Thank you very much for your help! But there are still some issues regarding how to get Protein-Ligand Interaction Pair data in the new way.
pdb_profiling sifts-mapping --input test1.txt --column unp_canonical_id --func pipe_select_ho --output test1_result_ho.txt (test1.txt contains the UniProt IDs) gives an error like:
[16:57:27] Initializing Folder: /data/user/new_analysis/dataset_for_analysis/try_pdb_p command.py:43
Total 10 to query command.py:272
70.0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━ 7 of 10 [ -:--:-- 227.61s ]
Traceback (most recent call last):
File "/data/user/miniconda3/envs/pp_4/bin/pdb_profiling", line 8, in <module>
sys.exit(Interface())
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
rv.append(sub_ctx.command.invoke(sub_ctx))
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/pdb_profiling/commands/command.py", line 275, in sifts_mapping
res = SIFTSs(ids[i:i+chunksize]).fetch(func, **kwargs).run(p.track).result()
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/unsync/unsync.py", line 117, in result
return self.concurrent_future.result(*args, **kwargs)
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 4176, in run
return [await fob for fob in tqdm(as_completed(self.tasks), total=len(self.tasks))]
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 4176, in <listcomp>
return [await fob for fob in tqdm(as_completed(self.tasks), total=len(self.tasks))]
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/asyncio/tasks.py", line 619, in _wait_for_one
return f.result() # May raise f.exception().
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/asyncio/tasks.py", line 695, in _wrap_awaitable
return (yield from awaitable.__await__())
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 3633, in pipe_select_ho
p_df = await self.pipe_select_ho_base(**kwargs)
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 3583, in pipe_select_ho_base
p_df = await self.retrieve_rcsb_interface('ho', sele_df, chain_pairs=chain_pairs, **kwargs)
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 3928, in retrieve_rcsb_interface
interfaces_dfs = [await i for i in ob.tasks]
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 3928, in <listcomp>
interfaces_dfs = [await i for i in ob.tasks]
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 1521, in pipe_interface_res_info_for_rcsb
interface_df = await assembly.get_interface_info_from_rcsb_data_api()
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 1906, in get_interface_info_from_rcsb_data_api
profile_df = (await PDB(self.pdb_id).profile_id()).query(f'assembly_id == {self.assembly_id}')
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 1320, in profile_id
assert profile_lyst == ass_lyst, f"\n{self.pdb_id}\n{assembly_id}\n{entity_id}\n{profile_lyst},\n{ass_lyst}"
AssertionError:
2v5w
1
3
('D',),
('F', 'G')
[17:19:13] Initializing Folder: /data/user/new_analysis/dataset_for_analysis/try_pdb_p command.py:43
Traceback (most recent call last):
File "/data/user/miniconda3/envs/pp_4/bin/pdb_profiling", line 8, in <module>
sys.exit(Interface())
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
rv.append(sub_ctx.command.invoke(sub_ctx))
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/pdb_profiling/commands/command.py", line 207, in sifts_mapping
kwargs = dict(sub.split('=') for item in kwargs for sub in item.split(';'))
ValueError: dictionary update sequence element #0 has length 1; 2 is required
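The last frame of this traceback shows how `--kwargs` values are parsed: each value is split on `;` and then on `=`. A minimal Python sketch of that expression, assuming the option values arrive as a tuple of strings (the wrapper name `parse_kwargs` is made up for this example), shows why a `key=value` string is accepted while a JSON-style string produces exactly this ValueError:

```python
def parse_kwargs(items):
    """Same expression as in the traceback (command.py, line 207)."""
    return dict(sub.split('=') for item in items for sub in item.split(';'))


# key=value form: one '=' per piece, so every piece yields a (key, value) pair.
parsed = parse_kwargs(('func="Protein/NA"',))

# JSON-style form: there is no '=' at all, so split('=') returns a single
# element and dict() rejects it.
try:
    parse_kwargs(('{"func": "pipe_protein_nucleotide_interface"}',))
except ValueError as exc:
    error_message = str(exc)
else:
    error_message = None

if __name__ == "__main__":
    print(parsed)
    print(error_message)
```

Note that the parsed value still carries its double quotes at this point; what the library does with them afterwards is not visible in the traceback.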
1 (and 2). Try pdb_profiling sifts-mapping --input test1.txt --column unp_canonical_id --func pipe_select_ho --output test1_result_ho.txt --skip_pdbs '2v5w'
(There is something wrong with the API data related to 2v5w.)
pdb_profiling sifts-mapping --input test1.txt --column unp_canonical_id --func pipe_select_else --kwargs 'func="Protein/NA"' --output test1_result_na.txt
--skip_pdbs '2v5w': this does work, but the problematic PDB structures are like a bottomless pit; once one is solved, another follows. Is there any way to solve this problem at the root?
pdb_profiling sifts-mapping --input test1.txt --column unp_canonical_id --func pipe_select_else --kwargs 'func="Protein/NA"' --output test1_result_na.txt: the same problem occurred as above.
Try pip install pdb-profiling==0.4.2. And you do not have to add --skip_pdbs '2v5w'.
Try pip install pdb-profiling==0.4.2. And you do not have to add --skip_pdbs '2v5w'.
I am resending this message because I first mistakenly typed 0.4.1 in my last response and then edited it to 0.4.2 on the GitHub issue. Email cannot track that change.
Thank you for the reply! I have tried pip install pdb-profiling==0.4.2, but some questions remain:
--skip_pdbs is still needed, because some problematic PDB IDs such as '7xw5','7xw6','7o9t','7o9x','7o9z','7oa9','8ig0' (and so on) still exist. After that, the mapping commands for mo, ho, and na ran successfully.
With pdb_profiling sifts-mapping --input canonical_id_0328.txt --column unp_canonical_id --func pipe_select_he --output canonical_result_he.txt --skip_pdbs '7xw5','7xw6','7o9t','7o9x','7o9z','7oa9','8ig0','7uzp','7wjq','7tnh','7t1u','7vvl','7sck','7y9z','8dm5' there are some new problems:
(pp_5) [user@bogon try_pdb_p]$ pdb_profiling sifts-mapping --input canonical_id_0328.txt --column unp_canonical_id --func pipe_select_he --output canonical_result_he.txt --skip_pdbs '7xw5','7xw6','7o9t','7o9x','7o9z','7oa9','8ig0','7uzp','7wjq','7tnh','7t1u','7vvl','7sck','7y9z','8dm5'
[11:02:36] Initializing Folder: /data/user/new_analysis/dataset_for_analysis/try_pdb_p command.py:43
Total 464 to query command.py:272
100.0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50 of 50 [ 0:00:00 25.40s ]
[11:03:01] Done: 50 command.py:286
32.0% ━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16 of 50 [ -:--:-- 6.59s ]
PeptideLinkingWarning: Possible Peptide Linking: <PDB 4pa0>, {'entity_id': 1}, {'three_letter_code': 'CRO', 'parent_chem_comp_ids': ['THR', 'TYR', 'GLY'], 'one_letter_code': 'TYG'}; select the first code
(/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py:1077)
PeptideLinkingWarning: Possible Peptide Linking: <PDB 4p7h>, {'entity_id': 1}, {'three_letter_code': 'CRO', 'parent_chem_comp_ids': ['THR', 'TYR', 'GLY'], 'one_letter_code': 'TYG'}; select the first code
(/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py:1077)
32.0% ━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16 of 50 [ -:--:-- 14.02s ]
PeptideLinkingWarning: Possible Peptide Linking: <PDB 4pa0>, {'entity_id': 1}, {'three_letter_code': 'CRO', 'parent_chem_comp_ids': ['THR', 'TYR', 'GLY'], 'one_letter_code': 'TYG'}; select the first code
(/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py:1077)
PeptideLinkingWarning: Possible Peptide Linking: <PDB 4p7h>, {'entity_id': 1}, {'three_letter_code': 'CRO', 'parent_chem_comp_ids': ['THR', 'TYR', 'GLY'], 'one_letter_code': 'TYG'}; select the first code
(/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py:1077)
100.0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50 of 50 [ 0:00:00 27.10s ]
[11:03:33] Done: 100 command.py:286
100.0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50 of 50 [ 0:00:00 13.41s ]
[11:03:54] Done: 150 command.py:286
66.0% ━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━ 33 of 50 [ 0:00:18 13.13s ]
InvalidFileContentWarning: InvalidFileContentError for 'UniProt/fasta/V5T923.fasta.e1d709a839f44388b3644d1f26bca7f6.tmp', will retry (/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/ensure.py:67)
98.0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺ 49 of 50 [ 0:00:01 26.40s ]
InvalidFileContentWarning: InvalidFileContentError for 'UniProt/fasta/V5T923.fasta.48e47b109518479c9a6a5eef795ff4dc.tmp', will retry (/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/ensure.py:67)
98.0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺ 49 of 50 [ 0:00:01 39.02s ]
InvalidFileContentWarning: InvalidFileContentError for 'UniProt/fasta/V5T923.fasta.75e75f5843d3456a82f052eedd05ebf2.tmp', will retry (/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/ensure.py:67)
98.0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺ 49 of 50 [ 0:00:01 52.48s ]
InvalidFileContentWarning: InvalidFileContentError for 'UniProt/fasta/V5T923.fasta.b77e5416c23a40f783ff75db787b0742.tmp', will retry (/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/ensure.py:67)
98.0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺ 49 of 50 [ 0:00:01 61.04s ]
InvalidFileContentWarning: InvalidFileContentError for 'UniProt/fasta/V5T923.fasta.cd0905dc6881407296a8059d5ba3cbdb.tmp', will retry (/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/ensure.py:67)
98.0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺ 49 of 50 [ 0:00:01 61.05s ]
Traceback (most recent call last):
File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/validate.py", line 33, in validate
await func(path)
File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/validate.py", line 55, in fasta_load
assert bool(cls.fasta_pat.fullmatch(data))
AssertionError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/tenacity/_asyncio.py", line 50, in call
result = await fn(*args, **kwargs)
File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/ensure.py", line 69, in wrapper
raise e
File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/ensure.py", line 65, in wrapper
await ValidateBase.validate(path, suffix=Path(raw_path).suffix)
File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/validate.py", line 35, in validate
raise InvalidFileContentError(path)
pdb_profiling.exceptions.InvalidFileContentError: UniProt/fasta/V5T923.fasta.cd0905dc6881407296a8059d5ba3cbdb.tmp
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/data/user/miniconda3/envs/pp_5/bin/pdb_profiling", line 8, in
Try 0.4.3: pip install pdb-profiling==0.4.3
I checked the data carefully, and I'm sorry to say that V5T923 is not among my input UniProt accessions, so after trying 0.4.3 the problem still exists.
Then this is due to outdated data provided by the SIFTS API. Q9H2E6 is likely to be one of your inputs, and the SIFTS API returns PDB 6wts, in which V5T923 (chain C) interacts with Q9H2E6. But UniProt has now obsoleted V5T923 for some reason, and 6wts chain C no longer has any UniProt accession associated with it.
A simple workaround is to drop Q9H2E6 from your inputs when running pipe_select_he.
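The traceback earlier in the thread shows that the downloaded UniProt FASTA file is validated with `fasta_pat.fullmatch(data)`, and that the download for V5T923 repeatedly fails this check. The actual pattern used by pdb-profiling is not shown, so the Python sketch below uses a hypothetical regular expression (`FASTA_PAT`) and a hypothetical helper (`looks_like_fasta`) purely to illustrate the kind of check that an empty or non-FASTA response would fail:

```python
import re

# Hypothetical pattern (NOT the one in pdb_profiling/validate.py):
# one '>' header line, then one or more lines of residue letters.
FASTA_PAT = re.compile(r">[^\n]+\n(?:[A-Za-z*]+\n)*[A-Za-z*]+\n?")


def looks_like_fasta(text: str) -> bool:
    """Return True only if the whole text is a single FASTA record."""
    return FASTA_PAT.fullmatch(text) is not None


if __name__ == "__main__":
    # Made-up record, only to exercise the check.
    print(looks_like_fasta(">sp|X00000|EXAMPLE made-up header\nMKVLA\nGGT\n"))
    # An empty body, as a failed download might produce, is rejected.
    print(looks_like_fasta(""))
```

A check like this only says that the file is not valid FASTA; it cannot say why, so an obsoleted accession and a transient download error look the same at this level.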
Thank you for your patience! I didn't drop Q9H2E6, but skipped 6wts. Besides V5T923, another obsoleted UniProt accession, I3LJZ9, came up. Is there any way to solve this problem completely? Or could you please tell me how to check which input UniProt accession and PDB ID an obsoleted UniProt accession corresponds to? Then I can skip the problematic PDB ID.
pip install pdb-profiling==0.4.4
Version 0.4.4 tries to resolve this problem automatically. No need to skip PDBs or drop UniProt accessions (hopefully).
Besides, PDB-Profiling now raises a PossibleObsoletedUniProtWarning telling you which UniProt accession/isoform is obsoleted. The previous version also reported these IDs in InvalidFileContentWarning. Locating which UniProt accession and PDB entry are related to the obsoleted accession requires a manual check in SIFTS Mappings (PDB <-> UniProt all isoforms):
https://www.ebi.ac.uk/pdbe/api/mappings/all_isoforms/:accession
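The manual check can be partly scripted. The Python sketch below queries the endpoint above for one input accession and lists the PDB entries it maps to, so that each entry can then be inspected for chains whose partner accession has been obsoleted. The helper names are hypothetical, and the JSON layout `{accession: {"PDB": {pdb_id: [...]}}}` is an assumption about the response that should be verified against a real reply before relying on it:

```python
import json
from urllib.request import urlopen

# Endpoint quoted above, with ':accession' turned into a format field.
SIFTS_ALL_ISOFORMS = "https://www.ebi.ac.uk/pdbe/api/mappings/all_isoforms/{accession}"


def pdb_ids_from_payload(payload: dict, accession: str) -> list:
    """Extract PDB IDs, assuming {accession: {"PDB": {pdb_id: [...]}}}."""
    return sorted(payload.get(accession, {}).get("PDB", {}))


def fetch_pdb_ids(accession: str, timeout: float = 30.0) -> list:
    """Query SIFTS for one accession (needs network access)."""
    url = SIFTS_ALL_ISOFORMS.format(accession=accession)
    with urlopen(url, timeout=timeout) as resp:
        return pdb_ids_from_payload(json.load(resp), accession)


if __name__ == "__main__":
    # Offline demonstration with a made-up payload in the assumed layout.
    fake = {"Q9H2E6": {"PDB": {"6wts": []}}}
    print(pdb_ids_from_payload(fake, "Q9H2E6"))
```

Calling `fetch_pdb_ids("Q9H2E6")` would perform the real lookup; the offline payload in the example is fabricated for illustration and says nothing about what the API currently returns.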
Thank you very much! But how can I now map Protein-Ligand Interaction Pair data from a UniProt accession? Is there any method?
That requires either a fix to the PISA API or another handy way of getting protein-ligand interaction data. The latter will take some time to investigate.
Thank you very much for your patient and professional replies over these days. I am now fully aware of the situation you mentioned.
The input is the protein UniProt isoform ID. If pdb-profiling works properly, SIFTS can map and filter the relevant complex PDB structures, and the PISA API can retrieve all protein interface residues. However, the relevant information about the complex is not being output, and we have discovered that this is due to the malfunctioning of the PISA API. What I need now is a way to bypass the step involving the PISA API and obtain all other results.