NatureGeorge / pdb-profiling

Profiling Protein Structures from Protein Data Bank and integrate various resources. 🏄‍♂️
https://pdb-profiling.netlify.app/
MIT License
9 stars 0 forks

PISA API doesn't work, how to bypass this step #30

Open yang-arina opened 1 year ago

yang-arina commented 1 year ago

The input is a protein UniProt isoform ID. When pdb-profiling works properly, SIFTS maps and filters the relevant complex PDB structures, and the PISA API retrieves all protein interface residues. However, no information about the complexes is being output, and we found that this is because the PISA API is malfunctioning. What I need now is a way to bypass the PISA API step and obtain all the other results.

NatureGeorge commented 1 year ago

For monomers, related functions and command-lines do not involve any PISA queries and thus can retrieve results successfully. For multimers, however, the PISA API call is a crucial step since it is the source of inter-chain relationships and it would take me some time to write alternative code for it.

I sent a mail to the PDBe team and received their replies on Mar 8. They said that they are actively looking to improve this service and will have something soon that should be better. But it seems that it still requires weeks to months for them to handle this.

I will notify you if I finish an alternative way that substitutes for PISA API to get the inter-chain interaction info.

NatureGeorge commented 1 year ago

Please let me know if the multimer-related results are important and necessary to you.

yang-arina commented 1 year ago

Thank you for your reply. For our research, if the interface information from PISA cannot currently be obtained, it is crucial to skip this step and obtain the complex structures directly.

NatureGeorge commented 1 year ago

SIFTS API does not provide any chain interaction information. PISA API provides both the interacting chains and their corresponding interacting residues (i.e. interface residues).

I am looking for alternatives to PISA API for chain interaction information.

NatureGeorge commented 1 year ago

I managed to make pipe_select_ho and pipe_select_he work again, without calling PISA API. You can try the new version: pip install pdb-profiling==0.4.1.

| Type | Args |
| --- | --- |
| Monomer | sifts-mapping --func pipe_select_mo (default) |
| Homodimer | sifts-mapping --func pipe_select_ho |
| Heterodimer | sifts-mapping --func pipe_select_he |
| Protein-Ligand Interaction Pair (Obsoleted due to the bug of PISA API) | sifts-mapping --func pipe_select_else --kwargs '{"func": "pipe_protein_ligand_interface", "focus_assembly_ids": (0,)}' |
| Protein-Nucleotide Interaction Pair (Obsoleted due to the bug of PISA API) | sifts-mapping --func pipe_select_else --kwargs '{"func": "pipe_protein_nucleotide_interface"}' |
| Protein-Nucleotide Interaction Pair (New) | sifts-mapping --func pipe_select_else --kwargs 'func="Protein/NA"' |

NatureGeorge commented 1 year ago

PDB-Profiling 0.4.1 makes use of the RCSB Data API and RCSB Search API to retrieve the chain interaction information as well as the interface residues. These APIs only provide protein-protein and protein-nucleic acid interactions for biological units (not the asymmetric unit). The interface residues are calculated by biojava-structure-6.0.5.

See https://data.rcsb.org/redoc/index.html#tag/Interface-Service/operation/getInterfaceById

e.g. https://data.rcsb.org/rest/v1/core/interface/{entry_id}/{assembly_id}/{interface_id}
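
For readers who want to look at the raw interface records, here is a minimal sketch that fills in the URL template above and downloads the JSON. The function names `interface_url` and `fetch_interface` are hypothetical helpers, not part of pdb-profiling, and the sketch makes no assumption about which fields the response contains.

```python
import json
from urllib.request import urlopen

# URL template quoted in the comment above (RCSB Data API, interface service).
RCSB_INTERFACE_URL = (
    "https://data.rcsb.org/rest/v1/core/interface/"
    "{entry_id}/{assembly_id}/{interface_id}"
)


def interface_url(entry_id: str, assembly_id: str, interface_id: str) -> str:
    """Fill the template with one entry/assembly/interface triple."""
    return RCSB_INTERFACE_URL.format(
        entry_id=entry_id,
        assembly_id=assembly_id,
        interface_id=interface_id,
    )


def fetch_interface(entry_id: str, assembly_id: str, interface_id: str,
                    timeout: float = 30.0) -> dict:
    """Download one interface record and return the parsed JSON as-is.

    Hypothetical helper: no retries, no caching, and no interpretation of
    the returned fields. urlopen raises HTTPError if the record is missing.
    """
    url = interface_url(entry_id, assembly_id, interface_id)
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_interface("<entry_id>", "<assembly_id>", "<interface_id>")` with IDs of your own should return the record described in the linked documentation; which interface IDs exist for a given assembly has to be looked up separately.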

yang-arina commented 1 year ago

Thank you very much for your help! But there are still some issues regarding how to get Protein-Ligand Interaction Pair data in the new way.

yang-arina commented 1 year ago
  1. My command is:

pdb_profiling sifts-mapping --input test1.txt --column unp_canonical_id --func pipe_select_ho --output test1_result_ho.txt

test1.txt contains the UniProt IDs. The error looks like:

[16:57:27] Initializing Folder: /data/user/new_analysis/dataset_for_analysis/try_pdb_p                                                                                                                                      command.py:43
           Total 10 to query                                                                                                                                                                                                  command.py:272
70.0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━ 7 of 10 [ -:--:-- 227.61s ]
Traceback (most recent call last):
  File "/data/user/miniconda3/envs/pp_4/bin/pdb_profiling", line 8, in <module>
    sys.exit(Interface())
  File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
    rv.append(sub_ctx.command.invoke(sub_ctx))
  File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/pdb_profiling/commands/command.py", line 275, in sifts_mapping
    res = SIFTSs(ids[i:i+chunksize]).fetch(func, **kwargs).run(p.track).result()
  File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/unsync/unsync.py", line 117, in result
    return self.concurrent_future.result(*args, **kwargs)
  File "/data/user/miniconda3/envs/pp_4/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/data/user/miniconda3/envs/pp_4/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 4176, in run
    return [await fob for fob in tqdm(as_completed(self.tasks), total=len(self.tasks))]
  File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 4176, in <listcomp>
    return [await fob for fob in tqdm(as_completed(self.tasks), total=len(self.tasks))]
  File "/data/user/miniconda3/envs/pp_4/lib/python3.8/asyncio/tasks.py", line 619, in _wait_for_one
    return f.result()  # May raise f.exception().
  File "/data/user/miniconda3/envs/pp_4/lib/python3.8/asyncio/tasks.py", line 695, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 3633, in pipe_select_ho
    p_df = await self.pipe_select_ho_base(**kwargs)
  File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 3583, in pipe_select_ho_base
    p_df = await self.retrieve_rcsb_interface('ho', sele_df, chain_pairs=chain_pairs, **kwargs)
  File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 3928, in retrieve_rcsb_interface
    interfaces_dfs = [await i for i in ob.tasks]
  File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 3928, in <listcomp>
    interfaces_dfs = [await i for i in ob.tasks]
  File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 1521, in pipe_interface_res_info_for_rcsb
    interface_df = await assembly.get_interface_info_from_rcsb_data_api()
  File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 1906, in get_interface_info_from_rcsb_data_api
    profile_df = (await PDB(self.pdb_id).profile_id()).query(f'assembly_id == {self.assembly_id}')
  File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 1320, in profile_id
    assert profile_lyst == ass_lyst, f"\n{self.pdb_id}\n{assembly_id}\n{entity_id}\n{profile_lyst},\n{ass_lyst}"
AssertionError: 
2v5w
1
3
('D',),
('F', 'G')
  2. A similar situation occurred with he.
  3. As for NA, the command line is: pdb_profiling sifts-mapping --input test1.txt --column unp_canonical_id --func pipe_select_else --kwargs '{"func": "Protein/NA"}' --output test1_result_na.txt. The error is:
    [17:19:13] Initializing Folder: /data/user/new_analysis/dataset_for_analysis/try_pdb_p                                                                                                                                      command.py:43
    Traceback (most recent call last):
    File "/data/user/miniconda3/envs/pp_4/bin/pdb_profiling", line 8, in <module>
    sys.exit(Interface())
    File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
    File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
    File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
    rv.append(sub_ctx.command.invoke(sub_ctx))
    File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
    File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
    File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
    File "/data/user/miniconda3/envs/pp_4/lib/python3.8/site-packages/pdb_profiling/commands/command.py", line 207, in sifts_mapping
    kwargs = dict(sub.split('=') for item in kwargs for sub in item.split(';'))
    ValueError: dictionary update sequence element #0 has length 1; 2 is required
NatureGeorge commented 1 year ago

1 (and 2). Try pdb_profiling sifts-mapping --input test1.txt --column unp_canonical_id --func pipe_select_ho --output test1_result_ho.txt --skip_pdbs '2v5w' (there is something wrong with the API data related to 2v5w).

  3. For NA, try pdb_profiling sifts-mapping --input test1.txt --column unp_canonical_id --func pipe_select_else --kwargs 'func="Protein/NA"' --output test1_result_na.txt
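
To see why the JSON-style value fails, the parsing expression from the traceback (command.py, line 207) can be run on its own. This is a small sketch; it assumes that `kwargs` reaches that line as a tuple of raw option strings, which the thread does not confirm, and `parse_kwargs` is a hypothetical wrapper name.

```python
def parse_kwargs(kwargs):
    # Same expression as in the traceback: split on ';' into items,
    # then split each item on '=' to form (key, value) pairs.
    return dict(sub.split('=') for item in kwargs for sub in item.split(';'))


# key=value form: every piece splits into exactly two parts.
# Note that the double quotes stay part of the value at this stage.
ok = parse_kwargs(('func="Protein/NA"',))

# JSON-style form: there is no '=' at all, so split('=') returns a
# one-element list and dict() cannot build a pair from it.
try:
    parse_kwargs(('{"func": "Protein/NA"}',))
    error = None
except ValueError as exc:
    error = str(exc)

print(ok)
print(error)
```

The second call raises the same ValueError as in the report, which is why the key=value form is needed with this version of the command line.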
yang-arina commented 1 year ago
  1. --skip_pdbs '2v5w' does work, but the problematic PDB structures are like a bottomless pit: once one is solved, another follows. Is there any way to solve this problem at the root?
  2. After trying pdb_profiling sifts-mapping --input test1.txt --column unp_canonical_id --func pipe_select_else --kwargs 'func="Protein/NA"' --output test1_result_na.txt, the same problem occurred as above.
NatureGeorge commented 1 year ago

Try pip install pdb-profiling==0.4.2. And you do not have to add --skip_pdbs '2v5w'.

NatureGeorge commented 1 year ago

Try pip install pdb-profiling==0.4.2. And you do not have to add --skip_pdbs '2v5w'.

I am resending this message because I first mistyped 0.4.1 in my last response and then edited it to 0.4.2 on the GitHub issue; email cannot track that change.

yang-arina commented 1 year ago

Thank you for your reply! I have tried pip install pdb-profiling==0.4.2, but some questions remain:

  1. I still added --skip_pdbs because some problematic PDB IDs such as '7xw5','7xw6','7o9t','7o9x','7o9z','7oa9','8ig0' (and so on) still exist. After that, the mapping command lines for mo, ho, and na ran successfully.
  2. When I run pdb_profiling sifts-mapping --input canonical_id_0328.txt --column unp_canonical_id --func pipe_select_he --output canonical_result_he.txt --skip_pdbs '7xw5','7xw6','7o9t','7o9x','7o9z','7oa9','8ig0','7uzp','7wjq','7tnh','7t1u','7vvl','7sck','7y9z','8dm5', there are some new problems:
    
    (pp_5) [user@bogon try_pdb_p]$ pdb_profiling sifts-mapping --input canonical_id_0328.txt --column unp_canonical_id --func pipe_select_he --output canonical_result_he.txt --skip_pdbs '7xw5','7xw6','7o9t','7o9x','7o9z','7oa9','8ig0','7uzp','7wjq','7tnh','7t1u','7vvl','7sck','7y9z','8dm5'
    [11:02:36] Initializing Folder: /data/user/new_analysis/dataset_for_analysis/try_pdb_p                                                                                                                                      command.py:43
           Total 464 to query                                                                                                                                                                                                 command.py:272
    100.0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50 of 50 [ 0:00:00 25.40s ]
    [11:03:01] Done: 50                                                                                                                                                                                                           command.py:286
    32.0% ━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16 of 50 [ -:--:-- 6.59s ]PeptideLinkingWarning: Possible Peptide Linking: <PDB 4pa0>, {'entity_id': 1}, {'three_letter_code': 'CRO', 'parent_chem_comp_ids': ['THR', 'TYR', 'GLY'], 'one_letter_code': 'TYG'}; select the first code 
    (/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py:1077)
    PeptideLinkingWarning: Possible Peptide Linking: <PDB 4p7h>, {'entity_id': 1}, {'three_letter_code': 'CRO', 'parent_chem_comp_ids': ['THR', 'TYR', 'GLY'], 'one_letter_code': 'TYG'}; select the first code 
    (/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py:1077)
    32.0% ━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16 of 50 [ -:--:-- 14.02s ]PeptideLinkingWarning: Possible Peptide Linking: <PDB 4pa0>, {'entity_id': 1}, {'three_letter_code': 'CRO', 'parent_chem_comp_ids': ['THR', 'TYR', 'GLY'], 'one_letter_code': 'TYG'}; select the first code 
    (/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py:1077)
    PeptideLinkingWarning: Possible Peptide Linking: <PDB 4p7h>, {'entity_id': 1}, {'three_letter_code': 'CRO', 'parent_chem_comp_ids': ['THR', 'TYR', 'GLY'], 'one_letter_code': 'TYG'}; select the first code 
    (/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py:1077)
    100.0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50 of 50 [ 0:00:00 27.10s ]
    [11:03:33] Done: 100                                                                                                                                                                                                          command.py:286
    100.0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50 of 50 [ 0:00:00 13.41s ]
    [11:03:54] Done: 150                                                                                                                                                                                                          command.py:286
    66.0% ━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━ 33 of 50 [ 0:00:18 13.13s ]InvalidFileContentWarning: InvalidFileContentError for 'UniProt/fasta/V5T923.fasta.e1d709a839f44388b3644d1f26bca7f6.tmp', will retry (/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/ensure.py:67)
    98.0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺ 49 of 50 [ 0:00:01 26.40s ]InvalidFileContentWarning: InvalidFileContentError for 'UniProt/fasta/V5T923.fasta.48e47b109518479c9a6a5eef795ff4dc.tmp', will retry (/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/ensure.py:67)
    98.0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺ 49 of 50 [ 0:00:01 39.02s ]InvalidFileContentWarning: InvalidFileContentError for 'UniProt/fasta/V5T923.fasta.75e75f5843d3456a82f052eedd05ebf2.tmp', will retry (/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/ensure.py:67)
    98.0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺ 49 of 50 [ 0:00:01 52.48s ]InvalidFileContentWarning: InvalidFileContentError for 'UniProt/fasta/V5T923.fasta.b77e5416c23a40f783ff75db787b0742.tmp', will retry (/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/ensure.py:67)
    98.0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺ 49 of 50 [ 0:00:01 61.04s ]InvalidFileContentWarning: InvalidFileContentError for 'UniProt/fasta/V5T923.fasta.cd0905dc6881407296a8059d5ba3cbdb.tmp', will retry (/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/ensure.py:67)
    98.0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺ 49 of 50 [ 0:00:01 61.05s ]
    Traceback (most recent call last):
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/validate.py", line 33, in validate
    await func(path)
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/validate.py", line 55, in fasta_load
    assert bool(cls.fasta_pat.fullmatch(data))
    AssertionError

During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/tenacity/_asyncio.py", line 50, in __call__
    result = await fn(*args, **kwargs)
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/ensure.py", line 69, in wrapper
    raise e
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/ensure.py", line 65, in wrapper
    await ValidateBase.validate(path, suffix=Path(raw_path).suffix)
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/validate.py", line 35, in validate
    raise InvalidFileContentError(path)
    pdb_profiling.exceptions.InvalidFileContentError: UniProt/fasta/V5T923.fasta.cd0905dc6881407296a8059d5ba3cbdb.tmp

The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
    File "/data/user/miniconda3/envs/pp_5/bin/pdb_profiling", line 8, in <module>
    sys.exit(Interface())
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
    rv.append(sub_ctx.command.invoke(sub_ctx))
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/commands/command.py", line 275, in sifts_mapping
    res = SIFTSs(ids[i:i+chunksize]).fetch(func, **kwargs).run(p.track).result()
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/unsync/unsync.py", line 117, in result
    return self.concurrent_future.result(*args, **kwargs)
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 4179, in run
    return [await fob for fob in tqdm(as_completed(self.tasks), total=len(self.tasks))]
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 4179, in <listcomp>
    return [await fob for fob in tqdm(as_completed(self.tasks), total=len(self.tasks))]
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/asyncio/tasks.py", line 619, in _wait_for_one
    return f.result() # May raise f.exception().
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/asyncio/tasks.py", line 695, in _wrap_awaitable
    return (yield from awaitable.__await__())
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 3764, in pipe_select_he
    p_df = await self.pipe_select_he_base(**kwargs)
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 3683, in pipe_select_he_base
    sele_df = await self.pipe_select_mo(exclude_pdbs=exclude_pdbs, complete_chains=True, skip_pdbs=skip_pdbs, select_mo_kwargs=select_mo_kwargs, skip_carbohydrate_polymer=skip_carbohydrate_polymer)
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 3387, in pipe_select_mo
    sele_df = await self.pipe_select_base(exclude_pdbs, **kwargs)
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 3339, in pipe_select_base
    res = await self.pipe_score(**kwargs)
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 3101, in pipe_score
    sifts_df = await self.pipe_base(complete_chains=complete_chains, skip_pdbs=skip_pdbs, skip_carbohydrate_polymer=skip_carbohydrate_polymer)
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 3088, in pipe_base
    sifts_df = await self.reformat(init_task
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/unsync/unsync.py", line 130, in then
    return await result
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 2816, in add_residue_conflict
    f_dfrm['conflict_pdb_index'], f_dfrm['raw_pdb_index'], f_dfrm['conflict_pdb_range'], f_dfrm['conflict_unp_range'], f_dfrm['unp_len'] = zip(*[await i for i in tasks])
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 2816, in <listcomp>
    f_dfrm['conflict_pdb_index'], f_dfrm['raw_pdb_index'], f_dfrm['conflict_pdb_range'], f_dfrm['conflict_unp_range'], f_dfrm['unp_len'] = zip(*[await i for i in tasks])
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/pdb_profiling/processors/pdbe/record.py", line 2778, in get_residue_conflict
    _, unp_seq = await cls.fetch_unp_fasta(UniProt)
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/unsync/unsync.py", line 127, in then
    await self
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/tenacity/_asyncio.py", line 88, in async_wrapped
    return await fn(*args, **kwargs)
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/tenacity/_asyncio.py", line 47, in __call__
    do = self.iter(retry_state=retry_state)
    File "/data/user/miniconda3/envs/pp_5/lib/python3.8/site-packages/tenacity/__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
    tenacity.RetryError: RetryError[<Future at 0x7fa0080f3a90 state=finished raised InvalidFileContentError>]

NatureGeorge commented 1 year ago
  1. upgrade to 0.4.3: pip install pdb-profiling==0.4.3
  2. One of your input UniProt accessions (V5T923) is obsolete.
yang-arina commented 1 year ago

I checked the data carefully, and I'm sorry to say that V5T923 is not among my input UniProt accessions, so after trying 0.4.3 the problem still exists.

NatureGeorge commented 1 year ago

Then this is due to outdated data provided by the SIFTS API. Q9H2E6 is likely to be one of your inputs: the SIFTS API provides PDB 6wts, in which V5T923 (chain C) interacts with Q9H2E6. But UniProt has now obsoleted V5T923 for some reason, and 6wts chain C no longer has any UniProt accession related to it.

A simple workaround is to drop Q9H2E6 from your inputs when running pipe_select_he.

yang-arina commented 1 year ago

Thank you for your patience! I didn't drop Q9H2E6, but skipped 6wts. Besides V5T923, another obsolete UniProt accession, I3LJZ9, came up. Is there any solution that completely solves this problem? Or could you please tell me how to check which input UniProt accession and PDB ID an obsolete UniProt accession corresponds to? Then I can skip the problematic PDB ID.

NatureGeorge commented 1 year ago

pip install pdb-profiling==0.4.4

Version 0.4.4 tries to resolve this problem automatically. Hopefully there is no need to skip PDBs or drop UniProt accessions.

Besides, PDB-Profiling will raise a PossibleObsoletedUniProtWarning telling you which UniProt accession/isoform is obsolete. The previous versions also reported these IDs in InvalidFileContentWarning. Locating which UniProt accession and PDB entry are related to the obsolete one requires a manual check in

SIFTS Mappings (PDB <-> UniProt all isoforms)
https://www.ebi.ac.uk/pdbe/api/mappings/all_isoforms/:accession

(https://www.ebi.ac.uk/pdbe/api/doc/sifts.html)
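
The manual check could be partly scripted. The following is only a sketch: `mappings_url`, `fetch_mappings`, `collect_keys` and `mentions_accession` are hypothetical helper names, and because the exact layout of the response is not described in this thread, the search simply walks every key of the returned JSON instead of relying on particular field names. Whether the endpoint accepts a PDB ID as well as a UniProt accession should be confirmed in the SIFTS documentation linked above.

```python
import json
from urllib.request import urlopen

# Endpoint quoted above; ':accession' is the path parameter.
SIFTS_ALL_ISOFORMS_URL = (
    "https://www.ebi.ac.uk/pdbe/api/mappings/all_isoforms/{accession}"
)


def mappings_url(accession: str) -> str:
    """Build the request URL for one accession."""
    return SIFTS_ALL_ISOFORMS_URL.format(accession=accession)


def fetch_mappings(accession: str, timeout: float = 30.0) -> dict:
    """Download the mapping record and return the parsed JSON unchanged."""
    with urlopen(mappings_url(accession), timeout=timeout) as response:
        return json.load(response)


def collect_keys(payload) -> set:
    """Gather every dictionary key found anywhere in nested JSON data."""
    keys = set()
    stack = [payload]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            keys.update(node)
            stack.extend(node.values())
        elif isinstance(node, list):
            stack.extend(node)
    return keys


def mentions_accession(payload, accession: str) -> bool:
    """True if any key equals the accession, ignoring an isoform suffix
    such as '-1' and ignoring case."""
    wanted = accession.upper()
    return any(
        str(key).upper().split("-")[0] == wanted
        for key in collect_keys(payload)
    )
```

With this, `mentions_accession(fetch_mappings(some_id), "V5T923")` would indicate whether the record fetched for `some_id` still refers to the obsolete accession, which is the condition worth skipping on.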

yang-arina commented 1 year ago

Thank you very much! But how can I now map Protein-Ligand Interaction Pairs from a UniProt accession? Is there any method?

NatureGeorge commented 1 year ago

It requires either a fix for the PISA API bug or another handy way of getting protein-ligand interaction data. The latter will take some time to investigate.

yang-arina commented 1 year ago

Thank you very much for your patient and professional reply these days. I am now fully aware of the situation you mentioned.