NatureGeorge / pdb-profiling

Profiling protein structures from the Protein Data Bank and integrating various resources.🏄‍♂️
https://pdb-profiling.netlify.app/
MIT License

RCSB Search API: rcsb_cluster_membership -> 100 identity but with different `pdbx_seq_one_letter_code` #6

Open NatureGeorge opened 3 years ago

NatureGeorge commented 3 years ago

Describe the bug

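As the title says: two entities that the RCSB Search API places in the same 100%-identity cluster have different `pdbx_seq_one_letter_code` values.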

To Reproduce

>>> PDB('2d4q').rcsb_cluster_membership(entity_id=1, identity_cutoff=100).result()
  rcsb_id  score  cluster_id  identity pdb_id  entity_id
0  2D4Q_1    1.0       34283       100   2d4q          1
1  2E2X_1    1.0       34283       100   2e2x          1
>>> PDB('2d4q').get_sequence(entity_id=1, mode='raw_pdb_seq').result()
'KEEFKALKTLSIFYQAGTSKAGNPIFYYVARRFKTGQINGDLLIYHVLLTLKPYYAKPYEIVVDLTHTGPSNRFKTDFLSKWFVVFPGFAYDNVSAVYIYNCNSWVREYTKYHERLLTGLKGSKRLVFIDCPGKLAEHIEHEQQKLPAATLALEEDLKVFHNALKLAHKDTKVSIKVGSTAVQVTSAERTKVLGQSVFLNDIYYASEIEEICLVDENQFTLTIANQGTPLTFMHQECEAIVQSIIHIRTRWELSQPD'
>>> PDB('2e2x').get_sequence(entity_id=1, mode='raw_pdb_seq').result()
'GAMTGSSKFEEFMTRHQVHEKEEFKALKTLSIFYQAGTSKAGNPIFYYVARRFKTGQINGDLLIYHVLLTLKPYYAKPYEIVVDLTHTGPSNRFKTDFLSKWFVVFPGFAYDNVSAVYIYNCNSWVREYTKYHERLLTGLKGSKRLVFIDCPGKLAEHIEHEQQKLPAATLALEEDLKVFHNALKLAHKDTKVSIKVGSTAVQVTSAERTKVLGQSVFLNDIYYASEIEEICLVDENQFTLTIANQGTPLTFMHQECEAIVQSIIHIRTRWELSQPD'
>>> PDB('2e2x').stats_protein_entity_seq().result()[0].ARTIFACT_INDEX
[[1, 5]]
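
How the two sequences differ can be checked with a plain substring search. This is a minimal sketch, not part of pdb-profiling: `locate_subsequence` is a hypothetical helper, and the inputs are only the leading residues of each sequence as printed above, truncated for brevity.

```python
def locate_subsequence(shorter: str, longer: str) -> int:
    """Return the 0-based offset of `shorter` inside `longer`, or -1 if absent."""
    return longer.find(shorter)


# First 10 residues of 2d4q entity 1 and first 30 residues of 2e2x entity 1,
# copied from the outputs above (truncated for brevity).
seq_2d4q_head = "KEEFKALKTL"
seq_2e2x_head = "GAMTGSSKFEEFMTRHQVHEKEEFKALKTL"

offset = locate_subsequence(seq_2d4q_head, seq_2e2x_head)
print(offset)  # 20
```

On these truncated inputs the 2d4q sequence starts 20 residues into the 2e2x sequence, so 2e2x carries extra N-terminal residues. The first five of them appear to be what `ARTIFACT_INDEX` reports as `[[1, 5]]`, assuming that range is 1-based and inclusive.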

Expected behavior

Entities in the same 100%-identity cluster should have the same `pdbx_seq_one_letter_code`.


NatureGeorge commented 3 years ago

WithoutRCSBClusterMembershipWarning: polymer_entity(entry_id: "2lob", entity_id: "2") ->
{'data': {'polymer_entity': {'rcsb_cluster_membership': None}}}
(C:\GitWorks\pdb-profiling\pdb_profiling\processors\pdbe\record.py:1396)
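
A response like the one in this warning has `rcsb_cluster_membership` set to `None`, so code that iterates over it needs a guard. The following is a minimal sketch of such a guard on the raw response dictionary. `extract_cluster_membership` is a hypothetical helper, not the function pdb-profiling uses in `record.py`.

```python
from typing import Any, Dict, List


def extract_cluster_membership(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the cluster membership list, or an empty list when it is missing."""
    entity = (response.get("data") or {}).get("polymer_entity") or {}
    return entity.get("rcsb_cluster_membership") or []


# Response body copied from the warning above (entry 2lob, entity 2).
response_2lob = {'data': {'polymer_entity': {'rcsb_cluster_membership': None}}}
print(extract_cluster_membership(response_2lob))  # []
```

Returning an empty list keeps downstream loops and DataFrame construction simple; whether to also emit a warning, as the library does here, is a separate choice.
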
NatureGeorge commented 3 years ago

The cluster_id is not stable and changes over time. The same query later returned cluster 32101 instead of 34283:

cluster_id  entity_id  identity pdb_id rcsb_id  score
     32101          1       100   2d4q  2D4Q_1    1.0
     32101          1       100   2e2x  2E2X_1    1.0
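
Because the numeric cluster_id differs between the two results in this issue (34283, then 32101), comparing results by ID alone is unreliable. One option is to compare which entities are grouped together. This is a minimal sketch under that assumption: `clusters_as_member_sets` is a hypothetical helper, and the rows are the (cluster_id, rcsb_id) pairs from the two outputs in this thread.

```python
from collections import defaultdict


def clusters_as_member_sets(rows):
    """Group (cluster_id, rcsb_id) rows into a set of frozensets of members."""
    groups = defaultdict(set)
    for cluster_id, rcsb_id in rows:
        groups[cluster_id].add(rcsb_id)
    return {frozenset(members) for members in groups.values()}


# (cluster_id, rcsb_id) pairs copied from the two results in this issue.
earlier = [(34283, "2D4Q_1"), (34283, "2E2X_1")]
later = [(32101, "2D4Q_1"), (32101, "2E2X_1")]

same_grouping = clusters_as_member_sets(earlier) == clusters_as_member_sets(later)
print(same_grouping)  # True
```

The comparison ignores the ID values entirely, so a renumbering on the server side does not register as a change in membership.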