Thank you @zpengmei! We are looking into it.
Carlos
Hello @zpengmei, could you share the code for `count_segments` so I can fully reproduce your results? Thanks!
Hi again @zpengmei! I took a closer look at your issue.
From what I can tell there doesn't seem to be a problem, but since I don't have your full code I can't reproduce exactly what you have.
Taking the same protein ID, '1KG6', and checking the lengths of the residue type list, the residue number list, and the sequence, I get the following:
This code:
```python
from proteinshake.datasets import EnzymeCommissionDataset

dataset = EnzymeCommissionDataset()
# Atom resolution: the 'atom' lists hold one entry per atom
proteins = dataset.proteins(resolution='atom')
for p in proteins:
    if p['protein']['ID'] == '1KG6':
        print('seq -20:', p['protein']['sequence'][-20:])  # last 20 residues
        print('seq length:', len(p['protein']['sequence']))
        print('resnum: ', len(p['atom']['residue_number']))
        print('restype: ', len(p['atom']['residue_type']))
        break
```
This output:
```
seq -20: QNGCIAAANNSWALYPGKKP
seq length: 225
resnum: 1785
restype: 1785
```
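As you can see, the residue number and residue type lists have the same length (1785). At atom resolution both lists have one entry per atom rather than one per residue (1785 entries vs. a 225-residue sequence), so each residue's type is repeated once for every atom it contains. The repeated 'K' entries in residue type occur because the sequence ends with two consecutive 'K' residues followed by a 'P' (...GKKP), which is expected.
Unless I am missing something, everything appears to be in order.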
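To see why two neighbouring residues of the same type are not a mismatch, the per-atom lists can be collapsed into one entry per residue by grouping runs of identical residue numbers. This is an illustrative sketch only: `collapse_to_residues` is a hypothetical helper (not part of proteinshake), it assumes a single chain whose atoms are stored contiguously per residue, and the residue numbers and atom counts below are made up to mirror a ...GKKP tail.

```python
from itertools import groupby


def collapse_to_residues(residue_numbers, residue_types):
    """Turn per-atom annotations into one (residue_number, residue_type) pair per residue.

    Hypothetical helper. Assumes the atoms of each residue are stored contiguously,
    so every run of identical residue numbers is one residue; the type of the
    run's first atom is kept.
    """
    if len(residue_numbers) != len(residue_types):
        raise ValueError("per-atom lists must have the same length")
    pairs = zip(residue_numbers, residue_types)
    # groupby splits the atoms wherever the residue number changes
    return [next(group) for _, group in groupby(pairs, key=lambda pair: pair[0])]


# Illustrative tail ending in G K K P (atom counts are made up)
numbers = [221] * 4 + [222] * 9 + [223] * 9 + [224] * 7
types = ["G"] * 4 + ["K"] * 9 + ["K"] * 9 + ["P"] * 7

residues = collapse_to_residues(numbers, types)
print(residues)                         # [(221, 'G'), (222, 'K'), (223, 'K'), (224, 'P')]
print("".join(t for _, t in residues))  # GKKP
```

The two 'K' residues stay separate because their residue numbers differ, even though their types are identical. Applied to a real entry, e.g. `collapse_to_residues(p['atom']['residue_number'], p['atom']['residue_type'])`, joining the types would be expected to reproduce the sequence, assuming the sequence field is built from the same residues as the atom lists.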
Feel free to reach out if you have any further issues.
Best, Carlos
Hi Carlos,
Thank you so much for your time! Sorry for the late reply. I think I missed that there are two 'K's there.
Best, Zihan
Hi team,
Thanks for this helpful repository! I am trying to load some protein datasets here, and there seems to be some discrepancy between the atom-level residue types and the residue numbers that assign atoms to residues. Here is my test function:
Here is the output:
Looking at the last 20 entries of residue_number and residue_type, they don't seem to match: for example, residues 222 and 223 both refer to K. Is this something specific to this dataset?
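One way to test whether the two per-atom lists really disagree is to collect, for each residue number, every residue type its atoms carry. A genuine mismatch shows up as a residue number with more than one type, whereas two neighbouring 'K' residues do not. This is a minimal sketch: `find_type_conflicts` is a hypothetical helper, the data are illustrative, and since structures with several chains reuse residue numbers, it should be run on one chain at a time.

```python
from collections import defaultdict


def find_type_conflicts(residue_numbers, residue_types):
    """Return {residue_number: sorted residue types} for every residue number whose
    atoms carry more than one residue type. An empty dict means the lists agree.

    Hypothetical helper; assumes residue numbers are unique within the input (one chain).
    """
    if len(residue_numbers) != len(residue_types):
        raise ValueError("per-atom lists must have the same length")
    types_by_number = defaultdict(set)
    for number, rtype in zip(residue_numbers, residue_types):
        types_by_number[number].add(rtype)
    return {n: sorted(t) for n, t in types_by_number.items() if len(t) > 1}


# Illustrative per-atom data ending in G K K P (atom counts are made up)
numbers = [221] * 4 + [222] * 9 + [223] * 9 + [224] * 7
types = ["G"] * 4 + ["K"] * 9 + ["K"] * 9 + ["P"] * 7
print(find_type_conflicts(numbers, types))      # {}

bad_types = types[:-1] + ["K"]  # deliberately mislabel the last atom of residue 224
print(find_type_conflicts(numbers, bad_types))  # {224: ['K', 'P']}
```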
Thanks! Zihan