`parse_cif_noX.py` misses some chains in CATH?

dauparas / ProteinMPNN

Code for the ProteinMPNN paper

MIT License


Issue #82 (Open), opened by tyang816 11 months ago

tyang816 commented 11 months ago

Hi, thanks for sharing your great work. I'm having some difficulty processing the CATH data with parse_cif_noX.py. For example, the data split file from GVP requires chain "4mxw.R", but the parsed output only contains chains A to L. Many of the CIF files downloaded for CATH are missing chains in this way. How should I handle this correctly? Alternatively, could you provide the processed CATH training dataset directly? Thank you so much.

imSeaton commented 5 months ago

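Hi, and thanks for ProtSSN and the other work from Liang Hong's group. After processing with parse_cif_noX.py, the chain_ID is the asym_chain_ID, i.e. the ID assigned by the PDB itself. The "R" in the 4mxw.R you mentioned is the author_chain_ID, the one the structural biologists deposited in the PDB (and probably also the chain_ID system used in the CATH files). Converting between the two is simple: near data.getObj('pdb_poly_seq_scheme') in parse_cif_noX.py there is already code that handles the author_chain_ID --> asym_chain_ID mapping. My guess is that if you build the reverse dictionary, you can convert the data into the author_chain_ID form you want. Hope this helps.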
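A minimal, stdlib-only sketch of the reverse mapping suggested above. It does not reuse parse_cif_noX.py or its CIF reader; instead it reads the `_pdb_poly_seq_scheme` loop directly, assuming the standard mmCIF items `asym_id` (the asym/label chain ID) and `pdb_strand_id` (the author chain ID). The function names `read_poly_seq_scheme` and `chain_id_maps` and the toy CIF text are hypothetical illustrations, not data from 4mxw. It also assumes each polymer author chain corresponds to exactly one asym chain.

```python
import shlex

PREFIX = "_pdb_poly_seq_scheme."


def read_poly_seq_scheme(cif_text: str) -> list[dict[str, str]]:
    """Return the rows of the _pdb_poly_seq_scheme loop as dicts.

    Hypothetical minimal parser: handles only the loop_ form with
    whitespace/quote-delimited values, not ';'-delimited text fields.
    """
    lines = [ln.strip() for ln in cif_text.splitlines()]
    i = 0
    while i < len(lines):
        if lines[i] == "loop_" and i + 1 < len(lines) and lines[i + 1].startswith(PREFIX):
            i += 1
            # Collect the column names of the loop (asym_id, pdb_strand_id, ...).
            keys = []
            while i < len(lines) and lines[i].startswith(PREFIX):
                keys.append(lines[i][len(PREFIX):].split()[0])
                i += 1
            # Collect value tokens until the next category, loop, comment or blank line.
            tokens = []
            while i < len(lines) and lines[i] and not lines[i].startswith(("#", "_", "loop_", "data_")):
                tokens.extend(shlex.split(lines[i]))
                i += 1
            # Regroup the flat token list into one dict per row.
            return [dict(zip(keys, tokens[j:j + len(keys)]))
                    for j in range(0, len(tokens) - len(keys) + 1, len(keys))]
        i += 1
    return []


def chain_id_maps(rows: list[dict[str, str]]) -> tuple[dict[str, str], dict[str, str]]:
    """Build asym chain ID -> author chain ID, plus the reverse dictionary."""
    asym_to_auth: dict[str, str] = {}
    for row in rows:
        # pdb_strand_id is the author chain ID; asym_id is the asym (label) chain ID.
        asym_to_auth.setdefault(row["asym_id"], row["pdb_strand_id"])
    auth_to_asym: dict[str, str] = {}
    for asym, auth in asym_to_auth.items():
        # Assumption: one author chain maps to exactly one asym chain.
        if auth_to_asym.get(auth, asym) != asym:
            raise ValueError(f"author chain {auth!r} maps to several asym chains")
        auth_to_asym[auth] = asym
    return asym_to_auth, auth_to_asym


# Toy input (hypothetical, not taken from 4mxw).
TOY_CIF = """data_TOY
#
loop_
_pdb_poly_seq_scheme.asym_id
_pdb_poly_seq_scheme.entity_id
_pdb_poly_seq_scheme.seq_id
_pdb_poly_seq_scheme.mon_id
_pdb_poly_seq_scheme.pdb_strand_id
A 1 1 MET R
A 1 2 LYS R
B 2 1 GLY S
#
"""

asym_to_auth, auth_to_asym = chain_id_maps(read_poly_seq_scheme(TOY_CIF))
print(asym_to_auth)        # {'A': 'R', 'B': 'S'}
print(auth_to_asym["R"])   # A
```

With a split entry such as "4mxw.R", one would look up `auth_to_asym["R"]` to find which parsed chain to use, or rename the parsed chains through `asym_to_auth`. A full mmCIF library would be more robust than this loop-only reader.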

tyang816 commented 5 months ago


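Thanks for your help! It's been a while, so I've forgotten most of the details... I think I eventually worked around the problem with the MAXIT software.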