biotite-dev / biotite

A comprehensive library for computational molecular biology
https://www.biotite-python.org
BSD 3-Clause "New" or "Revised" License
641 stars 101 forks source link

chain id parameter for structure.io.pdbx.get_sequence #600

Closed tjmier closed 3 months ago

tjmier commented 3 months ago

Added a new parameter 'chain_id' to the function 'get_sequence' in the module 'biotite.structure.io.pdbx' to return a dictionary mapping chain_id to the sequence

padix-key commented 3 months ago

Thanks for the PR. I would be in favor of dropping the chain_id parameter entirely, and returning always a dictionary. Since Biotite has not reached 1.0, yet, this backwards-incompatible changes would be OK in my opinion.

Note that this would require a few small adjustments of that function call in the tests and documentation (searching for pdbx.get_sequence in the code base should find all occurences)

t0mdavid-m commented 3 months ago

I agree with @padix-key. I think we should drop the chain_id parameter.

tjmier commented 3 months ago

I sent an additional commit with the suggested changes as I understand them. It is worth noting that there may be instances where entity_poly.pdbx_strand_id is not equivalent to atom_site.label_asym_id. The structure used in the test function (PDB:5UGO) is an example of this where the strand ID is "T" and the asym_id is "A". This may be an edge case and I am unsure if it will cause issues but its worth mentioning.

padix-key commented 3 months ago

Thanks for the changes. There are still remaining get_sequence() calls in

The strand ID is not matching atom_site.label_asym_id, but atom_site.auth_asym_id is. I am not sure what the rationale behind this is though, as the label_xxx fields should be the source of truth in the PDB.

tjmier commented 3 months ago

Please ignore those commits. I am still learning git and did not think those commits would be sent to this pull request. Again, apologies for silly mistakes. A lot of this is new to me.

padix-key commented 3 months ago

No problem. You may also convert this PR to a Draft PR to further indicate that this is work in progress

tjmier commented 3 months ago

The last commit is the one I would like to merge if possible.

padix-key commented 3 months ago

Looks good to me, however, one small adjustment is necessary: With the merge of #552 residue_coevolution.py is moved to doc/examples/scripts/sequence/homology/residue_coevolution.py. This merge conflict needs to be solved first. Furthermore, you may add yourself to CONTRIB.rst, if you like.