biotite-dev / biotite

A comprehensive library for computational molecular biology
https://www.biotite-python.org
BSD 3-Clause "New" or "Revised" License

Add 3di encoding to `biotite.structure` #665

Closed althonos closed 2 weeks ago

althonos commented 2 months ago

Hi Patrick,

As discussed, here is the encoder, taken (and slightly updated) from mini3di.

I am not very familiar with the biotite API, but I tried to make the code fit it better: I added a StructureSequence class that inherits from Sequence and uses the 3di alphabet, and I added support for extracting the coordinates from an AtomArray. This can probably be refactored a bit.

padix-key commented 2 months ago

Thanks for providing your great code and the initial refactoring! As indicated, I will refactor it further.

padix-key commented 1 month ago

I pushed some intermediate refactored code. No worries, I am not finished yet 😉 .

padix-key commented 1 month ago

Did you create the reference sequence in test_3di.py with foldseek? And if so, how did you tell foldseek to create only the 3Di sequences without aligning anything? I am asking because it would be nice to have the 3Di sequences for the existing test structures, so the tests can run on them.

althonos commented 1 month ago

Hi @padix-key, you can generate the 3di sequences for arbitrary PDB files using the createdb and convert2fasta commands of foldseek, see https://github.com/steineggerlab/foldseek/issues/15.
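
The workflow described in this comment could be scripted roughly as follows. This is a minimal sketch, not taken from the PR: it assumes that a `foldseek` executable is on the `PATH`, that the database-building subcommand is named `createdb`, and that an intermediate `lndb` call is needed to link the header database to the 3Di database (`<db>_ss`) before `convert2fasta`. The helper names `foldseek_3di_commands` and `write_3di_fasta` are hypothetical.

```python
import subprocess


def foldseek_3di_commands(structure_path, db_path, fasta_path):
    """Build the foldseek calls that export 3Di sequences as FASTA.

    Assumption: subcommand names and the "_h" / "_ss" database suffixes
    follow the foldseek conventions as I understand them; check them
    against the installed foldseek version.
    """
    db = str(db_path)
    return [
        # Build a structure database from the input file or directory
        ["foldseek", "createdb", str(structure_path), db],
        # Reuse the header database for the 3Di ("_ss") database
        ["foldseek", "lndb", db + "_h", db + "_ss_h"],
        # Write the 3Di sequences to a FASTA file
        ["foldseek", "convert2fasta", db + "_ss", str(fasta_path)],
    ]


def write_3di_fasta(structure_path, db_path, fasta_path):
    """Run the commands in order and stop at the first failure."""
    for command in foldseek_3di_commands(structure_path, db_path, fasta_path):
        subprocess.run(command, check=True)
```

Calling `write_3di_fasta("1l2y.pdb", "tmp/db", "1l2y_3di.fasta")` (file names are placeholders) would then produce reference 3Di sequences without running any alignment.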

codspeed-hq[bot] commented 1 month ago

CodSpeed Performance Report

Merging #665 will not alter performance

Comparing althonos:feat-3di (80236c2) with main (72226ca)

Summary

✅ 45 untouched benchmarks

padix-key commented 1 month ago

Todo-list as reminder to myself:

padix-key commented 1 month ago

From my side the code is ready, have a look if you like. Thanks again for providing the code from your package! Of course you can also add yourself to CONTRIB.rst in this PR.

One thing I do not fully understand yet is the invalid state D. Maybe you, @althonos, have more insight here:

padix-key commented 3 weeks ago

Ping @althonos

padix-key commented 2 weeks ago

I will merge this PR now, as I plan to include this new feature in the upcoming Biotite 1.1 release. However, if you see any room for improvement or would like to be added to CONTRIB.rst, do not hesitate to open an issue/PR.

althonos commented 2 weeks ago

@padix-key: Sorry, only seeing this now!

The invalid state being 2 is actually from the original code (see https://github.com/steineggerlab/foldseek/blob/d2d09b588f50d5f8e2fd7a958377a33b2f725415/lib/3di/structureto3di.h#L9); it can also be used for valid states, as it seems to correspond to a coil state.
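
To make the link between the numeric state 2 and the letter D concrete, here is a small sketch. It assumes that the 20 3Di states are written with the amino acid one-letter codes in the order `ACDEFGHIKLMNPQRSTVWY`; the names `ALPHABET_3DI` and `decode_states` are hypothetical and not part of biotite or foldseek.

```python
# Assumption: 3Di states are spelled with the 20 amino acid one-letter
# codes in this order, so a state index maps to a letter by position.
ALPHABET_3DI = "ACDEFGHIKLMNPQRSTVWY"

# Value of the invalid state in the foldseek header linked above.
INVALID_STATE = 2


def decode_states(states):
    """Translate integer 3Di states into their one-letter symbols."""
    return "".join(ALPHABET_3DI[state] for state in states)


print(ALPHABET_3DI[INVALID_STATE])            # D
print(decode_states([0, INVALID_STATE, 19]))  # ADY
```

Under that assumption, a D in a 3Di sequence cannot by itself tell an encoder failure apart from a regular residue assigned to state 2, which matches the observation that the state is also used for valid positions.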

padix-key commented 2 weeks ago

Thanks for the explanation. I also just merged your CONTRIB.rst addition.