Thanks for providing your great code and the initial refactoring! As indicated, I will refactor it further.
I pushed some intermediate refactored code. No worries, I am not finished yet 😉.
Did you create the reference sequence in test_3di.py with foldseek? And if so, how did you tell foldseek to just create the 3Di sequences without aligning anything? I am asking because it would be nice to have the 3Di sequences for the existing test structures to run the tests on them.
Hi @padix-key, you can generate the 3Di sequences for arbitrary PDB files using the createdb and convert2fasta commands of foldseek, see https://github.com/steineggerlab/foldseek/issues/15.
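A minimal sketch of that workflow, driven from Python. It assumes the three-step recipe from the linked foldseek issue: create a database with foldseek createdb, link the header file to the 3Di (`_ss`) database with `lndb` so the FASTA records keep their names, then dump it with `convert2fasta`. The helper names and the `db` prefix are hypothetical, and actually running it requires a `foldseek` binary on `PATH`.

```python
import subprocess
from pathlib import Path


def foldseek_3di_commands(structure_dir, db_name="db"):
    """Build the foldseek calls that turn a directory of structures into 3Di FASTA.

    Assumed recipe (foldseek issue #15): createdb -> lndb (headers) -> convert2fasta.
    """
    db = str(db_name)
    return [
        # Parse the structures; foldseek writes the 3Di states to '<db>_ss'
        ["foldseek", "createdb", str(structure_dir), db],
        # Reuse the structure headers for the 3Di database
        ["foldseek", "lndb", f"{db}_h", f"{db}_ss_h"],
        # Export the 3Di database as plain FASTA
        ["foldseek", "convert2fasta", f"{db}_ss", f"{db}_ss.fasta"],
    ]


def run_foldseek_3di(structure_dir, db_name="db"):
    """Run the commands in order and return the path of the resulting FASTA file."""
    for cmd in foldseek_3di_commands(structure_dir, db_name):
        subprocess.run(cmd, check=True)  # raise if any foldseek step fails
    return Path(f"{db_name}_ss.fasta")
```

For example, `run_foldseek_3di("tests/structure/data", "ref")` would leave the 3Di sequences in `ref_ss.fasta`, which could then serve as reference data for the tests.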
Comparing althonos:feat-3di (80236c2) with main (72226ca): ✅ 45 untouched benchmarks
Todo-list as reminder to myself:
- foldseek citation
- doc/apidoc.json
From my side the code is ready; have a look if you like. Thanks again for providing the code from your package! Of course, you can also add yourself to CONTRIB.rst in this PR.
One thing I do not fully understand yet is the invalid state D. Maybe you, @althonos, have more insights here: Is D reserved for the invalid state (i.e. missing atoms), or is there also a valid structure that results in D?
Ping @althonos
I will merge this PR now, as I plan to include this new feature in the upcoming Biotite 1.1 release. However, if you see any room for improvement or would like to be added to CONTRIB.rst, do not hesitate to open an issue/PR.
@padix-key: Sorry, only seeing this now!
The invalid state being 2 (i.e. D) is actually taken from the original code (see https://github.com/steineggerlab/foldseek/blob/d2d09b588f50d5f8e2fd7a958377a33b2f725415/lib/3di/structureto3di.h#L9); it can also occur for valid residues, as it seems to correspond to a coil state.
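To make the ambiguity concrete, here is a small sketch of how integer states become letters. It assumes the 3Di symbols are the 20 amino-acid letters in the order `ACDEFGHIKLMNPQRSTVWY` (so state 2 is `D`) and that invalid residues are simply overwritten with state 2; `states_to_letters` is a hypothetical helper, not the Biotite or foldseek API.

```python
# Assumed 3Di symbol order: state i is printed as ALPHABET[i]
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
INVALID_STATE = 2  # value used by foldseek's structureto3di.h


def states_to_letters(states, valid):
    """Map integer 3Di states to letters, forcing invalid residues to state 2."""
    letters = []
    for state, ok in zip(states, valid):
        # A residue with missing atoms gets the invalid state instead of its own
        letters.append(ALPHABET[state if ok else INVALID_STATE])
    return "".join(letters)


# An invalid residue and a genuine coil residue both come out as "D"
print(states_to_letters([0, 5, 2], [True, False, True]))
```

Because the last (valid) residue and the middle (invalid) one both print as `D`, the letter alone cannot tell missing atoms apart from a coil; a separate mask would be needed for that.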
Thanks for the explanation. I also just merged your CONTRIB.rst addition.
Hi Patrick,
As discussed, here is the encoder, taken (and slightly updated) from mini3di. I am not that familiar with the Biotite API, but I made some changes to make it fit better: I added a StructureSequence class inheriting from Sequence with the 3Di alphabet, and added support for extracting the coordinates from an AtomArray. This can probably be refactored a bit.
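The coordinate-extraction step can be sketched with plain NumPy: gather the N, CA, C and CB coordinates of each residue from flat per-atom arrays, leaving missing atoms as NaN and flagging incomplete residues (the candidates for the invalid state). This is an illustration under assumptions, not the PR's code: `residue_coords` is hypothetical, it assumes a single chain with unique `res_id` values, and it does not reconstruct a CB for glycine, so such residues are simply flagged here; the real encoder may handle that differently.

```python
import numpy as np

# Atoms assumed to be needed per residue by the 3Di encoder
BACKBONE = ("N", "CA", "C", "CB")


def residue_coords(atom_name, res_id, coord):
    """Gather per-residue N, CA, C, CB coordinates from flat atom arrays.

    Returns a dict of (n_residues, 3) arrays (NaN where an atom is missing)
    and a boolean mask that is True for residues with all four atoms.
    """
    atom_name = np.asarray(atom_name)
    res_id = np.asarray(res_id)
    coord = np.asarray(coord, dtype=float)
    residues = np.unique(res_id)  # sorted unique residue ids
    index = np.searchsorted(residues, res_id)  # residue index of every atom
    out = {name: np.full((len(residues), 3), np.nan) for name in BACKBONE}
    for name in BACKBONE:
        sel = atom_name == name
        out[name][index[sel]] = coord[sel]
    # A residue is complete only if none of its four atoms is NaN
    valid = np.all([~np.isnan(out[n]).any(axis=1) for n in BACKBONE], axis=0)
    return out, valid
```

With a Biotite `AtomArray` named `atoms`, the inputs would be its annotations, e.g. `residue_coords(atoms.atom_name, atoms.res_id, atoms.coord)`.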