althonos / pyhmmer

Cython bindings and Python interface to HMMER3.
https://pyhmmer.readthedocs.io
MIT License
Save the DigitalSequenceBlock as pickle #56

Closed: chtsai0105 closed this issue 1 year ago

chtsai0105 commented 1 year ago

Hi Martin!

In one of my projects, I would like to save a list of pyhmmer.easel.DigitalSequenceBlock objects to a pickle file as intermediate data. However, this fails with TypeError: no default __reduce__ due to non-trivial __cinit__. Is there any other way to save a pyhmmer.easel.DigitalSequenceBlock to a file that can be loaded later?

For (sub)sequence retrieval, we previously used esl-sfetch from the original HMMER suite. I also noticed the pyhmmer.easel.SSIReader and pyhmmer.easel.SSIWriter classes, but I'm not sure whether they are related to esl-sfetch.

I would also like to report a glitch on the website. I was not able to reach the documentation page for the latest version, 0.10.3 - when I click on the tab, it always leads me back to the landing page.

althonos commented 1 year ago

Hi @chtsai0105

> In one of my projects, I would like to save a list of pyhmmer.easel.DigitalSequenceBlock objects to a pickle file as intermediate data. However, this fails with TypeError: no default __reduce__ due to non-trivial __cinit__. Is there any other way to save a pyhmmer.easel.DigitalSequenceBlock to a file that can be loaded later?

I have pushed a new release with pickle protocol support for all sequence objects, so you can now save the intermediate objects. Before that, you could save them in FASTA format and load them again later, but that would cause some metadata to be lost.
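
For reference, saving and reloading intermediate objects with the standard pickle module could look roughly like the sketch below. The helper names `save_intermediate` and `load_intermediate` are hypothetical, not part of pyhmmer, and the usage in the note underneath assumes a pyhmmer release that includes the pickle support described above.

```python
import pickle
from pathlib import Path


def save_intermediate(objects, path):
    """Serialize a list of picklable objects to `path` (hypothetical helper)."""
    with Path(path).open("wb") as handle:
        pickle.dump(list(objects), handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_intermediate(path):
    """Load the list previously written by `save_intermediate`."""
    # Only unpickle files you trust: loading a pickle can run arbitrary code.
    with Path(path).open("rb") as handle:
        return pickle.load(handle)
```

With pickle support in place, a list of DigitalSequenceBlock objects should be usable as `objects` here, for example `save_intermediate(blocks, "blocks.pkl")` followed later by `blocks = load_intermediate("blocks.pkl")`.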

> For (sub)sequence retrieval, we previously used esl-sfetch from the original HMMER suite. I also noticed the pyhmmer.easel.SSIReader and pyhmmer.easel.SSIWriter classes, but I'm not sure whether they are related to esl-sfetch.

The SSI objects allow reading and writing a Sequence-Subsequence Index file, which is the index used by esl-sfetch, but there is currently no way to use it for sequence retrieval. You could, however, use any other library (such as pyfaidx) to handle subsequence retrieval, and then convert the results to TextSequence objects, which is quite cheap to do.
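
As a rough illustration of that workflow, the sketch below fetches a subsequence and wraps it in a TextSequence. To stay dependency-free, the retrieval step is a naive linear scan of the FASTA file standing in for an indexed library such as pyfaidx; it is not pyfaidx's API. The function names and the 1-based, inclusive coordinates are assumptions made for this example.

```python
def fetch_subsequence(fasta_path, name, start, end):
    """Return residues start..end (1-based, inclusive) of the record `name`.

    Naive stand-in for an indexed fetch: scans the whole file.
    """
    residues = []
    found = False
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if found:
                    break  # reached the record after the one we wanted
                # The record identifier is the first token of the header.
                header = line[1:].split()
                found = bool(header) and header[0] == name
            elif found:
                residues.append(line)
    if not found:
        raise KeyError(name)
    sequence = "".join(residues)
    if not 1 <= start <= end <= len(sequence):
        raise ValueError(f"invalid range {start}..{end} for {name!r}")
    return sequence[start - 1:end]


def to_text_sequence(name, residues):
    """Wrap the residues in a pyhmmer TextSequence (requires pyhmmer)."""
    import pyhmmer  # imported lazily so the fetch helper works without it

    return pyhmmer.easel.TextSequence(name=name.encode(), sequence=residues)
```

For large files an indexed reader is the better choice; the scan above is only meant to show where the conversion to TextSequence fits in.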

> I would also like to report a glitch on the website. I was not able to reach the documentation page for the latest version, 0.10.3 - when I click on the tab, it always leads me back to the landing page.

I had the same issue, but it was only occurring on the 0.10.3 version, not on the latest branch, so hopefully this is fixed with release v0.10.4.

chtsai0105 commented 1 year ago

@althonos Thanks! I'll try the latest version to see if I can export the objects as pickle.

As for the website issue, I found that the documentation page still does not show up for the latest version, 0.10.4. On my end, the latest and stable branches both have the same issue as 0.10.4. I've cleared my cookies, but the problem persists.

althonos commented 1 year ago

> As for the website issue, I found that the documentation page still does not show up for the latest version, 0.10.4. On my end, the latest and stable branches both have the same issue as 0.10.4. I've cleared my cookies, but the problem persists.

I still have this but only with the dropdown menu at the top; clicking on the "API Reference" link on the homepage seems to work 😃

chtsai0105 commented 1 year ago

I have confirmed that the pickle protocol works really well! I'm closing the issue since the main problem has been resolved. We still need to keep track of the documentation issue, since it remains.

althonos commented 1 year ago

I think I fixed it in the latest branch, let me know if it works too: https://pyhmmer.readthedocs.io/en/latest/

chtsai0105 commented 1 year ago

The latest branch is working now! Thank you so much!