althonos / pyhmmer

Cython bindings and Python interface to HMMER3.
https://pyhmmer.readthedocs.io
MIT License
128 stars 12 forks source link

Save sequence name after translation #31

Closed valentynbez closed 1 year ago

valentynbez commented 1 year ago

Previous behaviour:

with pyhmmer.easel.SequenceFile("COG0012.fna", digital=True) as seqs_file:
    nucleotides = seqs_file.read_block()

print(nucleotides[0].name)
# DigitalSequenceBlock method 
proteins = nucleotides.translate()
print(proteins[0].name)
# DigitalSequence method
print(nucleotides[0].translate().name)
b'1089456.NCGM2_0913'
b''
b''

Current behaviour:

b'1089456.NCGM2_0913'
b'1089456.NCGM2_0913'
b'1089456.NCGM2_0913'
codecov[bot] commented 1 year ago

Codecov Report

Base: 84.77% // Head: 84.93% // Increases project coverage by +0.15% :tada:

Coverage data is based on head (20015a5) compared to base (96b9b29). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #31 +/- ## ========================================== + Coverage 84.77% 84.93% +0.15% ========================================== Files 39 39 Lines 10906 10924 +18 ========================================== + Hits 9246 9278 +32 + Misses 1660 1646 -14 ``` | Flag | Coverage Δ | | |---|---|---| | CPython | `84.93% <100.00%> (+0.18%)` | :arrow_up: | | Linux | `84.93% <100.00%> (+0.18%)` | :arrow_up: | | OSX | `?` | | | v3.10 | `84.93% <100.00%> (+0.18%)` | :arrow_up: | | v3.11 | `95.66% <100.00%> (+<0.01%)` | :arrow_up: | | v3.7 | `84.70% <100.00%> (+0.17%)` | :arrow_up: | | v3.8 | `84.93% <100.00%> (+0.18%)` | :arrow_up: | | v3.9 | `84.93% <100.00%> (+0.18%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Martin+Larralde#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/althonos/pyhmmer/pull/31?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Martin+Larralde) | Coverage Δ | | |---|---|---| | [pyhmmer/easel.pyx](https://codecov.io/gh/althonos/pyhmmer/pull/31?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Martin+Larralde#diff-cHlobW1lci9lYXNlbC5weXg=) | `85.31% <100.00%> (+0.67%)` | :arrow_up: | | [pyhmmer/tests/test\_easel/test\_block.py](https://codecov.io/gh/althonos/pyhmmer/pull/31?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Martin+Larralde#diff-cHlobW1lci90ZXN0cy90ZXN0X2Vhc2VsL3Rlc3RfYmxvY2sucHk=) | `100.00% <100.00%> (ø)` | | | [pyhmmer/tests/test\_doctest.py](https://codecov.io/gh/althonos/pyhmmer/pull/31?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Martin+Larralde#diff-cHlobW1lci90ZXN0cy90ZXN0X2RvY3Rlc3QucHk=) | `85.07% <0.00%> (-4.48%)` | :arrow_down: | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Martin+Larralde). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Martin+Larralde)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

althonos commented 1 year ago

Thanks! Actually, I think it may be worth copying the entire metadata (name / description / source), do you think you could update the code there as well?

valentynbez commented 1 year ago

Running test_block.py now throws following error:

 File "/nfs/cds-peta/exports/biol_micro_cds_gr_sunagawa/scratch/vbezshapkin/pyhmmer/pyhmmer/tests/test_easel/test_block.py", line 314, in test_translate
    prots = block.translate().textize()
  File "pyhmmer/easel.pyx", line 5827, in pyhmmer.easel.DigitalSequenceBlock.translate
  File "pyhmmer/easel.pyx", line 5884, in pyhmmer.easel.DigitalSequenceBlock.translate
  File "pyhmmer/easel.pyx", line 5078, in pyhmmer.easel.DigitalSequence.__init__
  File "pyhmmer/easel.pyx", line 4679, in pyhmmer.easel.Sequence.taxonomy_id.__set__
ValueError: NCBI taxonomy must be a positive integer or None, got -1

Don't know how to solve it

valentynbez commented 1 year ago

The taxonomy_id is lost after TextSequence.digitize()

pyhmmer.easel.TextSequence(b"woop", sequence="AGTATC", taxonomy_id=123).digitize(pyhmmer.easel.Alphabet.dna()).taxonomy_id

Produces empty output.

valentynbez commented 1 year ago

Now taxonomy_id is passed everywhere, but the following errors appear because of object initialization:

======================================================================
ERROR: test_len (test_block.TestDigitalSequenceBlock)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/nfs/cds-peta/exports/biol_micro_cds_gr_sunagawa/scratch/vbezshapkin/pyhmmer/pyhmmer/tests/test_easel/test_block.py", line 37, in test_len
    seq1 = self._new_sequence(b"seq1", "ATGC")
  File "/nfs/cds-peta/exports/biol_micro_cds_gr_sunagawa/scratch/vbezshapkin/pyhmmer/pyhmmer/tests/test_easel/test_block.py", line 287, in _new_sequence
    ).digitize(self.alphabet)
  File "pyhmmer/easel.pyx", line 4884, in pyhmmer.easel.TextSequence.digitize
  File "pyhmmer/easel.pyx", line 4909, in pyhmmer.easel.TextSequence.digitize
  File "pyhmmer/easel.pyx", line 4679, in pyhmmer.easel.Sequence.taxonomy_id.__set__
ValueError: NCBI taxonomy must be a positive integer or None, got 0
althonos commented 1 year ago

@valentynbez : I have fixed the initialization of taxonomy IDs, so at least the tests pass now. I think it's a bug that esl_sq_Copy doesn't copy the taxonomy ID as well, so we should open a PR on https://github.com/EddyRivasLab/easel as well. Want to do it? :wink:

valentynbez commented 1 year ago

At this point, it's reasonable just to drop support for tax_id in this case: https://github.com/EddyRivasLab/easel/issues/68

althonos commented 1 year ago

Good call. Then you can just remove it from the test cases and we should be good to go :)

althonos commented 1 year ago

All clear, looks like it's just a Codecov problem failing CI. I'll be merging. Thanks for the contribution :ok_hand:

valentynbez commented 1 year ago

Always! I'm gonna use it A LOT :D