Closed jpjarnoux closed 8 months ago
Hi Jérôme,
The callback needs to take two arguments: the HMM object and the total number of currently loaded HMMs. The total is useful when you're reading the HMMs from a file, in which case it is not known in advance and you can update it as you go (tqdm doesn't support that, but rich does).
In your snippet, that means:
options = {"bit_cutoffs": bit_cutoffs, 'callback': lambda hmm, total: bar.update()}
If I use only one argument like you did, the progress bar is never updated, and because the exception is silenced the code enters a deadlock (the worker threads die on the exception, while the main thread still tries to pass them queries to process).
I've patched the deadlock, so with your original one-argument callback you'd now actually get the error and traceback:
0%| | 0/20795 [00:00<?, ?hmm/s]Traceback (most recent call last):
  File "/home/althonos/Code/pyhmmer/issue.py", line 18, in <module>
    for top_hits in pyhmmer.hmmsearch(hmms, sequences, cpus=2, callback=callback):
  File "/home/althonos/Code/pyhmmer/pyhmmer/hmmer.py", line 520, in _multi_threaded
    yield results[0].get()
          ^^^^^^^^^^^^^^^^
  File "/home/althonos/Code/pyhmmer/pyhmmer/hmmer.py", line 122, in get
    raise self.exception
  File "/home/althonos/Code/pyhmmer/pyhmmer/hmmer.py", line 215, in run
    hits = self.process(chore.query)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/althonos/Code/pyhmmer/pyhmmer/hmmer.py", line 232, in process
    self.callback(query, self.query_count.value) # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: <lambda>() takes 1 positional argument but 2 were given
I'll publish a patch shortly for the deadlock issue, but you don't need to wait for it: just change the callback signature and your code will work :+1:
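The failure mode described above (a worker thread hitting an exception while the main thread keeps waiting on it) and the kind of fix that surfaces a traceback instead can be illustrated with a small standard-library sketch. This is not pyhmmer's actual implementation: Result, worker and run are hypothetical names, and the uppercase conversion is only a placeholder for real work.

```python
import queue
import threading


class Result:
    """Holds either a value or the exception raised while computing it."""

    def __init__(self):
        self._done = threading.Event()
        self.value = None
        self.exception = None

    def set(self, value=None, exception=None):
        self.value, self.exception = value, exception
        self._done.set()

    def get(self, timeout=None):
        # Block until the worker has reported back, then surface any error.
        if not self._done.wait(timeout):
            raise TimeoutError("worker never reported a result")
        if self.exception is not None:
            raise self.exception
        return self.value


def worker(tasks, callback):
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work
            break
        query, result = item
        try:
            callback(query, 1)  # a one-argument callback raises TypeError here
            result.set(value=query.upper())  # placeholder for the real search
        except BaseException as exc:
            # Hand the failure to the caller instead of dying silently,
            # which would leave the main thread waiting forever.
            result.set(exception=exc)


def run(queries, callback):
    tasks = queue.Queue()
    thread = threading.Thread(target=worker, args=(tasks, callback), daemon=True)
    thread.start()
    results = []
    for query in queries:
        result = Result()
        tasks.put((query, result))
        results.append(result)
    tasks.put(None)
    try:
        return [r.get(timeout=5) for r in results]
    finally:
        thread.join(timeout=5)
```

With a two-argument callback, run returns normally; with a one-argument lambda, the TypeError raised inside the worker is re-raised in the calling thread rather than being swallowed.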
Hi, thank you very much for your quick reply.
In my case, I have only one HMM per file, so I assume I could use the length of my pyhmmer.plan7.HMM list as the total number of HMMs.
Could you tell me how to get the HMM object from the TopHits or Hit object? It's not clear to me.
You basically have two choices:
The callback function has the signature callback(query, total), and in the case of hmmsearch the query is the HMM object, so you could do the following:
def callback(hmm, total):
    logging.info("Finished annotation with HMM %s", hmm.name.decode())
    pbar.update()

for top_hits in pyhmmer.hmmsearch(hmms, sequences, callback=callback):
    # ... #
The hmmsearch function is guaranteed to return one TopHits object per query, in the same order, so you can just use zip with your queries and your TopHits:
for hmm, top_hits in zip(hmms, pyhmmer.hmmsearch(hmms, sequences)):
    logging.info("Finished annotation with HMM %s", hmm.name.decode())
    # ... #
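Both options can be tried without pyhmmer installed. The sketch below uses fake_search, a hypothetical stand-in that only mimics the two properties relied on above: it calls callback(query, total) once per query, and it yields one result per query in input order. The substring matching and the string queries are placeholders, not how hmmsearch works.

```python
import logging

logging.basicConfig(level=logging.INFO)


def fake_search(queries, targets, callback=None):
    """Hypothetical stand-in: yields one result list per query, in order."""
    for count, query in enumerate(queries, start=1):
        hits = [t for t in targets if query in t]  # placeholder for a real search
        if callback is not None:
            callback(query, count)  # same two-argument shape as described above
        yield hits


queries = ["ab", "cd"]
targets = ["xabx", "cdcd", "zz"]

# Option 1: learn which query just finished through the callback.
finished = []


def callback(query, total):
    logging.info("Finished query %s (%d queries seen so far)", query, total)
    finished.append(query)


results_1 = list(fake_search(queries, targets, callback=callback))

# Option 2: pair each query with its result by position.
paired = list(zip(queries, fake_search(queries, targets)))
```

One caveat for the zip approach: the queries need to be a re-iterable sequence such as a list. If they come from a one-shot iterator, zip and the search function would both consume the same iterator and the pairing would break.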
Hi, I have a question about the callback argument of the hmmsearch function. I would like to update my progress bar after each query, but my code does not work as expected.
Maybe I do not understand how to use it.
For the time being I update the bar manually at the end of the for loop to make it work, but I would also like to use the callback to write the name of the HMM in a debug message (with the logging package), so defining a callback function seems like a good idea.
Thanks for your help