AstraZeneca / KAZU

Fast, world class biomedical NER
https://AstraZeneca.github.io/KAZU/
Apache License 2.0

Monitoring progress #15

Closed: GuyAglionby closed this issue 11 months ago

GuyAglionby commented 11 months ago

Thanks for this useful library! Is there a recommended way to track the progress of the pipeline, including progress through each stage? It'd be useful to know how long I should expect to wait for it to complete. I'm using the default pipeline.

Thanks in advance for any pointers.

EFord36 commented 11 months ago

Hi,

Thanks for reaching out!

Unfortunately there's nothing in the docs about this yet. We've had some docs on batch processing in progress for a while, but they're not complete yet.

Are you sending a large number of documents, or very large documents, to the kazu pipeline? If you're only sending a few documents, it should take just a few seconds.

If you're sending a large number of documents, one option is to send them to kazu in batches in a for loop, and wrap the loop with tqdm, which will give you a progress bar showing how many documents/batches you've run through the pipeline.

You could do the batching with e.g. the chunked function from more_itertools.
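The batching approach above can be sketched roughly as follows. This is a minimal sketch, assuming the kazu pipeline can be called on a list of documents and returns a list of processed documents. `process_in_batches`, `fake_pipeline` and the batch size of 100 are hypothetical: `fake_pipeline` stands in for a real kazu `Pipeline` so the snippet runs without kazu or its models installed. If more_itertools or tqdm aren't available, it falls back to a plain-Python chunker and no progress bar.

```python
import math
from collections.abc import Callable, Iterable, Iterator
from itertools import islice
from typing import TypeVar

T = TypeVar("T")

try:
    from more_itertools import chunked  # splits an iterable into lists of length n
except ImportError:
    # Stdlib fallback with the same behaviour: yield lists of up to n items.
    def chunked(iterable: Iterable[T], n: int) -> Iterator[list[T]]:
        it = iter(iterable)
        while batch := list(islice(it, n)):
            yield batch

try:
    from tqdm import tqdm  # progress bar printed to stderr
except ImportError:
    # Fallback: no progress bar, just pass the iterable through unchanged.
    def tqdm(iterable, **kwargs):
        return iterable


def process_in_batches(
    docs: list[T],
    pipeline: Callable[[list[T]], list[T]],
    batch_size: int = 100,
) -> list[T]:
    """Run `pipeline` over `docs` in batches, advancing the bar once per batch."""
    results: list[T] = []
    n_batches = math.ceil(len(docs) / batch_size)  # lets tqdm show an ETA
    for batch in tqdm(chunked(docs, batch_size), total=n_batches, unit="batch"):
        results.extend(pipeline(batch))
    return results


# Hypothetical stand-in for a kazu Pipeline; replace with your own instance.
def fake_pipeline(batch: list[str]) -> list[str]:
    return [text.upper() for text in batch]


docs = [f"doc {i}" for i in range(250)]
processed = process_in_batches(docs, fake_pipeline, batch_size=100)
```

Here the bar counts batches, so with ~50k documents and `batch_size=100` it advances 500 times. To count documents instead, you could create the bar with `tqdm(total=len(docs))` and call its `update(len(batch))` method after each batch.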

Let me know if any of that is unclear.

Is that good enough for what you're after, or are you running kazu in a way where each step takes a long time even for a small number of documents?
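If the concern is instead that a particular stage is slow, one rough way to see where the time goes is to time each stage separately on a small sample. The sketch below assumes each stage can be treated as a callable from a list of documents to a list of documents. That's a simplification: how to get at the individual steps of a kazu `Pipeline`, and what they return, should be checked against the kazu docs. `run_steps_timed` and the lambda steps are hypothetical stand-ins, not kazu's real step classes.

```python
import time
from collections.abc import Callable, Sequence
from typing import Any

# A "step" here is any callable taking a batch of documents and returning the
# processed batch (a hypothetical simplification of a pipeline stage).
StepFn = Callable[[list[Any]], list[Any]]


def run_steps_timed(
    docs: list[Any], steps: Sequence[tuple[str, StepFn]]
) -> tuple[list[Any], dict[str, float]]:
    """Apply each named step in order, recording wall-clock seconds per step."""
    timings: dict[str, float] = {}
    for name, step in steps:
        start = time.perf_counter()
        docs = step(docs)  # output of one step feeds the next
        timings[name] = time.perf_counter() - start
        print(f"{name}: {timings[name]:.3f}s for {len(docs)} docs")
    return docs, timings


# Usage with hypothetical stand-in steps.
steps = [
    ("lowercase", lambda batch: [d.lower() for d in batch]),
    ("strip", lambda batch: [d.strip() for d in batch]),
]
out, timings = run_steps_timed(["  Foo ", " BAR"], steps)
```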

GuyAglionby commented 11 months ago

Thanks for the quick response! I'm sending ~50k docs, but they're not very big. I've batched them into reasonably small chunks before feeding them to the pipeline as you suggest, and that's working nicely.

Thanks again

EFord36 commented 11 months ago

Glad I could help!

I was curious (possibly nosy) about how kazu is being used, so I had a look at your GitHub profile and website. Looks like you're doing interesting stuff!

I'd be happy to hear how well kazu suits your needs and what shortcomings it currently has for your use case, so we can see whether we can fix or improve them. Feel free to open more issues, or reach out directly if something is less of an 'issue' and more open-ended: my work email is in the kazu git history, e.g. in this commit: https://github.com/AstraZeneca/KAZU/commit/ccb99ddbcf63633be31f8c977b658aff25ef38c8.patch