Imageomics / bioclip

This is the repository for the BioCLIP model and the TreeOfLife-10M dataset [CVPR'24 Oral, Best Student Paper].
https://imageomics.github.io/bioclip/

How to solve the slow inference problem with multiple categories? #20

Closed ccc524 closed 4 months ago

ccc524 commented 4 months ago

Thank you very much for open-sourcing your model for us to learn from. I am facing a small issue and would appreciate your help. I want to infer a specific category name for a batch of images (there are about 6,000 categories), and I only need the most confident one. However, the process is too slow because it seems to compute scores for all 6,000 categories. How can I resolve this?

Thank you!

hlapp commented 4 months ago

Hi - if you only want to use the model for inference (rather than fine-tuning, further training, or something similar), have you tried using pybioclip? It might be easier to install/deploy/use.

That said, pybioclip (as the name suggests) doesn't really do anything wildly different from the model inference code here. If I understand you correctly, you want to supply 6000 categories of your own and have it predict from among those? The part that takes substantially more time than the default "open-ended classification" (which uses the classes we trained on, for which the embeddings are already computed) is, in my understanding, embedding the custom classes, especially if you don't have a GPU. There's no way around that.

However, this step should only need to be done once per batch. I'm not sure whether pybioclip actually does this only once (cc @johnbradley), or how the code here behaves.
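To make the once-per-batch idea concrete, here is a minimal sketch. The function names (`embed_labels`, `embed_image`) and the toy "embeddings" are invented stand-ins for the real, expensive CLIP text/image encoders; this is not pybioclip's actual internals.

```python
def embed_labels(labels):
    # Stand-in for the costly text-encoder pass over all custom labels.
    return {label: float(len(label)) for label in labels}

def embed_image(path):
    # Stand-in for the image encoder.
    return float(len(path))

def classify_batch(image_paths, labels):
    label_embeddings = embed_labels(labels)  # computed ONCE, not per image
    results = []
    for path in image_paths:
        img = embed_image(path)
        # Pick the label whose (toy) embedding is closest to the image's.
        best = min(label_embeddings, key=lambda l: abs(label_embeddings[l] - img))
        results.append((path, best))
    return results

print(classify_batch(["a.jpg", "bb.jpg"], ["cat", "heron"]))
```

The point is only the structure: the label-embedding step sits outside the per-image loop, so its cost is paid once however many images the batch contains.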

johnbradley commented 4 months ago

@hlapp pybioclip does not cache the category embeddings, it has the same logic as https://huggingface.co/spaces/imageomics/bioclip-demo. It would be simple to start caching them.

One other thing to note is that bioclip generates 80 embeddings for each category. So for 6000 categories you end up with 480000 text embeddings. bioclip has templates that it applies to each category: https://github.com/Imageomics/pybioclip/blob/d6ebb1d2e8516a420387b08cf8f2017c9056a852/src/bioclip/predict.py#L22-L102
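The template expansion described above can be illustrated with a toy example (the two templates below are illustrative only; the real list in pybioclip's predict.py has roughly 80 entries):

```python
# Each category is expanded with every prompt template before text encoding.
templates = [
    "a photo of a {}.",
    "a photo of the small {}.",
]  # the real pybioclip list has ~80 templates

categories = ["duck", "fish", "bear"]

prompts = [t.format(c) for c in categories for t in templates]
print(len(prompts))  # 3 categories x 2 templates = 6 prompts
```

With ~80 templates and 6,000 categories, the same expansion yields 480,000 prompts, each of which must pass through the text encoder, which is why this step dominates the runtime.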

hlapp commented 4 months ago

pybioclip does not cache the category embeddings

@johnbradley am I right that by not caching you mean the category embeddings would be recomputed between one invocation of the program (bioclip as provided by pybioclip) and the next, but would not need to be recomputed between predicting one image and the next within the same invocation? I.e., for a given batch of images, it's only computed once in pybioclip, right?

johnbradley commented 4 months ago

Text embeddings are calculated for each image processed. This is really inefficient, but it follows the pre-existing logic.

johnbradley commented 4 months ago

I created https://github.com/Imageomics/pybioclip/issues/15 to address the caching problem in pybioclip.

johnbradley commented 4 months ago

@ccc524 pybioclip now has changes to cache the text embeddings. The changes apply to both the command line tool and the python API. Given you are using 6000 categories the python API may be easier to work with.

In my testing, using a GPU significantly sped up creating the text embeddings. So assuming you have a CUDA-capable graphics card, you could run something like this:

from bioclip import CustomLabelsClassifier

classifier = CustomLabelsClassifier(["duck", "fish", "bear"], device="cuda")
predictions = classifier.predict("Ursus-arctos.jpeg")
for prediction in predictions:
    print(prediction["classification"], prediction["score"])

Please see the updated documentation: https://github.com/Imageomics/pybioclip?tab=readme-ov-file#predict-from-a-list-of-classes
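Since the original question only needs the single most confident category, the prediction list can be reduced with `max`. The predictions below mimic the shape of the dicts in the snippet above, but the scores are made up:

```python
# Reduce a prediction list (shaped like CustomLabelsClassifier.predict output)
# to the single highest-scoring entry. Scores here are invented for illustration.
predictions = [
    {"classification": "duck", "score": 0.05},
    {"classification": "bear", "score": 0.90},
    {"classification": "fish", "score": 0.05},
]

top = max(predictions, key=lambda p: p["score"])
print(top["classification"])  # bear
```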

hlapp commented 4 months ago

Thanks @johnbradley. Perhaps we can close this issue now? @samuelstevens @thompsonmj any votes?

thompsonmj commented 4 months ago

Yes, I believe this addresses the issue adequately. To be clear for @ccc524: with pybioclip #16, we expect the first iteration over the image batch to still be slow, since all the text embeddings must be instantiated (though this is faster on a CUDA device). However, subsequent iterations over images in the session's batch will be faster thanks to the text embedding cache.

I might suggest, as a further improvement for a new issue, that the text embedding cache be optionally saved to disk, with an option to load it from a specified file path rather than recomputing when the category list is reused.
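A hedged sketch of what such a disk cache might look like (the key scheme, file format, and function names are invented for illustration; this is not pybioclip code):

```python
import hashlib
import pickle
from pathlib import Path

def labels_key(labels):
    # Stable key so a cache file is reused only for the exact same label list.
    joined = "\x00".join(labels)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def load_or_compute_embeddings(labels, compute_fn, cache_dir="emb_cache"):
    cache = Path(cache_dir)
    cache.mkdir(exist_ok=True)
    path = cache / (labels_key(labels) + ".pkl")
    if path.exists():
        # Cache hit: skip the expensive text-encoder pass entirely.
        with path.open("rb") as f:
            return pickle.load(f)
    embeddings = compute_fn(labels)  # the expensive text-encoder pass
    with path.open("wb") as f:
        pickle.dump(embeddings, f)
    return embeddings

# Toy "encoder" standing in for the real text embedding computation.
emb = load_or_compute_embeddings(["duck", "fish"], lambda ls: {l: len(l) for l in ls})
print(emb)
```

On a second run with the same 6,000-category list, the encoder would not be invoked at all; only the pickle load remains.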

samuelstevens commented 4 months ago

The demo code is expected to be demo code only, and I had originally expected users to read and modify the code as necessary for their particular use cases. But it seems to be a common pattern that no one modifies the demo code and simply uses it as is.

Saving embeddings to disk is an example of writing application-specific code, and while I am happy to make an example of this, I don't think this should be the default demo code.

hlapp commented 4 months ago

Thanks @samuelstevens, seems like you agree this is out of scope here, especially now that it's available in pybioclip. Thanks @ccc524 for bringing this to our attention and thereby prompting the change in pybioclip!