Open ahron1 opened 1 week ago
@ahron1 thanks for suggesting this!
@atroyn what do you think here?
@ahron1 looking at the PR you opened - and reviewing the multimodal docs - I think that this functionality should be in a new loader class... something like ImageUrLoader
. I'm not seeing the case where someone would want to pass both remote and local filesystem requests and the magic of supporting both could lead to confusion/bugs.
Thanks for checking!
Indeed, it is better to split it off. With @tazarov 's idea about adding in authentication, it becomes a standalone remote image loader.
I agree that this ought to be standalone, though the counter-argument is that other data loaders, e.g. HuggingFace datasets
and similar support both kinds of loading. I think in our case however, we do want these to be separate, since they're not about downloading data, but getting data into Chroma.
HuggingFace was why I originally thought of handling both URIs and URLs together. But indeed, here it should be standalone. I just pushed the changes.
Describe the problem
Currently, the
ImageLoader
data loader loads only locally stored images.For loading remote images, one needs to write a wrapper to fetch the images first. It will be very helpful
ImageLoader
would also directly load remote images over HTTP/S.Describe the proposed solution
It is an easy fix. Just add an extra block to the
_load_image
definition inImageLoader
, to check if it is a remote or local URI. If it is a HTTP/S URI, use PIL and BytesIO to fetch the image before converting it to a NumPy array.I already have the code working locally. Happy to submit a PR if people are open to the idea.
Alternatives considered
No response
Importance
would make my life easier
Additional Information
No response