autodistill / autodistill-grounding-dino

Grounding DINO module for use with Autodistill.
Apache License 2.0

Unable to process .pgm or grayscale images #4

Open Mars-204 opened 7 months ago

Mars-204 commented 7 months ago

The current model is unable to process .pgm images, while the original Grounding DINO model can process this type of image (tested in the Colab notebook).

capjamesg commented 7 months ago

Hello @Mars-204! Thank you for creating this Issue! Autodistill implements its own image loading logic. This is because we want to standardize how we process images across all models. We didn't account for pgm files in our implementation.

You should be able to load an image with PIL, convert it as necessary, then pass the PIL object through directly to autodistill-grounding-dino:

from PIL import Image

image = Image.open('test.pgm')
image = image.convert('RGB')  # 3 channels: load_image() applies cv2.COLOR_RGB2BGR


Let me know if this code helps!
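One caveat with the approach above, offered as a hedged sketch: 16-bit .pgm files are typically opened by Pillow in a high-bit-depth mode such as "I" or "I;16", and converting those straight to "L" or "RGB" clips intensities above 255 instead of rescaling them. The hypothetical helper to_rgb8 min-max scales such images to 8 bits first. The scaling choice is an assumption for illustration, not autodistill's behavior.

```python
import numpy as np
from PIL import Image

# Pillow modes holding more than 8 bits per pixel (16-bit PGMs usually open as "I" or "I;16").
HIGH_BIT_DEPTH_MODES = {"I", "I;16", "I;16B", "I;16L", "F"}


def to_rgb8(image: Image.Image) -> Image.Image:
    """Return an 8-bit, 3-channel RGB version of a PIL image.

    Assumption: high-bit-depth grayscale is min-max scaled into 0-255
    instead of being clipped, so 16-bit data keeps its contrast.
    """
    if image.mode in HIGH_BIT_DEPTH_MODES:
        arr = np.asarray(image, dtype=np.float64)
        lo, hi = float(arr.min()), float(arr.max())
        # A constant image has no range to stretch; map it to black.
        scale = 255.0 / (hi - lo) if hi > lo else 0.0
        arr8 = np.round((arr - lo) * scale).astype(np.uint8)
        image = Image.fromarray(arr8)  # 2-D uint8 array -> mode "L"
    return image.convert("RGB")
```

For example, image = to_rgb8(Image.open('test.pgm')) yields an 8-bit RGB PIL object that can then be passed on as suggested above.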

Mars-204 commented 7 months ago

Thanks for the information. I am using autodistill-grounding-dino to create an annotated dataset. How should I proceed if I want to pass a whole folder of images as input?

For example:

base_model_dino.label(input_folder=folder_name, output_folder=save_dir, extension=".pgm")
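Until .label() reads .pgm files itself, one possible workaround (a sketch, not part of autodistill) is to convert the folder to 8-bit RGB .png files first and label those instead. The function name convert_folder_to_png and the min-max rescaling of 16-bit data are assumptions made for illustration.

```python
from pathlib import Path

import numpy as np
from PIL import Image


def convert_folder_to_png(input_folder: str, output_folder: str,
                          extension: str = ".pgm") -> list[Path]:
    """Save an 8-bit RGB .png copy of every *extension file in input_folder.

    Hypothetical helper: 16-bit data (Pillow modes "I"/"I;16") is min-max
    rescaled to 0-255; 8-bit grayscale is converted to RGB unchanged.
    """
    src, dst = Path(input_folder), Path(output_folder)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.glob(f"*{extension}")):
        with Image.open(path) as img:
            if img.mode in {"I", "I;16", "I;16B", "I;16L"}:
                arr = np.asarray(img, dtype=np.float64)
                lo, hi = arr.min(), arr.max()
                scale = 255.0 / (hi - lo) if hi > lo else 0.0
                img8 = Image.fromarray(np.round((arr - lo) * scale).astype(np.uint8))
            else:
                img8 = img
            out = dst / f"{path.stem}.png"
            img8.convert("RGB").save(out)
        written.append(out)
    return written
```

The labeling call from the question would then point at the converted folder, e.g. base_model_dino.label(input_folder=png_folder, output_folder=save_dir, extension=".png"), where png_folder is a hypothetical variable holding the output_folder passed to convert_folder_to_png.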

capjamesg commented 7 months ago

My solution above will not work with .label(). Good catch.

I have submitted a PR that will load .pgm images from a file and convert them into a PIL object. In the case of Grounding DINO, the image is then converted into an OpenCV-compatible (NumPy) array for use in inference.

Would you be interested in testing the implementation?

You can install the fix using:

pip uninstall autodistill
pip install git+

Then, run your code.

Mars-204 commented 7 months ago

Thanks for the update. However, it is still unable to process the .pgm images with the fix.

(test) PS C:\work\masterarbiet\annotator_test> python .\
trying to load grounding dino directly
torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ..\aten\src\ATen\native\TensorShape.cpp:3527.)
final text_encoder_type: bert-base-uncased
Labeling ./context_images\sv2_6041_00000005_inten.pgm:   0%|          | 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\work\masterarbiet\annotator_test\", line 9, in <module>
  File "C:\work\masterarbiet\annotator_test\test\Lib\site-packages\autodistill\detection\", line 68, in label
    detections = self.predict(f_path)
  File "C:\work\masterarbiet\annotator_test\test\Lib\site-packages\autodistill_grounding_dino\", line 39, in predict
    image = load_image(input, return_format="cv2")
  File "C:\work\masterarbiet\annotator_test\test\Lib\site-packages\autodistill\", line 82, in load_image
    return cv2.cvtColor(np.array(, cv2.COLOR_RGB2BGR)
cv2.error: OpenCV(4.8.0) d:\a\opencv-python\opencv-python\opencv\modules\imgproc\src\color.simd_helpers.hpp:94: error: (-2:Unspecified error) in function '__cdecl cv::impl::`anonymous-namespace'::CvtHelper<struct cv::impl::`anonymous namespace'::Set<1,-1,-1>,struct cv::impl::A0xb9d9ffe2::Set<3,4,-1>,struct cv::impl::A0xb9d9ffe2::Set<0,2,5>,3>::CvtHelper(const class cv::_InputArray &,const class cv::_OutputArray &,int)'
> Unsupported depth of input image:
>     'VDepth::contains(depth)'
> where
>     'depth' is 4 (CV_32S)
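Reading the error: OpenCV reports an input depth of 4 (CV_32S), while the template argument Set<0,2,5> appears to list the depths this conversion accepts, namely 0 (CV_8U), 2 (CV_16U) and 5 (CV_32F). A 32-bit signed array is what NumPy produces from a Pillow image in mode "I", which is consistent with a 16-bit PGM being loaded without rescaling. The sketch below (hypothetical names opencv_depth and ACCEPTED_BY_CVTCOLOR) checks an array's depth code without importing OpenCV.

```python
import numpy as np
from PIL import Image

# OpenCV depth codes keyed by NumPy (dtype kind, bytes per element).
OPENCV_DEPTH = {
    ("u", 1): 0,  # CV_8U
    ("i", 1): 1,  # CV_8S
    ("u", 2): 2,  # CV_16U
    ("i", 2): 3,  # CV_16S
    ("i", 4): 4,  # CV_32S
    ("f", 4): 5,  # CV_32F
    ("f", 8): 6,  # CV_64F
}
# Depths listed in the error's Set<0,2,5> template argument.
ACCEPTED_BY_CVTCOLOR = {0, 2, 5}


def opencv_depth(arr: np.ndarray) -> int:
    """Return the OpenCV depth code an array's dtype maps to."""
    return OPENCV_DEPTH[(arr.dtype.kind, arr.dtype.itemsize)]


if __name__ == "__main__":
    img = Image.new("I", (4, 4))  # the mode Pillow typically uses for 16-bit PGMs
    depth = opencv_depth(np.asarray(img))
    print(depth, depth in ACCEPTED_BY_CVTCOLOR)  # 4 False
```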

I also tried changing the line to 'image = load_image(input, return_format="PIL")', but got this error:

(test) PS C:\work\masterarbiet\annotator_test> python .\
trying to load grounding dino directly
torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ..\aten\src\ATen\native\TensorShape.cpp:3527.)
final text_encoder_type: bert-base-uncased
Labeling ./context_images\sv2_6041_00000005_inten.pgm:   0%|          | 0/5 [00:00<?, ?it/s]
  File "C:\work\masterarbiet\annotator_test\", line 9, in <module>
  File "C:\work\masterarbiet\annotator_test\test\Lib\site-packages\autodistill\detection\", line 68, in label
    detections = self.predict(f_path)
  File "C:\work\masterarbiet\annotator_test\test\Lib\site-packages\autodistill_grounding_dino\", line 44, in predict
    detections = self.grounding_dino_model.predict_with_classes(
  File "C:\work\masterarbiet\annotator_test\test\Lib\site-packages\groundingdino\util\", line 193, in predict_with_classes
    processed_image = Model.preprocess_image(image_bgr=image).to(self.device)
  File "C:\work\masterarbiet\annotator_test\test\Lib\site-packages\groundingdino\util\", line 220, in preprocess_image
    image_pillow = Image.fromarray(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
cv2.error: OpenCV(4.8.0) :-1: error: (-5:Bad argument) in function 'cvtColor'
> Overload resolution failed:
>  - src is not a numpy array, neither a scalar
>  - Expected Ptr<cv::UMat> for argument 'src'
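This second traceback shows that with return_format="PIL" the PIL object reaches groundingdino's Model.preprocess_image, which calls cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB) and therefore needs a NumPy array in BGR channel order. A minimal sketch of that conversion, assuming an already 8-bit image and using NumPy channel reversal instead of OpenCV (the name pil_to_bgr is hypothetical):

```python
import numpy as np
from PIL import Image


def pil_to_bgr(image: Image.Image) -> np.ndarray:
    """Convert an 8-bit PIL image to a contiguous (H, W, 3) uint8 BGR array.

    Assumes 8-bit input; 16-bit data should be rescaled to 0-255 first.
    """
    rgb = np.asarray(image.convert("RGB"))  # (H, W, 3), uint8, RGB order
    # Reverse the channel axis (RGB -> BGR) and copy into a contiguous
    # buffer, since OpenCV can reject views with negative strides.
    return np.ascontiguousarray(rgb[:, :, ::-1])
```

The result is a plain NumPy array, which is the kind of input the cv2.cvtColor call in preprocess_image expects.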