Open Mars-204 opened 7 months ago
Hello @Mars-204! Thank you for creating this Issue! Autodistill implements its own image loading logic because we want to standardize how images are processed across all models. We didn't account for `.pgm` files in our implementation.

You should be able to load an image with PIL, convert it as necessary, then pass the PIL object directly to `autodistill-grounding-dino`:
```python
from PIL import Image

image = Image.open('test.pgm')
image = image.convert('L')  # convert to 8-bit grayscale
model.predict(image)
```
Let me know if this code helps!
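One caveat worth noting: PGM files can be 8-bit or 16-bit, and Pillow opens 16-bit PGM files in mode `I` (32-bit integers), which OpenCV's color conversion cannot handle. A minimal, hedged sketch of scaling such an image down to 8-bit first (the synthetic array below stands in for a 16-bit `.pgm` file):

```python
import numpy as np
from PIL import Image

# Synthetic stand-in for a 16-bit PGM: Pillow opens those in mode "I"
# (32-bit signed integers), which cv2.cvtColor cannot process.
image = Image.fromarray(
    np.arange(0, 65536, 1024, dtype=np.int32).reshape(8, 8), mode="I"
)

if image.mode == "I":
    # Scale the 0..65535 range down to 0..255 and re-wrap as 8-bit grayscale
    arr = (np.array(image) // 256).astype(np.uint8)
    image = Image.fromarray(arr, mode="L")

image = image.convert("RGB")  # detection models generally expect 3 channels
```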
Thanks for the information. I am using `autodistill-grounding-dino` to create an annotated dataset. How should I proceed if I want to pass a whole folder of images as input? For example:

```python
base_model_dino.label(input_folder=folder_name, output_folder=save_dir, extension=".pgm")
```
My solution above will not work with `.label()`. Good catch.
I have submitted a PR that will load `.pgm` images from a file and convert them into a PIL object. In the case of Grounding DINO, the image is then converted into a `cv2` object for use in inference.

Would you be interested in testing the implementation?
You can install the fix using:

```shell
pip uninstall autodistill
pip install git+https://github.com/autodistill/autodistill.git@add-pgm-support
```
Then, run your code.
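As a sanity check independent of autodistill, the sketch below reproduces the conversion the patched loader needs to perform for `.pgm` files: open with PIL, convert to RGB, then reorder channels to BGR for cv2-style code. The function name `load_pgm_as_bgr` and the sample file are illustrative, not part of autodistill's API:

```python
import numpy as np
from PIL import Image

def load_pgm_as_bgr(path: str) -> np.ndarray:
    """Open a PGM with PIL and return a BGR uint8 array for cv2-style code."""
    image = Image.open(path).convert("RGB")
    return np.array(image)[:, :, ::-1]  # RGB -> BGR channel order

# Write a small 8-bit PGM (Pillow saves mode "L" as binary PGM) and load it back
Image.fromarray(np.full((4, 4), 128, dtype=np.uint8), mode="L").save("sample.pgm")
bgr = load_pgm_as_bgr("sample.pgm")
```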
Thanks for the update. But it is still unable to process the `.pgm` images with the fix:
```
(test) PS C:\work\masterarbiet\annotator_test> python .\test.py
trying to load grounding dino directly
torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ..\aten\src\ATen\native\TensorShape.cpp:3527.)
final text_encoder_type: bert-base-uncased
Labeling ./context_images\sv2_6041_00000005_inten.pgm:   0%| | 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\work\masterarbiet\annotator_test\test.py", line 9, in <module>
    base_model_dino.label(input_folder="./context_images",
  File "C:\work\masterarbiet\annotator_test\test\Lib\site-packages\autodistill\detection\detection_base_model.py", line 68, in label
    detections = self.predict(f_path)
                 ^^^^^^^^^^^^^^^^^^^^
  File "C:\work\masterarbiet\annotator_test\test\Lib\site-packages\autodistill_grounding_dino\grounding_dino_model.py", line 39, in predict
    image = load_image(input, return_format="cv2")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\work\masterarbiet\annotator_test\test\Lib\site-packages\autodistill\helpers.py", line 82, in load_image
    return cv2.cvtColor(np.array(Image.open(image)), cv2.COLOR_RGB2BGR)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
cv2.error: OpenCV(4.8.0) d:\a\opencv-python\opencv-python\opencv\modules\imgproc\src\color.simd_helpers.hpp:94: error: (-2:Unspecified error) in function '__cdecl cv::impl::`anonymous-namespace'::CvtHelper<struct cv::impl::`anonymous namespace'::Set<1,-1,-1>,struct cv::impl::A0xb9d9ffe2::Set<3,4,-1>,struct cv::impl::A0xb9d9ffe2::Set<0,2,5>,3>::CvtHelper(const class cv::_InputArray &,const class cv::_OutputArray &,int)'
> Unsupported depth of input image:
>     'VDepth::contains(depth)'
> where
>     'depth' is 4 (CV_32S)
```
I also tried changing it to `image = load_image(input, return_format="PIL")`, but got this error:
```
(test) PS C:\work\masterarbiet\annotator_test> python .\test.py
trying to load grounding dino directly
torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ..\aten\src\ATen\native\TensorShape.cpp:3527.)
final text_encoder_type: bert-base-uncased
Labeling ./context_images\sv2_6041_00000005_inten.pgm:   0%| | 0/5 [00:00<?, ?it/s]
  File "C:\work\masterarbiet\annotator_test\test.py", line 9, in <module>
    base_model_dino.label(input_folder="./context_images",
  File "C:\work\masterarbiet\annotator_test\test\Lib\site-packages\autodistill\detection\detection_base_model.py", line 68, in label
    detections = self.predict(f_path)
                 ^^^^^^^^^^^^^^^^^^^^
  File "C:\work\masterarbiet\annotator_test\test\Lib\site-packages\autodistill_grounding_dino\grounding_dino_model.py", line 44, in predict
    detections = self.grounding_dino_model.predict_with_classes(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\work\masterarbiet\annotator_test\test\Lib\site-packages\groundingdino\util\inference.py", line 193, in predict_with_classes
    processed_image = Model.preprocess_image(image_bgr=image).to(self.device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\work\masterarbiet\annotator_test\test\Lib\site-packages\groundingdino\util\inference.py", line 220, in preprocess_image
    image_pillow = Image.fromarray(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
cv2.error: OpenCV(4.8.0) :-1: error: (-5:Bad argument) in function 'cvtColor'
> Overload resolution failed:
>   - src is not a numpy array, neither a scalar
>   - Expected Ptr<cv::UMat> for argument 'src'
```
The current model is unable to process `.pgm` images, while the actual Grounding DINO model can process these types of images (tested in the Colab notebook: https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/zero-shot-object-detection-with-grounding-dino.ipynb#scrollTo=VKzXm8mNR2XS).
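Until the fix lands, one practical workaround is to convert the `.pgm` files to `.png` offline and point `.label()` at the converted folder. A hedged sketch, with placeholder folder names (the sample file is written only so the loop has something to convert):

```python
import glob
import os

import numpy as np
from PIL import Image

src, dst = "context_images", "context_images_png"
os.makedirs(src, exist_ok=True)
os.makedirs(dst, exist_ok=True)

# Demo only: write one 8-bit PGM so the conversion loop below has input
Image.fromarray(np.zeros((4, 4), dtype=np.uint8), mode="L").save(
    os.path.join(src, "sample.pgm")
)

for path in glob.glob(os.path.join(src, "*.pgm")):
    image = Image.open(path)
    if image.mode == "I":
        # 16-bit PGM opens as 32-bit ints; scale down to 8-bit grayscale
        image = Image.fromarray((np.array(image) // 256).astype(np.uint8), mode="L")
    name = os.path.splitext(os.path.basename(path))[0] + ".png"
    image.convert("RGB").save(os.path.join(dst, name))
```

The labeling call would then use the converted folder, e.g. `base_model_dino.label(input_folder="context_images_png", output_folder=save_dir, extension=".png")`.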