IDEA-Research / Grounded-SAM-2

Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
https://arxiv.org/abs/2401.14159
Apache License 2.0

run florence2 error #55

Open yuntao229 opened 2 days ago

yuntao229 commented 2 days ago

I ran into the following two errors when running the demo `grounded_sam2_florence2_image_demo.py`:

  1. Traceback (most recent call last):
       File "/opt/conda/envs/compression/lib/python3.10/site-packages/transformers/feature_extraction_utils.py", line 194, in convert_to_tensors
         tensor = as_tensor(value)
       File "/opt/conda/envs/compression/lib/python3.10/site-packages/transformers/feature_extraction_utils.py", line 150, in as_tensor
         return torch.from_numpy(value)
     TypeError: expected np.ndarray (got numpy.ndarray)

  2. During handling of the above exception, another exception occurred:
     Traceback (most recent call last):
       File "/code/Grounded-SAM-2/grounded_sam2_florence2_image_demo.py", line 633, in <module>
         phrase_grounding_and_segmentation(
       File "/code/Grounded-SAM-2/grounded_sam2_florence2_image_demo.py", line 344, in phrase_grounding_and_segmentation
         results = run_florence2(task_prompt, text_input, florence2_model, florence2_processor, image)
       File "/code/Grounded-SAM-2/grounded_sam2_florence2_image_demo.py", line 78, in run_florence2
         inputs = processor(text=prompt, images=image, padding=True, return_tensors="pt").to(device, torch.float16)
       File "/root/.cache/huggingface/modules/transformers_modules/Florence-2-large/processing_florence2.py", line 250, in __call__
         pixel_values = self.image_processor(
     ValueError: Unable to create tensor, you should probably activate padding with 'padding=True' to have batched tensors with the same length.
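For context: the inner `TypeError: expected np.ndarray (got numpy.ndarray)` usually means `torch.from_numpy` itself is rejecting the array (commonly a NumPy 2.x install paired with a torch wheel built against NumPy 1.x), and the `ValueError` is just how `transformers` re-raises the failure. A minimal check, independent of Florence-2 and of this repo (the diagnosis is an inference from the traceback, not confirmed in this thread):

```python
# Sketch: if torch.from_numpy fails on a plain float32 array, the demo's
# processor call will fail the same way, so fix the numpy/torch pairing first.
import numpy as np
import torch

print("numpy:", np.__version__, "| torch:", torch.__version__)

arr = np.zeros((2, 3), dtype=np.float32)
try:
    t = torch.from_numpy(arr)  # raises the same TypeError on a mismatched pair
    print("conversion OK:", tuple(t.shape))
except TypeError as e:
    print("numpy/torch mismatch:", e)  # downgrading to numpy<2 is a common fix
```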

rentainhe commented 2 days ago

Hello, what's your transformers version? We used transformers==4.43.3 in our environment for running Grounded SAM 2.
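A quick way to apply and verify the suggested pin (the `pip` command is the standard one, not something specific to this repo):

```python
# Sketch: confirm the pinned version took effect after running
#   pip install transformers==4.43.3
import transformers

assert transformers.__version__ == "4.43.3", transformers.__version__
print("transformers pinned at", transformers.__version__)
```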

yuntao229 commented 4 hours ago

@rentainhe Thank you for your reply. I replaced transformers with version 4.43.3. Now I'm having trouble installing flash_attn; which version of flash_attn do you use?
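Not an answer from the maintainers, but if flash_attn refuses to build, a workaround that circulates in the community for Florence-2 is to strip `flash_attn` from the import list that `transformers` infers for the model's remote code, so the checkpoint loads without it and attention falls back to the default implementation. Treat the sketch below as unverified for this repo; the `microsoft/Florence-2-large` model id is an assumption (the demo may load from a local path), and whether non-flash attention is acceptable for your workload is your call.

```python
# Hedged workaround sketch: load Florence-2 without flash_attn installed by
# removing "flash_attn" from the dynamically inferred imports of its remote code.
from unittest.mock import patch

from transformers import AutoModelForCausalLM, AutoProcessor
from transformers.dynamic_module_utils import get_imports


def fixed_get_imports(filename):
    """Drop flash_attn from the import list of Florence-2's modeling file only."""
    imports = get_imports(filename)
    if str(filename).endswith("modeling_florence2.py") and "flash_attn" in imports:
        imports.remove("flash_attn")
    return imports


with patch("transformers.dynamic_module_utils.get_imports", fixed_get_imports):
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Florence-2-large", trust_remote_code=True  # assumed model id
    )
    processor = AutoProcessor.from_pretrained(
        "microsoft/Florence-2-large", trust_remote_code=True
    )
```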