AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection
https://www.yoloworld.cc
GNU General Public License v3.0

Continuation of previous ValueError from albumentations #348

Open XieKaiwen opened 3 months ago

XieKaiwen commented 3 months ago

https://github.com/AILab-CVC/YOLO-World/issues/343#issue-2311998061 - Link to the previous issue I posted about this problem.

The solution recommended to me was to pip install albumentations. However, although I installed albumentations, the issue appeared again after I recreated my virtual environment, ran pip install -e . and installed the requirements from the basic requirements file.

Here is the updated pip list:

addict                   2.4.0
albumentations           1.4.7
aliyun-python-sdk-core   2.15.1
aliyun-python-sdk-kms    2.16.3
annotated-types          0.7.0
certifi                  2024.2.2
cffi                     1.16.0
charset-normalizer       3.3.2
click                    8.1.7
colorama                 0.4.6
contourpy                1.2.1
crcmod                   1.7
cryptography             42.0.7
cycler                   0.12.1
defusedxml               0.7.1
filelock                 3.14.0
fonttools                4.51.0
fsspec                   2024.5.0
huggingface-hub          0.23.1
idna                     3.7
imageio                  2.34.1
importlib_metadata       7.1.0
Jinja2                   3.1.4
jmespath                 0.10.0
joblib                   1.4.2
kiwisolver               1.4.5
lazy_loader              0.4
Markdown                 3.6
markdown-it-py           3.0.0
MarkupSafe               2.1.5
matplotlib               3.9.0
mdurl                    0.1.2
mmcv                     2.1.0
mmcv-lite                2.2.0
mmdet                    3.3.0
mmengine                 0.10.4
mmyolo                   0.6.0
model-index              0.1.11
mpmath                   1.3.0
networkx                 3.3
numpy                    1.26.4
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        8.9.2.26
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.5.40
nvidia-nvtx-cu12         12.1.105
opencv-python            4.9.0.80
opencv-python-headless   4.9.0.80
opendatalab              0.0.10
openmim                  0.3.9
openxlab                 0.1.0
ordered-set              4.1.0
oss2                     2.17.0
packaging                24.0
pandas                   2.2.2
pillow                   10.3.0
pip                      23.0.1
platformdirs             4.2.2
prettytable              3.10.0
pycocotools              2.0.7
pycparser                2.22
pycryptodome             3.20.0
pydantic                 2.7.1
pydantic_core            2.18.2
Pygments                 2.18.0
pyparsing                3.1.2
python-dateutil          2.9.0.post0
pytz                     2023.4
PyYAML                   6.0.1
regex                    2024.5.15
requests                 2.28.2
rich                     13.4.2
safetensors              0.4.3
scikit-image             0.23.2
scikit-learn             1.5.0
scipy                    1.13.1
setuptools               60.2.0
shapely                  2.0.4
six                      1.16.0
supervision              0.19.0
sympy                    1.12
tabulate                 0.9.0
termcolor                2.4.0
terminaltables           3.1.10
threadpoolctl            3.5.0
tifffile                 2024.5.22
timm                     0.6.13
tokenizers               0.19.1
tomli                    2.0.1
torch                    2.3.0
torchvision              0.18.0
tqdm                     4.65.2
transformers             4.41.1
triton                   2.3.0
typing_extensions        4.12.0
tzdata                   2024.1
urllib3                  1.26.18
wcwidth                  0.2.13
wheel                    0.43.0
yapf                     0.40.2
yolo_world               0.1.0       /home/jupyter/til-24-base/vlm/YOLO-World
zipp                     3.18.2
wondervictor commented 3 months ago

Hi @XieKaiwen, have you fixed the bug about "albumentations"?

XieKaiwen commented 3 months ago

@wondervictor Previously I pip installed it and it worked; the bug disappeared. However, I then moved the folder and tried to recreate the virtual environment, this time with albumentations installed (as can be seen in the pip list).

The files are still the same as in the previous issue I posted. However, the error came back, which is quite confusing.

XieKaiwen commented 3 months ago

@wondervictor Actually, never mind. I totally forgot that this requires a specific version of albumentations.

But afterwards I ran into another problem:

[rank0]:   File "/home/jupyter/til-24-base/vlm/YOLO-World/YOLOvenv/lib/python3.10/site-packages/mmdet/models/detectors/base.py", line 92, in forward
[rank0]:     return self.loss(inputs, data_samples)
[rank0]:   File "/home/jupyter/til-24-base/vlm/YOLO-World/yolo_world/models/detectors/yolo_world.py", line 30, in loss
[rank0]:     img_feats, txt_feats = self.extract_feat(batch_inputs,
[rank0]:   File "/home/jupyter/til-24-base/vlm/YOLO-World/yolo_world/models/detectors/yolo_world.py", line 100, in extract_feat
[rank0]:     img_feats = self.neck(img_feats, txt_feats)
[rank0]:   File "/home/jupyter/til-24-base/vlm/YOLO-World/YOLOvenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/jupyter/til-24-base/vlm/YOLO-World/YOLOvenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/jupyter/til-24-base/vlm/YOLO-World/yolo_world/models/necks/yolo_world_pafpn.py", line 213, in forward
[rank0]:     top_down_layer_inputs = torch.cat([upsample_feat, feat_low], 1)
[rank0]: RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 56 but got size 55 for tensor number 1 in the list.

I don't know whether this is caused by a problem in my dataset or by something else.

wondervictor commented 3 months ago

Hi @XieKaiwen, what is your input shape?

XieKaiwen commented 3 months ago

@wondervictor All of my images have a width of 1520 and a height of 870.

Also, unrelated: for the MixedGroundingDataset, the bounding boxes are in xyxy format, right?

wondervictor commented 3 months ago

Each side of the input shape should be a multiple of 32, so you should pad the images to (1536, 896).
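
For reference, the arithmetic behind this advice can be sketched roughly as follows. This is only an illustration, not YOLO-World code: the helper names are hypothetical, and the sketch assumes the backbone halves the resolution five times with 3x3, stride-2, padding-1 convolutions. Under that assumption, a height of 870 gives a stride-16 feature map of 55 rows and a stride-32 map of 28 rows, so the 2x upsampled map (56 rows) no longer matches, which is consistent with the "Expected size 56 but got size 55" error above.

```python
def pad_to_multiple(size: int, divisor: int = 32) -> int:
    """Round size up to the nearest multiple of divisor (hypothetical helper)."""
    return (size + divisor - 1) // divisor * divisor


def conv_out(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Output length of a convolution along one axis (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def feature_sizes(size: int, stages: int = 5) -> list[int]:
    """Sizes after each assumed stride-2 stage (strides 2, 4, 8, 16, 32)."""
    sizes = []
    for _ in range(stages):
        size = conv_out(size)
        sizes.append(size)
    return sizes


width, height = 1520, 870
padded = (pad_to_multiple(width), pad_to_multiple(height))
print(padded)  # (1536, 896)

# Original height: the stride-16 map has 55 rows, but 2 * 28 = 56.
raw = feature_sizes(height)
print(raw[3], 2 * raw[4])

# Padded height: the stride-16 map has 56 rows, and 2 * 28 = 56.
fixed = feature_sizes(padded[1])
print(fixed[3], 2 * fixed[4])
```

With the padded height, the upsampled stride-32 map and the stride-16 map line up, so the concatenation along dimension 1 can go through.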

XieKaiwen commented 3 months ago

@wondervictor Just to confirm: for the MixedGrounding format, the bboxes should be in xyxy format, as in pascal_voc, right?

wondervictor commented 3 months ago

They should be xywh, where xy is the top-left corner.
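
To make the two conventions concrete, here is a minimal sketch of converting between xywh (top-left corner plus width and height, as in COCO annotations) and xyxy (top-left and bottom-right corners, as in pascal_voc). The function names are hypothetical and are not part of YOLO-World or mmdet; the sketch only shows how the formats relate, not where the pipeline converts between them.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def xywh_to_xyxy(box: Box) -> Box:
    """Convert (x, y, w, h), with (x, y) the top-left corner, to (x1, y1, x2, y2)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def xyxy_to_xywh(box: Box) -> Box:
    """Convert (x1, y1, x2, y2) back to (x, y, w, h)."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)


coco_box = (100, 50, 40, 20)        # x, y, width, height
voc_box = xywh_to_xyxy(coco_box)    # top-left and bottom-right corners
print(voc_box)                      # (100, 50, 140, 70)
print(xyxy_to_xywh(voc_box))        # (100, 50, 40, 20)
```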

XieKaiwen commented 3 months ago

@wondervictor So for training, the bboxes given to the model should be xywh (COCO format), but when the model predicts on data, it outputs xyxy?

XieKaiwen commented 3 months ago

@wondervictor Sorry, but I have to confirm this because in the config files I see a lot of "xyxy" and "pascal_voc" being used as the bbox format, for example for albumentations and the box loss. Yet the bbox format in the annotations used for training with MixedGroundingDataset is COCO format?