autodistill / autodistill

Images to inference with no labeling (use foundation models to train supervised models).
https://docs.autodistill.com
Apache License 2.0

Autodistill doesn't appear to put result image files in correct directory #148

Open gerrat opened 7 months ago

gerrat commented 7 months ago


Bug

In my data directory, I have three subdirectories: test, train, and validation.
In each of these subdirectories, there are 20 .jpeg files. Trying to run autodistill on the data/train directory produces this:

(venv) user@host:/project/labels$ autodistill data/train --base="grounded_sam" --target="yolov8" --ontology '{"box": "case", "label": "label"}' --output="./results"
Loading base model...
WARNING: CUDA not available. GroundingDINO will run very slowly.
trying to load grounding dino directly
UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3549.)
final text_encoder_type: bert-base-uncased
Labeling data...
Labeling images: 0it [00:00, ?it/s]
Labeled dataset created - ready for distillation.
Loading target model...
Training target model...
New https://pypi.org/project/ultralytics/8.1.43 available 😃 Update with 'pip install -U ultralytics'
Ultralytics YOLOv8.0.81 🚀 Python-3.12.1 torch-2.2.2+cu121 CPU
yolo/engine/trainer: task=detect, mode=train, model=yolov8n.pt, data=./results/data.yaml, epochs=200, patience=50, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=cpu, workers=8, project=None, name=None, exist_ok=False, pretrained=False, optimizer=SGD, verbose=True, seed=0, deterministic=True, single_cls=False, image_weights=False, rect=False, cos_lr=False, close_mosaic=0, resume=False, amp=True, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, vid_stride=1, line_thickness=3, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, cfg=None, v5loader=False, tracker=botsort.yaml, save_dir=runs/detect/train10
Overriding model.yaml nc=80 with nc=2

             from  n   params  module                                arguments
  0            -1  1      464  ultralytics.nn.modules.Conv           [3, 16, 3, 2]
  1            -1  1     4672  ultralytics.nn.modules.Conv           [16, 32, 3, 2]
  2            -1  1     7360  ultralytics.nn.modules.C2f            [32, 32, 1, True]
  3            -1  1    18560  ultralytics.nn.modules.Conv           [32, 64, 3, 2]
  4            -1  2    49664  ultralytics.nn.modules.C2f            [64, 64, 2, True]
  5            -1  1    73984  ultralytics.nn.modules.Conv           [64, 128, 3, 2]
  6            -1  2   197632  ultralytics.nn.modules.C2f            [128, 128, 2, True]
  7            -1  1   295424  ultralytics.nn.modules.Conv           [128, 256, 3, 2]
  8            -1  1   460288  ultralytics.nn.modules.C2f            [256, 256, 1, True]
  9            -1  1   164608  ultralytics.nn.modules.SPPF           [256, 256, 5]
 10            -1  1        0  torch.nn.modules.upsampling.Upsample  [None, 2, 'nearest']
 11       [-1, 6]  1        0  ultralytics.nn.modules.Concat         [1]
 12            -1  1   148224  ultralytics.nn.modules.C2f            [384, 128, 1]
 13            -1  1        0  torch.nn.modules.upsampling.Upsample  [None, 2, 'nearest']
 14       [-1, 4]  1        0  ultralytics.nn.modules.Concat         [1]
 15            -1  1    37248  ultralytics.nn.modules.C2f            [192, 64, 1]
 16            -1  1    36992  ultralytics.nn.modules.Conv           [64, 64, 3, 2]
 17      [-1, 12]  1        0  ultralytics.nn.modules.Concat         [1]
 18            -1  1   123648  ultralytics.nn.modules.C2f            [192, 128, 1]
 19            -1  1   147712  ultralytics.nn.modules.Conv           [128, 128, 3, 2]
 20       [-1, 9]  1        0  ultralytics.nn.modules.Concat         [1]
 21            -1  1   493056  ultralytics.nn.modules.C2f            [384, 256, 1]
 22  [15, 18, 21]  1   751702  ultralytics.nn.modules.Detect         [2, [64, 128, 256]]
Model summary: 225 layers, 3011238 parameters, 3011222 gradients, 8.2 GFLOPs

Transferred 319/355 items from pretrained weights
optimizer: SGD(lr=0.01) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias
Traceback (most recent call last):
  File "/project/labels/venv/lib/python3.12/site-packages/ultralytics/yolo/data/base.py", line 110, in get_img_files
    assert im_files, f'{self.prefix}No images found'
AssertionError: train: No images found

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/project/labels/venv/bin/autodistill", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/project/labels/venv/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/labels/venv/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/project/labels/venv/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/labels/venv/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/labels/venv/lib/python3.12/site-packages/autodistill/cli.py", line 227, in main
    target_model.train(dataset_yaml=os.path.join(output, "data.yaml"), epochs=epochs)
  File "/project/labels/venv/lib/python3.12/site-packages/autodistill_yolov8/yolov8.py", line 43, in train
    self.yolo.train(data=dataset_yaml, epochs=epochs, device=device)
  File "/project/labels/venv/lib/python3.12/site-packages/ultralytics/yolo/engine/model.py", line 371, in train
    self.trainer.train()
  File "/project/labels/venv/lib/python3.12/site-packages/ultralytics/yolo/engine/trainer.py", line 191, in train
    self._do_train(world_size)
  File "/project/labels/venv/lib/python3.12/site-packages/ultralytics/yolo/engine/trainer.py", line 268, in _do_train
    self._setup_train(world_size)
  File "/project/labels/venv/lib/python3.12/site-packages/ultralytics/yolo/engine/trainer.py", line 250, in _setup_train
    self.train_loader = self.get_dataloader(self.trainset, batch_size=batch_size, rank=RANK, mode='train')
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/labels/venv/lib/python3.12/site-packages/ultralytics/yolo/v8/detect/train.py", line 43, in get_dataloader
    build_dataloader(self.args, batch_size, img_path=dataset_path, stride=gs, rank=rank, mode=mode,
  File "/project/labels/venv/lib/python3.12/site-packages/ultralytics/yolo/data/build.py", line 75, in build_dataloader
    dataset = YOLODataset(
              ^^^^^^^^^^^^
  File "/project/labels/venv/lib/python3.12/site-packages/ultralytics/yolo/data/dataset.py", line 66, in __init__
    super().__init__(img_path, imgsz, cache, augment, hyp, prefix, rect, batch_size, stride, pad, single_cls,
  File "/project/labels/venv/lib/python3.12/site-packages/ultralytics/yolo/data/base.py", line 67, in __init__
    self.im_files = self.get_img_files(self.img_path)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/labels/venv/lib/python3.12/site-packages/ultralytics/yolo/data/base.py", line 112, in get_img_files
    raise FileNotFoundError(f'{self.prefix}Error loading data from {img_path}\n{HELP_URL}') from e
FileNotFoundError: train: Error loading data from /project/labels/results/train/images
See https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data
(venv) user@host:/project/labels$

autodistill didn't create any files in the directory named in this traceback (/project/labels/results/train/images). It did, however, copy all the .jpeg files from my data/test directory into a results/images directory. (I must have done this manually while trying to get things to work.) It did create results/train and results/valid directories, with images and labels subdirectories inside each, but they are all empty.
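
One way to confirm the state described above is to count the files in each output subdirectory before training starts. This is a minimal sketch using only the Python standard library. The train/valid and images/labels layout is taken from the report; the helper name `count_split_files` is hypothetical and not part of autodistill.

```python
from pathlib import Path


def count_split_files(output_dir, splits=("train", "valid"), kinds=("images", "labels")):
    """Return a dict such as {"train/images": 0, ...} with file counts per subfolder.

    Hypothetical helper for illustration; the folder names follow the layout
    described in this issue.
    """
    root = Path(output_dir)
    counts = {}
    for split in splits:
        for kind in kinds:
            folder = root / split / kind
            if folder.is_dir():
                # Count regular files only, ignoring nested directories.
                counts[f"{split}/{kind}"] = sum(1 for p in folder.iterdir() if p.is_file())
            else:
                # A missing folder is reported as 0 files rather than raising.
                counts[f"{split}/{kind}"] = 0
    return counts


if __name__ == "__main__":
    for name, n in count_split_files("./results").items():
        print(f"{name}: {n} file(s)")
```

If every count is 0, the labeling step produced nothing, which would match the "Labeling images: 0it" line in the log above.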

Environment

Minimal Reproducible Example

autodistill data/train --base="grounded_sam" --target="yolov8" --ontology '{"box": "case", "label": "label"}' --output="./results"

Additional

Running this through Python, I get a similar error: it ends up looking for images in results/train/images, but that folder is empty.

Update: Not sure what happened before, but running directly via a Python script worked this time. Running autodistill from the bash prompt still fails.


pwernette commented 4 days ago

Having the exact same issue, except that it fails both from a Python script and directly from bash. The created dataset folder has the correct structure, but all its sub-folders are empty. Copying the original images to the dataset/images folder doesn't work either.

pwernette commented 3 days ago

After a bit more testing, it looks like autodistill will not read and/or copy PNG files. I had to convert the PNG files to JPG, and it appears to be working now. This is a significant issue, as JPG files suffer from distortion due to lossy compression.
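
Both reports (.jpeg files in the original post, PNG files here) would be consistent with the labeling step matching only a single file extension, although nothing in this thread confirms that. The sketch below shows how to check which extensions an input folder contains and which files a single-extension glob pattern would pick up. The helper names and the `.jpg` pattern are assumptions for illustration, not autodistill's confirmed behavior.

```python
from collections import Counter
from pathlib import Path


def count_by_suffix(folder):
    """Count regular files in `folder`, grouped by lowercased file suffix."""
    return Counter(p.suffix.lower() for p in Path(folder).iterdir() if p.is_file())


def matching(folder, extension):
    """List file names that a '*' + extension glob pattern would match.

    Hypothetical stand-in for a loader that looks for one extension only.
    """
    return sorted(p.name for p in Path(folder).glob("*" + extension))
```

For example, `count_by_suffix("data/train")` shows how many files of each type are present, and `matching("data/train", ".jpg")` shows which of them a `*.jpg` pattern would find. A folder holding only .jpeg or .png files gives an empty list for that pattern.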