Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
https://www.supergradients.com
Apache License 2.0

Feature Request: configure 'dataset_params' for the training/validation/test data in multiple directories. #1030

Closed: PraveenKumar-Rajendran closed this 8 months ago

PraveenKumar-Rajendran commented 1 year ago

Thank you for the awesome work! :)

Is your feature request related to a problem? Please describe.

A train/val/test split is not always stored in a single directory. It would be nice to be able to specify multiple image directories, together with their corresponding label directories, for a single split (e.g. train).

Describe the solution you'd like

For example, in YoloV5/V8 one can list multiple directories for a single split in the dataset .yaml file:

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/GlobalWheat2020  # dataset root dir
train: # train images (relative to 'path') 3422 images
  - images/arvalis_1
  - images/arvalis_2
  - images/arvalis_3
  - images/ethz_1
  - images/rres_1
  - images/inrae_1
  - images/usask_1
val: # val images (relative to 'path') 748 images (WARNING: train set contains ethz_1)
  - images/ethz_1
test: # test images (optional) 1276 images
  - images/utokyo_1
  - images/utokyo_2
  - images/nau_1
  - images/uq_1

Additional context

YoloV5/V8 assumes that the labels directory sits next to the images directory, in the same parent directory:

images/
labels/
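
A minimal sketch of that sibling-directory convention, assuming absolute POSIX-style image paths that contain an `images` path segment: the label file is found by swapping the last `images` segment for `labels` and the file extension for `.txt`. `image_to_label_path` is a hypothetical helper written for illustration, not the actual YoloV5 or super-gradients code.

```python
import os


def image_to_label_path(image_path: str) -> str:
    """Map <root>/images/<sub>/<name>.<ext> to <root>/labels/<sub>/<name>.txt."""
    images_seg = f"{os.sep}images{os.sep}"
    labels_seg = f"{os.sep}labels{os.sep}"
    # Split on the LAST "/images/" segment so higher-level folders named "images" are left alone.
    head, found, tail = image_path.rpartition(images_seg)
    if not found:
        raise ValueError(f"no {images_seg!r} segment in {image_path!r}")
    # Drop the image extension (only from the final path component) and use .txt instead.
    stem, _ext = os.path.splitext(tail)
    return head + labels_seg + stem + ".txt"


print(image_to_label_path("/datasets/GlobalWheat2020/images/arvalis_1/0001.jpg"))
# /datasets/GlobalWheat2020/labels/arvalis_1/0001.txt
```

Under this convention each entry in the `train` list above (e.g. `images/arvalis_1`) implicitly has its labels in the matching `labels/...` folder, which is why YoloV5/V8 only needs the image directories.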

If YoloNAS does not make this assumption, how about pairing image and label directories by their order in the lists?

Example:

dataset_params = {
    'data_dir': '/data/od',
    # Image and label directories are paired by list position: images[i] <-> labels[i].
    'train_images_dir': ['a/train/images', 'b/train/images', 'c/train/images', 'd/train/images'],
    'train_labels_dir': ['a/train/labels', 'b/train/labels', 'c/train/labels', 'd/train/labels'],
    'val_images_dir': ['a/val/images', 'b/val/images', 'c/val/images', 'd/val/images'],
    'val_labels_dir': ['a/val/labels', 'b/val/labels', 'c/val/labels', 'd/val/labels'],
    # A single directory can still be given as a plain string.
    'test_images_dir': 'test/images',
    'test_labels_dir': 'test/labels',
    'classes': ['apple', 'orange', 'grapes', 'mango', 'banana']
}
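
A minimal sketch of how the proposed order-based pairing could be resolved, assuming the key names from the example above. `resolve_split_dirs` is hypothetical and not part of super-gradients. It accepts either a single string or a list for each key, so the string-valued `test` split still works, and it rejects lists of different lengths instead of silently dropping directories the way a bare `zip` would.

```python
import os
from typing import Dict, List, Sequence, Tuple, Union

DirSpec = Union[str, Sequence[str]]


def _as_list(value: DirSpec) -> List[str]:
    # A single directory given as a plain string counts as a one-element list.
    return [value] if isinstance(value, str) else list(value)


def resolve_split_dirs(params: Dict, split: str) -> List[Tuple[str, str]]:
    """Return (images_dir, labels_dir) pairs for `split`, joined onto data_dir."""
    images = _as_list(params[f"{split}_images_dir"])
    labels = _as_list(params[f"{split}_labels_dir"])
    if len(images) != len(labels):
        raise ValueError(f"{split}: {len(images)} image dirs but {len(labels)} label dirs")
    root = params["data_dir"]
    # Pair directories by position: images[i] is matched with labels[i].
    return [(os.path.join(root, img_dir), os.path.join(root, lbl_dir))
            for img_dir, lbl_dir in zip(images, labels)]


params = {
    "data_dir": "/data/od",
    "train_images_dir": ["a/train/images", "b/train/images"],
    "train_labels_dir": ["a/train/labels", "b/train/labels"],
    "test_images_dir": "test/images",
    "test_labels_dir": "test/labels",
}
print(resolve_split_dirs(params, "train"))
print(resolve_split_dirs(params, "test"))
```

Each resulting pair could then back one dataset instance per directory, with the instances combined into a single split.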

itachi176 commented 1 year ago


Have you solved this problem? I have the same problem.

BloodAxe commented 1 year ago

We do not support this scenario out of the box. I think what you will have to do is create multiple datasets, concatenate them using ConcatDataset from PyTorch, and then pass the result to a DataLoader.
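
A minimal sketch of that suggestion, assuming PyTorch is installed. `ConcatDataset` and `DataLoader` are the real `torch.utils.data` classes; `FolderPairDataset` and `build_split` are hypothetical toy helpers. To stay self-contained, the toy dataset takes file names explicitly and only returns matching image/label paths. In practice each part would be a super-gradients YOLO-format detection dataset built for one image/label directory pair (check its constructor in the version you use).

```python
from pathlib import Path
from typing import List, Sequence, Tuple

from torch.utils.data import ConcatDataset, DataLoader, Dataset


class FolderPairDataset(Dataset):
    """Hypothetical toy dataset for one images/labels directory pair.

    A real detection dataset would load the image and parse the boxes;
    this one only returns the matching file paths.
    """

    def __init__(self, data_dir: str, images_dir: str, labels_dir: str,
                 image_names: Sequence[str]):
        self.images_dir = Path(data_dir) / images_dir
        self.labels_dir = Path(data_dir) / labels_dir
        self.image_names = list(image_names)

    def __len__(self) -> int:
        return len(self.image_names)

    def __getitem__(self, index: int) -> Tuple[str, str]:
        name = self.image_names[index]
        label_name = Path(name).with_suffix(".txt").name
        return str(self.images_dir / name), str(self.labels_dir / label_name)


def build_split(data_dir: str, images_dirs: Sequence[str], labels_dirs: Sequence[str],
                names_per_dir: Sequence[Sequence[str]]) -> ConcatDataset:
    if not (len(images_dirs) == len(labels_dirs) == len(names_per_dir)):
        raise ValueError("images_dirs, labels_dirs and names_per_dir must have the same length")
    # One dataset per directory pair, then concatenate them into a single split.
    parts: List[Dataset] = [
        FolderPairDataset(data_dir, img_dir, lbl_dir, names)
        for img_dir, lbl_dir, names in zip(images_dirs, labels_dirs, names_per_dir)
    ]
    return ConcatDataset(parts)


train_set = build_split(
    "/data/od",
    ["a/train/images", "b/train/images"],
    ["a/train/labels", "b/train/labels"],
    [["0001.jpg", "0002.jpg"], ["0003.jpg"]],
)
train_loader = DataLoader(train_set, batch_size=2, shuffle=True)
print(len(train_set), train_set[2])
```

Because `ConcatDataset` maps a global index to the right sub-dataset, shuffling in the `DataLoader` mixes samples from all directories. For real detection samples the loader usually also needs a detection-aware `collate_fn`, since images carry different numbers of boxes.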