Customizable data pipeline for object detection

reactivetype commented 3 years ago

🚀 Feature

I would like to have a flexible interface to customize dataset and data pipeline for object detection

Motivation

Thanks for creating this fantastic library. For research or application, I want to use different datasets other than CustomCOCODataset. There are two possible scenarios:

1. Using datasets readily available in a different format (e.g. YOLO) without converting them from YOLO to COCO. Here, I assume my model knows how to read and infer the label format (e.g. xyxy, xywh) and build targets from the dataset labels.
2. Applying multi-image data augmentation such as Mixup or mosaic augmentation, which creates a new training image from a combination of multiple images in the dataset.
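
To make the multi-image scenario concrete, here is a minimal, framework-free sketch of Mixup for detection. It is not part of Flash: the names `mixup_detection` and `MixupDataset` are hypothetical, images are assumed to be same-sized grayscale nested lists, and the convention of keeping the boxes from both images is one common choice rather than the only one. The point it illustrates is that such an augmentation needs access to the whole dataset, not just a single sample.

```python
import random
from typing import List, Sequence, Tuple

Image = List[List[float]]                      # H x W grayscale, values in [0, 1]
Box = Tuple[float, float, float, float, int]   # (x1, y1, x2, y2, class_id)
Sample = Tuple[Image, List[Box]]


def mixup_detection(img_a: Image, boxes_a: List[Box],
                    img_b: Image, boxes_b: List[Box],
                    lam: float) -> Sample:
    """Blend two same-sized images and keep the boxes of both."""
    if len(img_a) != len(img_b) or len(img_a[0]) != len(img_b[0]):
        raise ValueError("this sketch assumes both images have the same size")
    mixed = [
        [lam * pa + (1.0 - lam) * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]
    return mixed, list(boxes_a) + list(boxes_b)


class MixupDataset:
    """Hypothetical wrapper: each item is mixed with a randomly chosen partner."""

    def __init__(self, dataset: Sequence[Sample], alpha: float = 1.5, seed=None):
        self.dataset = dataset
        self.alpha = alpha
        self.rng = random.Random(seed)

    def __len__(self) -> int:
        return len(self.dataset)

    def __getitem__(self, idx: int) -> Sample:
        img_a, boxes_a = self.dataset[idx]
        # Pick the partner image from anywhere in the dataset.
        img_b, boxes_b = self.dataset[self.rng.randrange(len(self.dataset))]
        # Mixing weight drawn from Beta(alpha, alpha), as in the Mixup recipe.
        lam = self.rng.betavariate(self.alpha, self.alpha)
        return mixup_detection(img_a, boxes_a, img_b, boxes_b, lam)
```

Because the wrapper only relies on `__len__` and `__getitem__`, it could in principle sit on top of any map-style dataset; whether that fits the library's data pipeline is exactly the open question here.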

Is either of these two scenarios possible? Can I swap the CustomCOCODataset with my custom LightningDataModule? Do we need to customize ObjectDetectionDataPipeline? I am not sure what the task pipeline is for. Some guidance would be appreciated. Thanks.

kaushikb11 commented 3 years ago

Hi, @reactivetype! Yes, that sounds great. Currently, the flow for object detection (OD) looks like this:

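# Assumption: import path as of Flash 0.2.x; it may differ in other releases.
from flash.vision import ObjectDetectionData, ObjectDetector

datamodule = ObjectDetectionData.from_coco(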
    train_folder="data/coco128/images/train2017/",
    train_ann_file="data/coco128/annotations/instances_train2017.json",
    batch_size=5
)

model = ObjectDetector(num_classes=datamodule.num_classes)

We could add support for more datasets by adding class methods to the ObjectDetectionData class, for example ObjectDetectionData.from_yolo(..), ObjectDetectionData.from_voc(..), etc.
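
As a rough illustration of what a YOLO loader would have to do, here is a small sketch of parsing YOLO label lines. It assumes the usual YOLO text format, one object per line as `class cx cy w h` with coordinates normalized to [0, 1], and converts each box to pixel xyxy. The function names `parse_yolo_label_line` and `parse_yolo_labels` are hypothetical; no `from_yolo` method exists at this point in the thread.

```python
from typing import List, Tuple

BoxXYXY = Tuple[float, float, float, float]


def parse_yolo_label_line(line: str, img_w: int, img_h: int) -> Tuple[int, BoxXYXY]:
    """Convert one 'class cx cy w h' line (normalized) to (class_id, pixel xyxy)."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    cx, cy, w, h = (float(v) for v in parts[1:])
    # Center/size (normalized) -> corner coordinates (pixels).
    x1 = (cx - w / 2.0) * img_w
    y1 = (cy - h / 2.0) * img_h
    x2 = (cx + w / 2.0) * img_w
    y2 = (cy + h / 2.0) * img_h
    return class_id, (x1, y1, x2, y2)


def parse_yolo_labels(text: str, img_w: int, img_h: int) -> List[Tuple[int, BoxXYXY]]:
    """Parse the full contents of a YOLO label file, skipping blank lines."""
    return [
        parse_yolo_label_line(line, img_w, img_h)
        for line in text.splitlines()
        if line.strip()
    ]
```

A loader built on this would still need the image size for every file, since YOLO labels are relative to the image dimensions.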

Yes, you could pass transformation functions to the train_transform argument in ObjectDetectionData.from_coco.

The purpose of the DataPipeline is to define, through hooks, how data flows through its transformations. Depending on your data requirements, you could customize it by subclassing it.

Right now, though, we are refactoring DataPipeline in #141, so the behavior could change, but it should be a better experience for the user! :)

edgarriba commented 3 years ago

@reactivetype The DataPipeline refactor is already merged. Please check whether it suits your use case. We are also refactoring the data modules to make them more flexible and user-friendly for custom data structures. Take a look at #256.

edenlightning commented 3 years ago

Please feel free to reopen if needed!

Lightning-Universe / lightning-flash

Customizable data pipeline for object detection #159