Lightning-Universe / lightning-flash

Your PyTorch AI Factory - Flash enables you to easily configure and run complex AI recipes for over 15 tasks across 7 data domains
https://lightning-flash.readthedocs.io
Apache License 2.0

from_datamodules and dataset flexibility #135

Closed aribornstein closed 3 years ago

aribornstein commented 3 years ago

🚀 Feature

With the current data pipeline, if I want to customize my data there is no easy way to provide my own DataModule while still taking advantage of Flash's existing capabilities for validation splits and default transforms.

Motivation

While in theory you can provide any loss function to an ImageClassificationModule, in practice any non-multinomial loss such as binary cross entropy causes Flash to crash.

Ideally it should be easy to override this, but the way Flash's `create_from_folders` abstraction hard codes dataset creation prevents me from easily overriding the underlying `filepath_dataset` and `folder_dataset` classes, meaning that if I want to do this myself I need to create my own DataModule.

If I create my own DataModule I lose all the Flash features I get from the `from_folders` and `from_filepaths` methods, such as the ability to apply default transforms, split my train and validation data, and benefit from any future capabilities we may add, leading to increased boilerplate.
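For illustration, here is the kind of split logic a custom DataModule currently has to reproduce by hand. This is a minimal stdlib sketch; `split_filepaths`, the `val_split` default, and the seeding scheme are hypothetical stand-ins, not Flash's actual implementation:

```python
import random

def split_filepaths(filepaths, labels, val_split=0.2, seed=42):
    """Shuffle (filepath, label) pairs and split them into train/val lists.

    A stand-in for the splitting that from_filepaths handles internally;
    the names and defaults here are illustrative, not Flash API.
    """
    pairs = list(zip(filepaths, labels))
    random.Random(seed).shuffle(pairs)  # deterministic shuffle for reproducibility
    n_val = int(len(pairs) * val_split)
    return pairs[n_val:], pairs[:n_val]

# Example: 10 fake image paths split 80/20 into train and validation.
train, val = split_filepaths([f"img_{i}.png" for i in range(10)], list(range(10)))
```

Every user who sidesteps `from_filepaths` ends up re-writing some variant of this, plus default-transform wiring, which is the boilerplate being described.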

Pitch

One way to improve this would be a `from_datamodule` feature in the data pipeline, though I think this only papers over the core issue. The core issue is that these functions hard code the underlying dataset class without providing any mechanism to override it.
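One possible shape for such an override hook, sketched in plain Python. Everything here is hypothetical: `dataset_cls`, `FolderDataset`, and `BinaryDataset` are illustrative names, not Flash's API:

```python
class FolderDataset:
    """Default dataset; stands in for Flash's internal folder_dataset."""
    def __init__(self, samples):
        self.samples = samples

class DataModule:
    # Class-level default that subclasses or callers can swap out,
    # instead of the dataset class being hard coded inside from_folders.
    dataset_cls = FolderDataset

    def __init__(self, train_ds):
        self.train_ds = train_ds

    @classmethod
    def from_folders(cls, samples, dataset_cls=None):
        # Use the caller's dataset class if given, else the class default.
        ds_cls = dataset_cls or cls.dataset_cls
        return cls(train_ds=ds_cls(samples))

class BinaryDataset(FolderDataset):
    """Custom dataset a user might supply, e.g. emitting float targets
    so a binary cross entropy loss can be used."""

# Default behaviour is unchanged; power users pass their own class.
dm_default = DataModule.from_folders(["a.png", "b.png"])
dm_custom = DataModule.from_folders(["a.png", "b.png"], dataset_cls=BinaryDataset)
```

With a hook like this, `from_folders` and `from_filepaths` would keep their split and transform behaviour while letting users inject a custom dataset class, rather than forking the whole DataModule.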

I'm not sure of the right way to make this change to Flash without potentially breaking things or conflicting with the current Flash refactor.

Alternatives

Additional context

edgarriba commented 3 years ago

related to #141