🚀 Feature
With the current data pipeline, if I want to customize my data there is no easy way for me to provide my own data module and still take advantage of Flash's existing capabilities for validation splits and default transforms.
Motivation
While in theory you can provide any loss function to an ImageClassificationModule, in practice any non-multinomial loss such as binary cross entropy causes Flash to crash.
Ideally it should be easy to override this, but the way Flash's create_from_folders abstracts and hard-codes dataset creation prevents me from easily overriding the underlying filepath_dataset and folder_dataset classes. If I want to do this myself, I need to create my own datamodule.
If I create my own datamodule, I lose all the Flash features that I get from the from_folders and from_filepaths methods, such as the ability to apply default transforms, split my train and validation data, and any other future capabilities we may add. The result is more boilerplate.
Pitch
One way to make this better would be to have a "from datamodule" feature in the data pipeline, though I think this only papers over the core issue. The core issue is that these functions hard-code the underlying dataset class and provide no mechanism to override it.
I'm not sure of the right way to make this change to Flash without potentially breaking things or conflicting with the current Flash refactor.
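To make the idea concrete, here is a rough sketch of the kind of override hook I have in mind. It is not Flash's actual API: `FilepathDataset`, `MultiHotDataset`, `default_transform`, the `dataset_cls` parameter and this `from_filepaths` signature are all hypothetical stand-ins, written in plain Python so the example runs on its own. The constructor keeps doing the split and applying the default transform, and only the dataset class is swapped.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple, Type


class FilepathDataset:
    """Hypothetical stand-in for the built-in dataset: yields (sample, label)."""

    def __init__(self, filepaths: Sequence[str], labels: Sequence[int],
                 transform: Optional[Callable] = None):
        if len(filepaths) != len(labels):
            raise ValueError("filepaths and labels must have the same length")
        self.filepaths = list(filepaths)
        self.labels = list(labels)
        self.transform = transform

    def __len__(self) -> int:
        return len(self.filepaths)

    def __getitem__(self, idx: int):
        sample = self.filepaths[idx]
        if self.transform is not None:
            sample = self.transform(sample)
        return sample, self.labels[idx]


class MultiHotDataset(FilepathDataset):
    """User override: emit multi-hot float targets, e.g. for binary cross entropy."""

    num_classes = 3  # assumption for this example only

    def __getitem__(self, idx: int):
        sample, label = super().__getitem__(idx)
        target = [0.0] * self.num_classes
        target[label] = 1.0
        return sample, target


def default_transform(path: str) -> str:
    """Placeholder for a default transform; a real one would load and augment."""
    return path.lower()


def from_filepaths(filepaths: Sequence[str], labels: Sequence[int],
                   valid_split: float = 0.2,
                   transform: Optional[Callable] = default_transform,
                   dataset_cls: Type[FilepathDataset] = FilepathDataset,
                   seed: int = 0) -> Tuple[FilepathDataset, FilepathDataset]:
    """Split into train/valid and build both with an overridable dataset class."""
    indices = list(range(len(filepaths)))
    random.Random(seed).shuffle(indices)
    n_valid = int(len(indices) * valid_split)
    valid_idx, train_idx = indices[:n_valid], indices[n_valid:]

    def build(idx: List[int]) -> FilepathDataset:
        return dataset_cls([filepaths[i] for i in idx],
                           [labels[i] for i in idx],
                           transform=transform)

    return build(train_idx), build(valid_idx)


# Usage: the split and default transform still apply; only the dataset changes.
paths = [f"IMG_{i}.PNG" for i in range(10)]
labels = [i % 3 for i in range(10)]
train_ds, valid_ds = from_filepaths(paths, labels, dataset_cls=MultiHotDataset)
```

Defaulting `dataset_cls` to the existing class should keep current callers working unchanged, which might help with the backwards-compatibility concern, though I have not checked how this would fit with the refactor.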
Alternatives
Additional context