Closed - Louis-Dupont closed this 1 year ago
I don't understand this PR.
For those out of context (me), can you please elaborate on why we need to change the current design? It is not clear what this PR attempts to solve. If it's an enabler for another feature - ok, which one?
I'd love to also see some usage examples - where this new concept is intended to be used.
It is mainly an enabler, but it can almost be seen as a feature.
The original target is to have a simple way to load a dataset in SG after running the analysis with DG. There are 2 main blocking issues with that:
- The DataConfig, which holds many dataset attributes, is responsible for both asking questions and saving to cache.
- The DataConfig is instantiated inside the AnalysisManager.
The idea of this PR is to create a class (DatasetAdapter) that would group the DataConfig and the adapter/processing logic that uses it.
This would also increase code cohesion, and (slightly) lower the coupling with the rest of the code.
This DatasetAdapter will then be instantiable outside of the AnalysisManager.
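To make this a bit more concrete, here is a rough sketch of the grouping I mean - the class and attribute names below are illustrative only, not the actual implementation:
# Rough sketch only: illustrative names, not the real classes.
from dataclasses import dataclass, field
from typing import Iterable, Optional

@dataclass
class DataConfig:
    # Holds the dataset attributes (answers to the questions), optionally backed by a cache file.
    cache_filename: Optional[str] = None
    answers: dict = field(default_factory=dict)

class DatasetAdapter:
    """Groups the DataConfig together with the adapter/processing logic that uses it."""

    def __init__(self, data_iterable: Iterable, data_config: Optional[DataConfig] = None):
        self.data_iterable = data_iterable
        self.data_config = data_config or DataConfig()

    def __iter__(self):
        for raw_sample in self.data_iterable:
            yield self._adapt(raw_sample)

    def _adapt(self, raw_sample):
        # Placeholder: the real logic would use self.data_config to reformat the image/targets.
        image, target = raw_sample
        return image, target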
Someone runs DG as usual:
analyzer = DetectionAnalysisManager(
    report_title="Testing Data-Gradients demo",
    cache_name="MyCustomDataset.json",
    train_data=custom_train_dataset,
    val_data=custom_val_dataset,
    class_names=class_names,
)
analyzer.run()
Then, they can wrap their dataset and benefit from the cached values. The code below would run directly and output images/targets in our format (label_xyxy, I think), with the image in the right format as well:
train_data = DetectionDatasetAdapter(
    data_iterable=custom_train_dataset,
    cache_filename="MyCustomDataset.json",
)
for image, label_xyxy in train_data:
    ...
This is the target for SG - we would then wrap this in a DG dataset to include all the transforms.
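Something along these lines, just to illustrate the wrapping (this wrapper class does not exist, it is only a sketch of the idea):
# Hypothetical wrapper, for illustration only.
class AdaptedDetectionDataset:
    """Wraps a DetectionDatasetAdapter and applies the usual transforms on top of the adapted samples."""

    def __init__(self, adapter, transforms=None):
        self.adapter = adapter
        self.transforms = transforms or []

    def __iter__(self):
        for image, label_xyxy in self.adapter:
            for transform in self.transforms:
                image, label_xyxy = transform(image, label_xyxy)
            yield image, label_xyxy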
Someone can always use the dataset adapter completely independently of the AnalysisManager:
train_data = DetectionDatasetAdapter(data_iterable=custom_dataset)  # option to pass extra parameters to be asked fewer questions
for image, xyxy in train_data:
    ...
Then, on the first iteration, the user will be asked any question that is required to format the image/targets, similarly to what is done when running the AnalysisManager.
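Roughly, the lazy questioning would follow this pattern (simplified illustration, not the real code - the question and parameter names are made up):
# Simplified illustration of asking questions lazily on the first iteration.
class LazyQuestionAdapter:
    def __init__(self, data_iterable, known_params=None):
        self.data_iterable = data_iterable
        self.params = dict(known_params or {})  # parameters passed up-front mean fewer questions later

    def _resolve_missing_params(self):
        # Ask only the questions whose answers are still missing.
        if "is_label_first" not in self.params:
            answer = input("Are labels returned before images? [y/n] ")
            self.params["is_label_first"] = answer.strip().lower() == "y"

    def __iter__(self):
        first = True
        for sample in self.data_iterable:
            if first:
                self._resolve_missing_params()
                first = False
            yield sample  # the real adapter would also reformat the sample using self.params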
Not sure if this is useful, but it would still work:
train_data = DetectionDatasetAdapter(
    cache_name="MyCustomTrainSet.json",
    data=custom_train_dataset,
    class_names=class_names,
)
val_data = DetectionDatasetAdapter(
    cache_name="MyCustomValSet.json",
    data=custom_val_dataset,
    class_names=class_names,
)
analyzer = DetectionAnalysisManager(
    report_title="Testing Data-Gradients demo",
    cache_name="MyCustomDataset.json",
    train_data=train_data,
    val_data=val_data,
    class_names=class_names,
)
analyzer.run()
Update: just added src/data_gradients/sample_iterables/base.py.
The motivation was to take out get_iterator, which was defined in the DatasetAdapter and returned an iterator of ImageSample objects, and instead have a dedicated class responsible for that.
This way, the DatasetAdapter has a clearer responsibility.
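In other words, the split looks roughly like this (my own sketch, not the actual content of the file):
# Sketch of the responsibility split, not the actual content of sample_iterables/base.py.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Iterator

@dataclass
class ImageSample:
    image: Any
    annotations: Any

class BaseSampleIterable(ABC):
    """Owns what used to be DatasetAdapter.get_iterator: turning raw data into ImageSample objects."""

    def __init__(self, dataset):
        self.dataset = dataset

    @abstractmethod
    def __iter__(self) -> Iterator[ImageSample]:
        ...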
Motivation
We want to group all the "adapter/processing" logic into a DatasetAdapter class.
- This takes the processing, preprocessing, etc. out of the AnalysisManager; it will all be handled by the DatasetAdapter.
- The DatasetAdapter can be used outside of the AnalysisManager and independently of it.
Note: There is still some coupling between this DatasetAdapter and the AnalysisManager (no way to escape this), because we want to easily plug the DatasetAdapter into the AnalysisManager. This is handled with the method samples_iterator, which returns the samples. I am not 100% sure about this but could not find a better way (open for suggestions).
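To show what I mean by that coupling point, here is a stripped-down sketch (samples_iterator is the method mentioned above; everything else here is simplified and made up):
# Stripped-down sketch of how the AnalysisManager plugs into the adapter.
class DatasetAdapter:
    def __init__(self, data_iterable):
        self.data_iterable = data_iterable

    def samples_iterator(self):
        # The single hook the AnalysisManager relies on: an iterator of (adapted) samples.
        for raw_sample in self.data_iterable:
            yield raw_sample  # the real code would adapt/format the sample here

class AnalysisManager:
    def __init__(self, train_data: DatasetAdapter):
        self.train_data = train_data

    def run(self):
        for sample in self.train_data.samples_iterator():
            ...  # feed each sample to the feature extractors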