Reorganization of steps and TransformationDataset

lrzpellegrini commented 3 years ago

Hello everyone. Following up on #293 and #313, I propose these changes to both steps and the TransformationDataset:

1) Rename StepInfo to Experience.
2) In Experience, support multiple task labels by switching from task_label: int to task_labels: List[int], which would list the task labels found in the experience. Set[int] may be more appropriate from a formal point of view, but I prefer List because the user can then obtain the first (usually only) task label with .task_labels[0], which is very simple. We can still retain the task_label: int field, which would be a shortcut for task_labels[0].
3) Rename TransformationDataset to either MultiTaskDataset or AvalancheDataset. I prefer the latter, as MultiTaskDataset seems too tied to the idea of "task", a term subject to different interpretations in CL. Also consider that this dataset will probably contain all the goodies we are planning to add in the future.
4) Allow the TransformationDataset to return the task label for each pattern when iterating over it (via the __getitem__ method).
5) In TransformationDataset, add a field similar to targets containing the task label of each pattern. I think we shouldn't name it task_labels, as it would have the same name as the Experience field despite the different contents and meaning.
6) Create a field in TransformationDataset that lets the user obtain, given a task label, the related subset.
7) Add the + operation for the TransformationDataset, which should return a concatenated dataset.
8) Add the + operation for Experiences, which should return a concatenated Experience (where task_labels and the dataset are properly merged).
9) Finally, add a concat() method to Experience slices, which should act exactly like the + operator when applied to multiple Experiences.
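To make points 4 to 7 concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions, not the Avalanche implementation: the class name ToyTaskDataset, the field name pattern_task_labels, and the task_subset() helper are all hypothetical, and point 6 is shown as a method rather than a field for brevity.

```python
from typing import Any, List, Sequence, Tuple


class ToyTaskDataset:
    """Toy stand-in for the proposed dataset (hypothetical, not Avalanche code)."""

    def __init__(self, patterns: Sequence[Any], targets: Sequence[int],
                 pattern_task_labels: Sequence[int]):
        if not (len(patterns) == len(targets) == len(pattern_task_labels)):
            raise ValueError("patterns, targets and task labels must align")
        self.patterns: List[Any] = list(patterns)
        self.targets: List[int] = list(targets)
        # Point 5: one task label per pattern (field name is an assumption).
        self.pattern_task_labels: List[int] = list(pattern_task_labels)

    def __len__(self) -> int:
        return len(self.patterns)

    def __getitem__(self, idx: int) -> Tuple[Any, int, int]:
        # Point 4: every item also carries its task label.
        return (self.patterns[idx], self.targets[idx],
                self.pattern_task_labels[idx])

    def task_subset(self, task_label: int) -> "ToyTaskDataset":
        # Point 6: keep only the patterns that belong to the given task.
        idxs = [i for i, t in enumerate(self.pattern_task_labels)
                if t == task_label]
        return ToyTaskDataset(
            [self.patterns[i] for i in idxs],
            [self.targets[i] for i in idxs],
            [task_label] * len(idxs),
        )

    def __add__(self, other: "ToyTaskDataset") -> "ToyTaskDataset":
        # Point 7: "+" returns a new, concatenated dataset.
        return ToyTaskDataset(
            self.patterns + other.patterns,
            self.targets + other.targets,
            self.pattern_task_labels + other.pattern_task_labels,
        )


first = ToyTaskDataset(["a", "b"], [0, 1], [0, 0])
second = ToyTaskDataset(["c"], [1], [1])
merged = first + second
```

Here merged[2] yields the pattern, its target, and its task label together, and merged.task_subset(0) recovers the patterns of the first dataset.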

For future expansions:
10) Add a field to TransformationDataset or Experience to contain custom "task descriptors" (which are different from task labels, which are simple ints).
11) Add some way to add a validation dataset, which I feel shouldn't be implemented as a scenario stream. I was thinking about adding a field to Experience.

I think that points 1 to 6 should be implemented ASAP. Points 7, 8 and 9 are just very useful goodies, so we can implement them after making Avalanche public. Points 10 and 11 would help support more complex scenarios (including meta-learning ones)! Those last two points will definitely require some brainstorming.

AntonioCarta commented 3 years ago

Agree on basically everything.

> We can still retain the task_label: int field, which would be a shortcut for task_labels[0].

Can we make it fail if len(task_labels) > 0?

> 11) Add some way to add a validation dataset, which I feel shouldn't be implemented as a scenario stream. I was thinking about adding a field to Experience.

Isn't it easier to have separate streams? Training and validation streams should have different names (for logging) and may also differ in other properties (e.g. the set of task labels). (11) is always important, not just for meta-learning. How do you do model selection otherwise?

lrzpellegrini commented 3 years ago

> Can we make it fail if len(task_labels) > 0?

Yes, absolutely! (did you mean len(task_labels) > 1?)
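A minimal sketch of that behaviour, assuming the shortcut is exposed as a read-only property. ToyExperience is a hypothetical stand-in, not the actual Experience class, and the choice of ValueError is an assumption as well.

```python
from typing import List


class ToyExperience:
    """Hypothetical stand-in for Experience; only the task label fields."""

    def __init__(self, task_labels: List[int]):
        self.task_labels: List[int] = list(task_labels)

    @property
    def task_label(self) -> int:
        # The shortcut is only meaningful with a single task label,
        # so fail loudly when the experience contains more than one.
        if len(self.task_labels) > 1:
            raise ValueError(
                "task_label is ambiguous for an experience with multiple "
                "task labels; use task_labels instead"
            )
        return self.task_labels[0]


single = ToyExperience([3])
multi = ToyExperience([0, 1])
```

With this sketch, single.task_label returns the only label, while multi.task_label raises instead of silently returning the first one.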

With a separate stream we need to adapt the train method of the strategy to accept an optional validation Experience. It's ok if this is easily doable! However, I recommend waiting for points 1-9 to be completed before moving forward.
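Roughly, the adapted train method could look like the sketch below. This is only an assumption about the shape of the change: ToyStrategy, the validation_experience parameter name, and the log list are hypothetical, and experiences are represented as plain sequences of patterns.

```python
from typing import List, Optional, Sequence


class ToyStrategy:
    """Hypothetical strategy; records what it would do instead of training."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def train(self, experience: Sequence[int],
              validation_experience: Optional[Sequence[int]] = None) -> None:
        # Training always runs on the experience from the training stream.
        self.log.append(f"train on {len(experience)} patterns")
        # Validation only runs when an experience from the separate
        # validation stream is supplied.
        if validation_experience is not None:
            self.log.append(
                f"validate on {len(validation_experience)} patterns")


strategy = ToyStrategy()
strategy.train([1, 2, 3])
strategy.train([4, 5], validation_experience=[6])
```

Keeping the argument optional means existing calls that pass only a training experience would keep working unchanged.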

vlomonaco commented 3 years ago

For 5, what about targets_task_labels or similar?

lrzpellegrini commented 3 years ago

> For 5, what about targets_task_labels or similar?

Seems good to me! Maybe a little too verbose, but at least users will know what it contains.

lrzpellegrini commented 3 years ago

Seems fine, I'm going with it!

vlomonaco commented 3 years ago

Closing this issue thanks to https://github.com/ContinualAI/avalanche/commit/6a1da61561496944150da4c3aec1c91995a8d47a. Points 8 and 9 won't be implemented for now. I'll open a new issue for 10. For 11 you can refer to #262.

ContinualAI / avalanche #320