ContinualAI / avalanche

Avalanche: an End-to-End Library for Continual Learning based on PyTorch.
http://avalanche.continualai.org
MIT License

Lack of clarity over training time taken to run Replay strategy vs Naive strategy. #1007

Closed KevinG1002 closed 2 years ago

KevinG1002 commented 2 years ago

Hello,

First of all, thank you for your fantastic work. Avalanche is a pleasure to work with.

I am currently working on a basic Replay strategy implementation for a university project. During my exploration, however, I came across a curious phenomenon when training over experiences in a class-incremental scenario (i.e., previously unseen classes appear in each new experience) using MNIST. I was hoping you could explain what is happening. For clarity, I'll describe what I observed and attach screenshots for support.

To start off, here's a bit of context about my project setting:

Experiment Configuration
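
A minimal sketch of the kind of setup I'm using (illustrative only: SplitMNIST, SimpleMLP, and the hyperparameter values below stand in for my actual scenario, model, and settings):

from torch.nn import CrossEntropyLoss
from torch.optim import SGD

from avalanche.benchmarks.classic import SplitMNIST
from avalanche.models import SimpleMLP
from avalanche.training.plugins import ReplayPlugin
# import path on avalanche-lib 0.1.0; on current master this lives in avalanche.training.supervised
from avalanche.training.strategies import EWC

# class-incremental MNIST: each experience introduces previously unseen classes
CI_MNIST_Scenario = SplitMNIST(n_experiences=5)

model = SimpleMLP(num_classes=10)
cl_strategy = EWC(
    model,
    SGD(model.parameters(), lr=0.001, momentum=0.9),
    CrossEntropyLoss(),
    ewc_lambda=0.4,  # illustrative value
    train_mb_size=64,
    train_epochs=1,
    eval_mb_size=64,
    plugins=[ReplayPlugin(mem_size=100)],  # omit this line for the plain Naive EWC runs
)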

My main training loop looks as follows:

results = []
for experience in CI_MNIST_Scenario.train_stream:
    print("Start of experience: ", experience.current_experience)
    print("Current Classes: ", experience.classes_in_this_experience)

    # train returns a dictionary which contains all the metric values
    res = cl_strategy.train(experience)
    print("Training completed")

print("Computing accuracy on the whole test set")
# eval also returns a dictionary which contains all the metric values
results.append(cl_strategy.eval(CI_MNIST_Scenario.test_stream))

Observations when implementing a Naive EWC Strategy

When running the basic training loop above, I find that training time is consistent across experiences. With a roughly equal number of samples per experience, each epoch takes ~3s. No problem here; this is the behaviour I expect.

[screenshot: training log for Naive EWC, showing ~3s per epoch on every experience]

Observations when running Replay Strategy (buffer size of 100)

It is when running the Replay strategy with the same training loop that I observe unexpected behaviour. Consider the following screenshot, which shows the results of a few iterations of the training loop.

[screenshot: training log for Replay, showing per-epoch training time growing with every new experience]

Here's the thing I don't understand: why does the training time grow so steeply with each new experience? As far as I am aware, each experience still holds roughly the same number of samples to be learned (~10k). The only change to the training data is the introduction of a replay buffer of size 100, so compared to Naive EWC, the increase in training time should be marginal.

Here's a breakdown of per-epoch training times using this strategy:

Exp_0: 3s per epoch
Exp_1: 8s per epoch
Exp_2: 18s per epoch
Exp_3: 23s per epoch
Exp_4: 45s per epoch
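
(Back-of-the-envelope: ~10,000 fresh samples plus a 100-sample buffer is at most ~10,100 samples per epoch, i.e. about a 1% overhead over Naive, nothing like the 15x slowdown I see by Exp_4.)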

Any indications/explanations would be much appreciated! I am relatively new to Continual Learning, so apologies if I'm missing something obvious.

Again, thank you very much for your work.

Best,

Kevin

HamedHemati commented 2 years ago

Hi @KevinG1002, are you using the beta version or the master branch to run your experiments?

KevinG1002 commented 2 years ago

Hi, thanks for getting back to me.

"Are you using the beta version or the master branch to run your experiments?"

The beta, I believe: I just ran pip install avalanche-lib and ran my experiments from there. My installed package version is 0.1.0.

Do you recommend I clone & install the library on its master branch?

HamedHemati commented 2 years ago

I guess the problem happens because of the way the replay plugin manages the data loaders for the current data and the buffer data. The plugin has changed since the beta release, so could you please try the latest version of the code and see if the same thing happens?
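
Just for intuition (a simplified sketch, not the plugin's actual code, and make_replay_loader is a hypothetical name): with a bounded buffer, one epoch should only grow by at most the buffer size, so per-epoch time should stay roughly constant:

from torch.utils.data import ConcatDataset, DataLoader

def make_replay_loader(current_dataset, buffer_dataset, batch_size=64):
    # buffer_dataset is capped (e.g. 100 samples), so the combined
    # dataset has at most len(current_dataset) + 100 samples no matter
    # how many experiences have been seen; one epoch stays ~constant.
    combined = ConcatDataset([current_dataset, buffer_dataset])
    return DataLoader(combined, batch_size=batch_size, shuffle=True)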

Make sure you first uninstall avalanche-lib:

pip uninstall avalanche-lib

and then install it from the current commit as below:

pip install git+https://github.com/ContinualAI/avalanche.git@da4645b0856f1ff80a07a54ee7bc5d51e05068bb
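
To double-check that the development version is the one actually installed, pip show can help (assuming the distribution is still named avalanche-lib):

pip show avalanche-lib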

KevinG1002 commented 2 years ago

Hi @HamedHemati,

I attempted what you recommended and can confirm that it works: training time is now consistent across experiences when using a Replay strategy.

Cheers!

ghost commented 2 years ago

Hi @KevinG1002, @HamedHemati, I also installed the latest version with the command you shared. The installation succeeds, but when I run my code I get the error below. I can't find what the strategies module is now called.

ModuleNotFoundError: No module named 'avalanche.training.strategies'

HamedHemati commented 2 years ago

Hi @PabloMese, in the new code structure, supervised strategies can be accessed via avalanche.training.supervised.
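
For example, an import that worked on 0.1.0 would be updated like this (using Naive as an illustrative strategy):

# old layout (avalanche-lib 0.1.0):
# from avalanche.training.strategies import Naive

# new layout (current master):
from avalanche.training.supervised import Naive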

Please note that since the code base is still changing, some existing examples/code may not work with the master branch. We are going to publish a new release with new examples and API documentation soon! :)