ContinualAI / avalanche

Avalanche: an End-to-End Library for Continual Learning based on PyTorch.
http://avalanche.continualai.org
MIT License

StreamConfusionMatrix Stalling Problem #513

Closed vlomonaco closed 3 years ago

vlomonaco commented 3 years ago

Describe the bug The script provided below stalls (or is extremely slow) when moving to the next experience.

To Reproduce

import torch
import torch.optim as optim
import torchvision.transforms as transforms
from avalanche.benchmarks.classic import CORe50
from avalanche.evaluation.metrics import ExperienceForgetting, accuracy_metrics, loss_metrics, timing_metrics, \
    cpu_usage_metrics, StreamConfusionMatrix, disk_usage_metrics
from avalanche.logging import InteractiveLogger, TextLogger, TensorboardLogger
from avalanche.training.plugins import EvaluationPlugin
from avalanche.training.strategies import Naive
from torchvision.models import resnet34

train_transform = transforms.Compose([
    transforms.ToTensor()
])
eval_transform = transforms.Compose([
    transforms.ToTensor()
])
scenario = CORe50(
    scenario="ni",
    train_transform=train_transform,
    eval_transform=eval_transform)
model = resnet34(pretrained=True)

tb_logger = TensorboardLogger(tb_log_dir="experiment")
interactive_logger = InteractiveLogger()

eval_plugin = EvaluationPlugin(
    accuracy_metrics(minibatch=True, epoch=True, epoch_running=True, experience=True, stream=True),
    loss_metrics(minibatch=True, epoch=True, epoch_running=True, experience=True, stream=True),
    StreamConfusionMatrix(num_classes=50, save_image=True), # <---- disable this to make the problem disappear
    loggers=[interactive_logger, tb_logger]
)
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = torch.nn.CrossEntropyLoss()
batch_size = 16
epochs = 1
device = torch.device("cuda")

cl_strategy = Naive(
        model=model,
        optimizer=optimizer,
        criterion=criterion,
        train_mb_size=batch_size,
        train_epochs=epochs,
        eval_mb_size=batch_size,
        evaluator=eval_plugin,
        device=device
)

print("Starting experiment...")
print(len(scenario.train_stream))

for experience in scenario.train_stream[:2]:
    print("Start of experience ", experience.current_experience)
    print("Current classes: ", experience.classes_in_this_experience)

    cl_strategy.train(experience, num_workers=8)
    print("Training complete.")

    print("Computing accuracy on the test set")
    cl_strategy.eval(scenario.test_stream, num_workers=8)
    print("End of experience ", experience.current_experience)

Expected behavior The script should move on to the next experience immediately.

Additional context See discussion here: https://github.com/ContinualAI/avalanche/discussions/439#discussioncomment-585228

AndreaCossu commented 3 years ago

It seems the problem is related to the confusion matrix plotter: setting save_image=False makes it disappear.
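
As a workaround, the metric can stay in the evaluation plugin with image generation turned off; only the StreamConfusionMatrix line of the script above changes (minimal sketch):

eval_plugin = EvaluationPlugin(
    accuracy_metrics(minibatch=True, epoch=True, epoch_running=True, experience=True, stream=True),
    loss_metrics(minibatch=True, epoch=True, epoch_running=True, experience=True, stream=True),
    StreamConfusionMatrix(num_classes=50, save_image=False),  # image rendering disabled, metric still computed
    loggers=[interactive_logger, tb_logger]
)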

I will try to change the default plotter, which currently uses scikit-learn, to something faster.
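
For reference, a confusion matrix tensor can be rendered with matplotlib's imshow alone, which avoids the scikit-learn plotting path. This is a minimal standalone sketch of that idea (the fast_cm_image helper is hypothetical and not part of Avalanche), not the actual replacement:

import io

import matplotlib.pyplot as plt
import torch
from PIL import Image

def fast_cm_image(cm: torch.Tensor) -> Image.Image:
    # Render an (n_classes x n_classes) confusion matrix as a PIL image
    # using only matplotlib's imshow, without scikit-learn's plotting utilities.
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.imshow(cm.numpy(), interpolation="nearest", cmap="viridis")
    ax.set_xlabel("Predicted label")
    ax.set_ylabel("True label")
    fig.tight_layout()
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    buf.seek(0)
    return Image.open(buf)

# Example usage with a random 50x50 matrix (CORe50 has 50 classes).
img = fast_cm_image(torch.randint(0, 100, (50, 50)))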