Closed: Bard2803 closed this issue 1 year ago.
OK, so I am not sure this should be called a bug. It works after explicitly passing a generator_strategy. The docs say a VAE generator is applied by default, but that did not work for the setup I presented. I am leaving this issue up in case someone runs into the same problem.
The fix: add the generator explicitly (do not rely on the default VAE as described in the docs):
# imports (module paths may vary slightly across Avalanche versions)
from torch.optim import Adam
from avalanche.models.generator import MlpVAE
from avalanche.training.plugins import GenerativeReplayPlugin
from avalanche.training.supervised import GenerativeReplay, VAETraining

# device, model, optimizer, criterion and eval_plugin are defined earlier in the full script

# generator model:
generator = MlpVAE((3, 32, 32), nhid=2, device=device)

# optimizer for the generator:
lr = 0.001
optimizer_generator = Adam(
    filter(lambda p: p.requires_grad, generator.parameters()),
    lr=lr,
    weight_decay=0.0001,
)

# generator strategy (with the generative replay plugin):
generator_strategy = VAETraining(
    model=generator,
    optimizer=optimizer_generator,
    train_mb_size=100,
    train_epochs=4,
    eval_mb_size=100,
    device=device,
    plugins=[
        GenerativeReplayPlugin(
            replay_size=None,
            increasing_replay_size=False,
        )
    ],
)

# CREATE THE STRATEGY INSTANCE (GenerativeReplay), passing the generator_strategy explicitly
cl_strategy = GenerativeReplay(
    model,
    optimizer,
    criterion,
    train_mb_size=20,
    train_epochs=4,
    eval_mb_size=20,
    device=device,
    evaluator=eval_plugin,
    eval_every=1,
    generator_strategy=generator_strategy,
)
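For completeness, here is a minimal sketch of how the assembled strategy can then be driven over a continual-learning benchmark. The benchmark variable and its train/test streams are assumptions added for illustration and are not part of the original script:

# minimal usage sketch, assuming a benchmark object (e.g. from avalanche.benchmarks.classic)
results = []
for experience in benchmark.train_stream:
    cl_strategy.train(experience)          # train on the current experience (generator is trained via the plugin)
    results.append(cl_strategy.eval(benchmark.test_stream))  # evaluate on the full test stream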
🐛 Describe the bug
Training on the MPS GPU goes well until it hits the last epoch of experience 0. Then the following error appears:
On CPU the error is slightly different.
🐜 To Reproduce
This is my code:
🐝 Expected behavior
Training should continue.
🦋 Additional context
I tried decreasing the batch sizes and monitored memory consumption; the error does not seem to be related to memory overhead.