The training loop is shown below. It is roughly the same as the original code, but with loss accumulation added: here batch_size refers to the target (effective) batch size, not the actual batch size used by the data loader. Running it produces a GPU out-of-memory error.
for i, (shower_data, incident_energies) in enumerate(shower_loader_train, 0):
    # Move model to device and set dtype to match the data (torch.double works on both CPU and GPU)
    model.to(device, shower_data.dtype)
    model.train()
    shower_data = shower_data.to(device)
    incident_energies = incident_energies.to(device)
    # Skip showers with very few hits
    if len(shower_data) < 1:
        print('Very few hits in shower: ', len(shower_data))
        continue
    # Zero any gradients from previous steps
    optimiser.zero_grad()
    # Loss averaged over the mini-batch
    loss = score_model.loss_fn(model, shower_data, incident_energies, marginal_prob_std_fn, padding_value, device=device)
    # Accumulate batch loss per epoch
    cumulative_epoch_loss += float(loss)
    print(len(shower_data))
    # Accumulate the loss tensor until the target batch size is reached
    batch_loss += loss
    batch_accumulate += len(shower_data)
    print(i, batch_accumulate, torch.cuda.memory_allocated(device))
    if batch_accumulate >= batch_size:
        # Collect dL/dx for any parameters (x) which have requires_grad = True via: x.grad += dL/dx
        batch_loss.backward()
        batch_loss = 0
        batch_accumulate = 0
        # Update value of x += -lr * x.grad
        optimiser.step()
        torch.cuda.empty_cache()
        # Note: torch.no_grad() called as a bare statement has no effect (it is a context manager)
        torch.no_grad()
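A likely contributor to the out-of-memory error is that batch_loss += loss keeps the computation graph of every mini-batch alive until backward() is finally called. For comparison, below is a minimal sketch of the more common gradient-accumulation pattern, which calls backward() on each mini-batch loss immediately (freeing that graph) and only steps the optimiser once the target batch_size is reached. It assumes the same model, optimiser, data loader, and score_model.loss_fn as above and is meant as an illustration rather than a drop-in replacement. Because the gradient of a sum equals the sum of gradients, the resulting parameter update is equivalent.

batch_accumulate = 0
optimiser.zero_grad()
for i, (shower_data, incident_energies) in enumerate(shower_loader_train, 0):
    shower_data = shower_data.to(device)
    incident_energies = incident_energies.to(device)
    if len(shower_data) < 1:
        continue
    loss = score_model.loss_fn(model, shower_data, incident_energies, marginal_prob_std_fn, padding_value, device=device)
    cumulative_epoch_loss += float(loss)
    # backward() here frees this mini-batch's graph; gradients keep summing in .grad
    loss.backward()
    batch_accumulate += len(shower_data)
    if batch_accumulate >= batch_size:
        # Apply the accumulated gradients, then reset for the next effective batch
        optimiser.step()
        optimiser.zero_grad()
        batch_accumulate = 0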