My colleague and I are using your package and have run into a strange phenomenon: we are getting negative loss values during training. That said, the loss curve as a whole looks like a normal loss curve, apart from the values being negative.
Here is a code snippet of the initialization and the training:
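```python
import torch
from tqdm import tqdm
from nflows import distributions, flows, transforms

# ResnetAdapter is our own wrapper around a pretrained ResNet (not part of nflows).


class NormalizedFlowModel:
    def __init__(self, n_flows, pretrained_path=None, device='cpu', **kwargs):
        self.n_flows = n_flows
        self.net = ResnetAdapter(pretrained_path, device)
        self.latent_dim = kwargs.get('latent_dim', 512)
        self.device = device
        # Build a fresh transform per flow step; `[t] * n_flows` would reuse
        # one module n_flows times, so all steps would share the same weights.
        self.transform = transforms.CompositeTransform([
            transforms.MaskedAffineAutoregressiveTransform(
                features=self.latent_dim, hidden_features=2 * self.latent_dim)
            for _ in range(n_flows)
        ])
        # Base distribution q0: standard normal over the latent features
        base_distribution = distributions.StandardNormal(shape=[self.latent_dim])
        # Construct the flow model and move it to the same device as the inputs
        self.flow = flows.Flow(transform=self.transform,
                               distribution=base_distribution).to(device)

    def train(self, nf_train_loader, **kwargs):
        n_epochs = kwargs.get('n_epochs', 5)
        lr = kwargs.get('lr', 1e-4)
        weight_decay = kwargs.get('weight_decay', 1e-5)
        optimizer = torch.optim.Adam(self.flow.parameters(), lr=lr, weight_decay=weight_decay)
        loss_list = []
        for epoch in tqdm(range(n_epochs), desc="epoch"):
            for batch_idx, (X, Y) in enumerate(nf_train_loader):
                X = X.to(self.device)
                # The ResNet is frozen; only the flow is trained
                with torch.no_grad():
                    outputs, _, latent = self.net(X)
                # Flatten to (batch, latent_dim) and minimize the mean negative log-likelihood
                inputs = latent[-1].flatten(start_dim=1)
                loss = -self.flow.log_prob(inputs=inputs).mean()
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                loss_list.append(loss.item())
        return loss_list
```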
We are using the feature map before the FC layer of the ResNet as inputs (that is, `latent[-1]`).
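This behavior is expected. For continuous data the flow outputs a probability density, not a probability, and densities can be greater than one as well as less than one. The log-density can therefore be positive or negative, and so can the negative log-likelihood loss. A negative loss simply means the flow assigns the training features an average log-density above zero, which is easy to reach in 512 dimensions when the features are concentrated. The sign of the loss on its own says nothing about whether training is working; what matters is that it keeps decreasing.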
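To make this concrete, here is a small stdlib-only sketch. It uses plain isotropic Gaussians instead of the trained flow (an assumption made purely for illustration; the helper names are hypothetical) to show that a narrow density exceeds 1 at its mean and that the effect compounds with the number of dimensions:

```python
import math


def gaussian_log_pdf(x, mu, sigma):
    """Log-density of a 1-D Gaussian N(mu, sigma^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def isotropic_log_pdf(xs, mu, sigma):
    """Log-density of an isotropic Gaussian: a sum of independent 1-D terms."""
    return sum(gaussian_log_pdf(x, mu, sigma) for x in xs)


# 1-D: a narrow Gaussian (sigma = 0.1) has density of about 3.99 at its mean, i.e. > 1
sigma = 0.1
peak_density = math.exp(gaussian_log_pdf(0.0, 0.0, sigma))
nll_1d = -gaussian_log_pdf(0.0, 0.0, sigma)  # negative, because the density is > 1

# 512-D (the size of the ResNet features): the per-dimension terms add up,
# so the negative log-likelihood grows in magnitude with the dimension
latent_dim = 512
nll_512d = -isotropic_log_pdf([0.0] * latent_dim, 0.0, sigma)

print(f"peak density (1-D): {peak_density:.3f}")
print(f"NLL (1-D):          {nll_1d:.3f}")
print(f"NLL (512-D):        {nll_512d:.1f}")
```

This prints a peak density of roughly 3.99, a 1-D NLL of roughly -1.38 and a 512-D NLL of roughly -708. The same reasoning shows why the absolute loss depends on the scale of the inputs: if D-dimensional features are multiplied by a constant c, the density of the rescaled features is p(x/c)/c^D, so the negative log-likelihood shifts by D·log c.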