bayesiains / nflows

Normalizing flows in PyTorch
MIT License

Is having negative loss okay? #76

Closed yarinbar closed 1 year ago

yarinbar commented 1 year ago

Hi! My colleague and I are using your package and have run into a strange phenomenon: we get negative loss values during training. That said, the loss curve as a whole looks like a typical training curve, apart from the values themselves.

[Plot: training loss over iterations; the curve has the usual decreasing shape, but the values are negative.]

I am also pasting a snippet of the initialization and training code:

import torch
from tqdm import tqdm

from nflows import distributions, flows, transforms

# ResnetAdapter is our own wrapper around a pretrained ResNet feature
# extractor (not part of nflows).

class NormalizedFlowModel:
    def __init__(self, n_flows, pretrained_path=None, device='cpu', **kwargs):
        self.n_flows = n_flows
        self.net = ResnetAdapter(pretrained_path, device)

        self.latent_dim = kwargs.get('latent_dim', 512)
        self.device = device

        # Build n_flows steps; each step is a masked affine autoregressive
        # transform followed by a random permutation of the features.
        # (Constructing the modules inside the loop gives every step its own
        # parameters; repeating one list with `* n_flows` would reuse the
        # same module instances in every step.)
        layers = []
        for _ in range(n_flows):
            layers.append(transforms.MaskedAffineAutoregressiveTransform(
                features=self.latent_dim, hidden_features=2 * self.latent_dim))
            layers.append(transforms.RandomPermutation(features=self.latent_dim))
        self.transform = transforms.CompositeTransform(layers)

        # Base distribution (q0): a standard normal over the latent space
        base_distribution = distributions.StandardNormal(shape=[self.latent_dim])

        # Construct flow model
        self.flow = flows.Flow(transform=self.transform, distribution=base_distribution)
        self.flow.to(device)

    def train(self, nf_train_loader, **kwargs):
        n_epochs = kwargs.get('n_epochs', 5)
        lr = kwargs.get('lr', 1e-4)
        weight_decay = kwargs.get('weight_decay', 1e-5)

        optimizer = torch.optim.Adam(self.flow.parameters(), lr=lr, weight_decay=weight_decay)
        loss_list = []

        for epoch in tqdm(range(n_epochs), desc="epoch"):

            self.flow.train()
            self.net.eval()

            for batch_idx, (X, Y) in enumerate(nf_train_loader):
                batch_size = X.shape[0]

                X = X.to(self.device)

                # Encode the batch with the frozen ResNet; the flow is trained
                # by maximizing the likelihood of the pre-FC feature map.
                with torch.no_grad():
                    outputs, _, latent = self.net(X)

                loss = -self.flow.log_prob(inputs=latent[-1]).mean()
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

                loss_list.append(loss.item())

        return loss_list

We are using the feature map from before the final FC layer of the ResNet as input to the flow (that is latent[-1]).
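For clarity, here is a minimal sketch of what "the feature map before the FC layer" looks like with a plain torchvision ResNet-18 (the dummy batch, the 224x224 input size, and weights=None are only placeholders; ResnetAdapter above is our own wrapper and may differ):

import torch
import torchvision

# Untrained weights, just to keep the sketch self-contained.
resnet = torchvision.models.resnet18(weights=None).eval()

# Drop the final FC layer; the remaining backbone ends with global average pooling.
backbone = torch.nn.Sequential(*list(resnet.children())[:-1])

with torch.no_grad():
    images = torch.randn(8, 3, 224, 224)     # dummy batch
    feats = backbone(images)                 # shape (8, 512, 1, 1)
    feats = feats.flatten(start_dim=1)       # shape (8, 512): inputs to the flow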

imurray commented 1 year ago

This behavior is expected. Probability densities can be greater than one as well as less than one, so log probability densities can be positive or negative, and a negative-log-probability loss can therefore take either sign.
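As a minimal illustration (plain PyTorch, nothing specific to nflows): a narrow one-dimensional Gaussian already has density greater than one near its mean, so the negative log-likelihood there is negative.

import torch
from torch.distributions import Normal

narrow = Normal(loc=0.0, scale=0.1)   # standard deviation 0.1

x = torch.tensor(0.0)
log_p = narrow.log_prob(x)            # log density at the mean
print(log_p.item())                   # about  1.38  (density about 3.99 > 1)
print((-log_p).item())                # about -1.38: a perfectly valid negative loss

The same thing happens in higher dimensions whenever the model concentrates its density on a small region of the latent space, which is exactly what a well-trained flow does.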