Probabilistic Programming Languages (PPL)

rezaBarzgar commented 7 months ago

I'm going to log my activities regarding probabilistic programming languages on this issue page. This will lead to a new BNN model using a more powerful Python library, i.e. Pyro. I'll also put some useful links here:

https://pyro.ai/examples/
https://youtu.be/tw0cSm7TElE?si=6get0ucfuV0mAN2y

Agenda
[x] Variatinal Inference in BNNs
[x] Finding Probabilistic libraries for Python (Pyro)
[x] What are pyro.Sample, pyro.dist, pyro.param, pyro.plate ?
[x] Implementing a Bayesian regression classifier
[x] Implement a simple BNN.
[ ] Stochastic Variational Inference (SVI) {in progress}
[ ] SVI in Pyro {in progress}
all of the learning process happens in svi.step(model, guide,model, args, *kwargs)
[ ] Leaerning how ELBO works {in progress}
By debugging codes, I have found that in practice, the loss value is calculated by subtracting the sum of weights and biases of all nodes of guide from the sum of weights and biases of all nodes of the bnn model. The interesting part is that it ignores the prediction of the model.
[ ] Pair Programming with Dr. Fani
[ ] Implement it on OpeNTF
[ ] Find a way to add CL (needs a unique Idea but can be a great research milestone)

rezaBarzgar commented 7 months ago

code snippet of a simple BNN using pyro

I highly recommend to set a debbug point on loss = svi.step(x_train, y_train) and step into the svi.step to observe the calculation of the loss value and the learning procedure

import time
import matplotlib as plt
import json
import matplotlib.pyplot as plt
import numpy as np
import re
import pyro

import torch
from torch import optim
from torch.utils.data import DataLoader
from torch.optim.lr_scheduler import StepLR, ReduceLROnPlateau
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.functional import leaky_relu
from torch.distributions import Normal

import pyro
import pyro.distributions as dist
from pyro.nn import PyroModule, PyroSample
import torch.nn as nn
from pyro.infer import MCMC, NUTS
from pyro.infer import Predictive
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoDiagonalNormal
from tqdm.auto import trange

class BNN(PyroModule):
    def __init__(self, in_dim=1, out_dim=1, hid_dim=10, n_hid_layers=5, prior_scale=5.):
        super().__init__()

        self.activation = nn.Tanh()  # could also be ReLU or LeakyReLU
        assert in_dim > 0 and out_dim > 0 and hid_dim > 0 and n_hid_layers > 0  # make sure the dimensions are valid

        # Define the layer sizes and the PyroModule layer list
        self.layer_sizes = [in_dim] + n_hid_layers * [hid_dim] + [out_dim]
        layer_list = [PyroModule[nn.Linear](self.layer_sizes[idx - 1], self.layer_sizes[idx]) for idx in
                      range(1, len(self.layer_sizes))]
        self.layers = PyroModule[torch.nn.ModuleList](layer_list)

        for layer_idx, layer in enumerate(self.layers):
            layer.weight = PyroSample(dist.Normal(0., prior_scale * np.sqrt(2 / self.layer_sizes[layer_idx])).expand(
                [self.layer_sizes[layer_idx + 1], self.layer_sizes[layer_idx]]).to_event(2))
            layer.bias = PyroSample(dist.Normal(0., prior_scale).expand([self.layer_sizes[layer_idx + 1]]).to_event(1))

    def forward(self, x, y=None):
        x = x.reshape(-1, 1)
        x = self.activation(self.layers[0](x))  # input --> hidden
        for layer in self.layers[1:-1]:
            x = self.activation(layer(x))  # hidden --> hidden
        mu = self.layers[-1](x).squeeze()  # hidden --> output
        sigma = pyro.sample("sigma", dist.Gamma(.5, 1))  # infer the response noise

        with pyro.plate("data", x.shape[0]):
            obs = pyro.sample("obs", dist.Normal(mu, sigma * sigma), obs=y)
        return mu

if __name__ == '__main__':

    # Set random seed for reproducibility
    np.random.seed(42)

    # Generate data
    x_obs = np.hstack([np.linspace(-0.2, 0.2, 500), np.linspace(0.6, 1, 500)])
    noise = 0.02 * np.random.randn(x_obs.shape[0])
    y_obs = x_obs + 0.3 * np.sin(2 * np.pi * (x_obs + noise)) + 0.3 * np.sin(4 * np.pi * (x_obs + noise)) + noise

    x_true = np.linspace(-0.5, 1.5, 1000)
    y_true = x_true + 0.3 * np.sin(2 * np.pi * x_true) + 0.3 * np.sin(4 * np.pi * x_true)

    # Convert data to PyTorch tensors
    x_train = torch.from_numpy(x_obs).float()
    y_train = torch.from_numpy(y_obs).float()
    xlims = [-0.5, 1.5]
    ylims = [-1.5, 2.5]

    pyro.clear_param_store()

    model = BNN(hid_dim=10, n_hid_layers=1, prior_scale=5.)
    mean_field_guide = AutoDiagonalNormal(model)
    optimizer = pyro.optim.Adam({"lr": 0.01})

    svi = SVI(model, mean_field_guide, optimizer, loss=Trace_ELBO())
    pyro.clear_param_store()

    num_epochs = 250
    progress_bar = trange(num_epochs)

    for epoch in progress_bar:
        loss = svi.step(x_train, y_train)
        progress_bar.set_postfix(loss=f"{loss / x_train.shape[0]:.3f}")

    predictive = Predictive(model, guide=mean_field_guide, num_samples=500)
    x_test = torch.linspace(xlims[0], xlims[1], 3000)
    preds = predictive(x_test)

rezaBarzgar commented 7 months ago

the following snippets are part of a larger piece of code related to probabilistic programming and variational inference in Pyro. The code is implementing the calculation of the Evidence Lower Bound (ELBO) and the surrogate ELBO for a given model and guide.

Here's a breakdown:

Loop over model nodes:
```
for name, site in model_trace.nodes.items():
   if site["type"] == "sample":
       # Accumulate log probabilities for each sample site in the model
       elbo_particle = elbo_particle + torch_item(site["log_prob_sum"])
       surrogate_elbo_particle = surrogate_elbo_particle + site["log_prob_sum"]
```
This loop iterates over the nodes in the trace (model_trace) of the probabilistic model. For each node, if it corresponds to a sample statement in the model (i.e., generating a random variable), it accumulates the log probability of that sample (log_prob_sum) to both elbo_particle and surrogate_elbo_particle.

Loop over guide nodes:

for name, site in guide_trace.nodes.items():
   if site["type"] == "sample":
       # Accumulate log probabilities for each sample site in the guide
       elbo_particle = elbo_particle - torch_item(site["log_prob_sum"])

       # If entropy term is non-zero, subtract it from surrogate ELBO
       if not is_identically_zero(entropy_term):
           surrogate_elbo_particle = surrogate_elbo_particle - entropy_term.sum()

       # If score function term is non-zero, add it to surrogate ELBO
       if not is_identically_zero(score_function_term):
           if log_r is None:
               log_r = _compute_log_r(model_trace, guide_trace)
           site = log_r.sum_to(site["cond_indep_stack"])
           surrogate_elbo_particle = surrogate_elbo_particle + (site * score_function_term).sum()

This loop is similar to the previous one but iterates over the nodes in the trace (guide_trace) of the guide (variational distribution). It adjusts the elbo_particle and surrogate_elbo_particle based on the log probabilities, entropy terms, and score function terms associated with each sample site in the guide.

Return the negative ELBO and surrogate ELBO:
```
return -elbo_particle, -surrogate_elbo_particle
```
The final result is the negative of the accumulated ELBO and surrogate ELBO for the given model and guide. This form is common in variational inference, where the goal is often to maximize the ELBO, but optimization algorithms typically minimize the objective function.

fani-lab / OpeNTF

Probabilistic Programming Languages (PPL) #219

https://youtu.be/tw0cSm7TElE?si=6get0ucfuV0mAN2y

Agenda

code snippet of a simple BNN using pyro