Zhangyanbo / diffusion-evolution

Diffusion model derived evolutionary algorithm
Apache License 2.0

Can't seem to replicate results #4

Closed RobertTLange closed 1 month ago

RobertTLange commented 1 month ago

Hi there,

I am trying to replicate some of the results to get a better feeling for scaling, etc. Using 512 population members for 2D tasks seems like far too many.

And I can't get the method to work. Playing with the 2D Rosenbrock function (using -1 * fitness for minimization), I get the following diverging plot. Can you help out?

[image: diverging fitness plot]

I am running the following minimal extracted code:

import torch
from tqdm import tqdm
from torch.distributions import MultivariateNormal

class DDIMScheduler:
    def __init__(self, num_step, power=1, eps=1e-4):
        self.num_step = num_step
        self.power = power
        self.alpha = (
            torch.linspace(1 - eps, (eps * eps) ** (1 / self.power), num_step)
            ** self.power
        )
        self.index = 0

    def __next__(self):
        if self.index >= self.num_step - 1:
            raise StopIteration

        t = self.num_step - self.index - 1
        alpha = self.alpha[t]
        alpha_past = self.alpha[t - 1]
        self.index += 1
        return t, (alpha, alpha_past)

    def __len__(self):
        return self.num_step - 1

    def __iter__(self):
        self.index = 0
        return self

class BayesianEstimator:

    def __init__(
        self,
        x: torch.tensor,
        fitness: torch.tensor,
        alpha,
    ):
        self.x = x
        self.fitness = fitness
        self.alpha = alpha

    def density(self, x):
        return torch.ones(x.shape[0]) / x.shape[0]

    @staticmethod
    def norm(x):
        if x.shape[-1] == 1:
            # for some reason, torch.norm becomes very slow when dim=1, so we use torch.abs instead
            return torch.abs(x).squeeze(-1)
        else:
            return torch.norm(x, dim=-1)

    def gaussian_prob(self, x, mu, sigma):
        dist = self.norm(x - mu)
        return torch.exp(-(dist**2) / (2 * sigma**2))

    def _estimate(self, x_t, p_x_t):
        # diffusion probability, P = N(x_t; \sqrt{α_t} x, \sqrt{1-α_t})
        mu = self.x * (self.alpha**0.5)
        sigma = (1 - self.alpha) ** 0.5
        p_diffusion = self.gaussian_prob(x_t, mu, sigma)
        # estimate the origin
        prob = (self.fitness + 1e-9) * (p_diffusion + 1e-9) / (p_x_t + 1e-9)
        z = torch.sum(prob)
        origin = torch.sum(prob.unsqueeze(1) * self.x, dim=0) / (z + 1e-9)
        return origin

    def estimate(self, x_t):
        p_x_t = self.density(x_t)
        origin = torch.vmap(self._estimate, (0, 0))(x_t, p_x_t)
        return origin

    def __call__(self, x_t):
        return self.estimate(x_t)

def ddim_step(xt, x0, alphas: tuple, noise: float = None):
    alphat, alphatp = alphas
    # fall back to the default DDPM sigma when no noise scale is given
    if noise is None:
        sigma = ddpm_sigma(alphat, alphatp)
    else:
        sigma = ddpm_sigma(alphat, alphatp) * noise
    eps = (xt - (alphat**0.5) * x0) / (1.0 - alphat) ** 0.5
    # print(sigma, alphat, alphatp)
    x_next = (
        (alphatp**0.5) * x0
        + ((1 - alphatp - sigma**2) ** 0.5) * eps
        + sigma * torch.randn_like(x0)
    )
    return x_next

def ddpm_sigma(alphat, alphatp):
    """Compute the default sigma for the DDPM algorithm."""
    return ((1 - alphatp) / (1 - alphat) * (1 - alphat / alphatp)) ** 0.5

class BayesianGenerator:
    """Bayesian Generator for the DDIM algorithm."""

    def __init__(self, x, fitness, alpha):
        self.x = x
        self.fitness = fitness
        self.alpha, self.alpha_past = alpha
        self.estimator = BayesianEstimator(self.x, self.fitness, self.alpha)

    def generate(self, noise=1.0):
        torch.manual_seed(0)
        x0_est = self.estimator(self.x)
        # print("Est", x0_est)
        x_next = ddim_step(self.x, x0_est, (self.alpha, self.alpha_past), noise=noise)
        return x_next

    def __call__(self, noise=1.0):
        return self.generate(noise=noise)

class Identity:
    """Identity fitness mapping function."""

    def __init__(self, l2_factor=0.0):
        self.l2_factor = l2_factor

    def l2(self, x):
        return torch.norm(x, dim=-1) ** 2

    def forward(self, x):
        return x

    def __call__(self, x):
        return self.forward(x) * torch.exp(-1.0 * self.l2(x) * self.l2_factor)

class DiffEvo:

    def __init__(
        self,
        x: torch.tensor,
        num_step: int = 100,
        noise: float = 1.0,
        scaling: float = 1,
    ):
        self.num_step = num_step
        self.scaling = scaling
        self.noise = noise
        self.fitness_mapping = Identity()
        self.scheduler = DDIMScheduler(self.num_step)
        self.x = x

    def ask(self):
        return self.x * self.scaling

    def tell(self, fitness):
        t, alpha = next(self.scheduler)
        # print("Alpha", alpha)
        generator = BayesianGenerator(self.x, self.fitness_mapping(fitness), alpha)
        self.x = generator(noise=self.noise)
        # print("Mean", self.x.mean())

def rosenbrock(x):
    # rosenbrock function
    return 100 * (x[:, 1] - x[:, 0] ** 2) ** 2 + (1 - x[:, 0]) ** 2

fitness_fn = rosenbrock

torch.manual_seed(0)
xT = torch.randn(16, 2)
num_generations = 30
optimizer = DiffEvo(noise=1.0, num_step=num_generations, x=xT)

fitness_list = []  # collect raw fitness per generation for plotting below
for i in range(num_generations - 1):
    x = optimizer.ask()
    fitness = fitness_fn(x)
    optimizer.tell(-fitness)
    fitness_list.append(fitness)
    print("Fitness", fitness.min(), fitness.mean(), fitness.max())

# plot the fitness results on a log scale
import numpy as np
import matplotlib.pyplot as plt

plt.plot(np.log(np.array(fitness_list).mean(axis=1)), label="Mean of Population")
plt.plot(np.log(np.array(fitness_list).min(axis=1)), label="Min of Population")
plt.legend()
plt.ylabel("Log Fitness")
plt.xlabel("Generation")
plt.title("DiffEvo on 2D Rosenbrock with 16 PopMembers")
plt.show()
RobertTLange commented 1 month ago

Furthermore, multiplying the fitness by -1 doesn't change anything about the plot, which seems odd to me.

Zhangyanbo commented 1 month ago

Hi, I guess the problem comes from the negative fitness. As you can see, the default optimizer is designed for simple usage, so the fitness mapping is Identity. However, our formulation maps fitness to a probability density, which is always a positive number.

For this reason, I suggest you use the "typical usage" in the README.md, which exposes all the components. Then you can choose a better mapping, such as exp(-f) or 1/(f+eps), which ensures the values are positive.
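
For example, a minimal sketch of such a mapping (the exponential form and its temperature are illustrative choices here, not the repository's defaults):

import torch

# Illustrative mapping from a raw loss (to be minimized) to a positive
# "density"-like fitness in (0, 1]; higher means better.
def to_positive_fitness(raw_loss, temperature=10.0):
    return torch.exp(-raw_loss / temperature)

# inside the ask/tell loop from the snippet above:
# x = optimizer.ask()
# raw = rosenbrock(x)
# optimizer.tell(to_positive_fitness(raw))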

RobertTLange commented 1 month ago

Thank you for the quick reply. Like I said, using -fitness and just fitness gives the same results.

Could you maybe share the settings you used for the Rosenbrock task in the paper?

Zhangyanbo commented 1 month ago

You can find them in the experiments/benchmark/ folder. The experiments for the traditional methods may not work, since we haven't released the CMA-ES/PEPG code yet (soon!), but you can find our experiment on Rosenbrock.

We basically redefine the fitness function, mapping the desired fitness to a higher density. See here: https://github.com/Zhangyanbo/diffusion-evolution/blob/360b002bb600787953809a50be6887b5687e689d/experiments/benchmarks/benchmarks.py#L91

Zhangyanbo commented 1 month ago

Also, I think the reason that -fitness makes no difference is the normalization. The new generation is a weighted average of the current population, i.e., $\sum_i w_i x_i / \sum_i w_i$, which is equal to $\sum_i (-w_i) x_i / \sum_i (-w_i)$ if you set $w_i = f_i$.
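
A quick numeric check of that cancellation (toy example only):

import torch

x = torch.randn(8, 2)   # toy population
w = torch.rand(8)       # positive weights (fitness values)

mean_pos = (w.unsqueeze(1) * x).sum(dim=0) / w.sum()
mean_neg = ((-w).unsqueeze(1) * x).sum(dim=0) / (-w).sum()
print(torch.allclose(mean_pos, mean_neg))  # True: flipping the sign of all weights cancels out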

RobertTLange commented 1 month ago

Again, thank you for the response. I get that part.

I am actually looking for the noise_scale and mapping settings - basically everything I need to reproduce the Rosenbrock results given the code above. From what you shared, it looks as if all results use the default settings?

Zhangyanbo commented 1 month ago

Maybe the README is a bit confusing, but I'm not suggesting you use the highly packed version. The Rosenbrock optimization code is here: https://github.com/Zhangyanbo/diffusion-evolution/blob/360b002bb600787953809a50be6887b5687e689d/experiments/benchmarks/diff_evo.py#L16-L45

So, you can see that we use get_obj to get a rescaled version of the function, where 1 means the target and 0 means bad solutions. You can do this directly in the fitness function, but you can also do it inside the loop (if you use the "typical usage").
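
For example, a rough sketch of that kind of rescaling for the Rosenbrock case (just an illustration with an arbitrary temperature, not necessarily what get_obj actually does):

import torch

# Illustrative rescaled objective: close to 1 near the optimum (1, 1),
# approaching 0 for bad solutions.
def rescaled_rosenbrock(x, temperature=100.0):
    f = 100 * (x[:, 1] - x[:, 0] ** 2) ** 2 + (1 - x[:, 0]) ** 2
    return torch.exp(-f / temperature)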

RobertTLange commented 1 month ago

Thank you, I got something somewhat working now. I think it would make a lot of sense to update the README/defaults to something that is fairly robust across fitness landscapes.

Closing this.