aidos-lab / pytorch-topological

A topological machine learning framework based on PyTorch
https://pytorch-topological.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Replicate the experiments of spheres from TopoAE #38

Open nzilberstein opened 4 days ago

nzilberstein commented 4 days ago

Hi,

Thanks for the great work and for developing this library. I am trying to replicate the experiments from the Topological Autoencoder paper using the example script, but I can't reproduce the results. I used the hyperparameters from the best-run file in the original repo, and the DeepAE, but I still can't replicate them. Is there something I am missing?

Thanks!

Pseudomanifold commented 4 days ago

Hey there!

This is a new implementation, so maybe there are some slight differences. Can you let me know what exactly doesn't work?

nzilberstein commented 3 days ago

Hi Bastian,

Thanks for the quick response. I attach the results and the code I am using

import torch.optim as optim

from torch.utils.data import DataLoader
from tqdm import tqdm

# Spheres, MLPAutoencoder_Spheres and TopologicalAutoencoder come from the
# example script and the original TopoAE code, respectively.

n_spheres = 11
r = 5
d = 100

dataset_train = Spheres(n_spheres=n_spheres, r=r, d=d)

train_loader = DataLoader(
    dataset_train,
    batch_size=28,
    shuffle=True,
    drop_last=True,
)

# Hyperparameters taken from the best-run file of the original repo
n_epochs = 100
lam = 0.43357738971723536
optimizer_lr = 0.000272683866289072
batch_size = 28

model = MLPAutoencoder_Spheres()  # This one I took from the original code of TopoAE
topo_model = TopologicalAutoencoder(model, lam=lam)

optimizer = optim.Adam(topo_model.parameters(), lr=optimizer_lr, weight_decay=1e-5)

for i in tqdm(range(n_epochs)):
    topo_model.train()

    for batch, (x, y) in enumerate(train_loader):
        loss = topo_model(x)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

What is weird is that I get something like this for the validation dataset:

[Screenshot, 2024-06-27, 9:13 AM: latent embedding on the validation dataset]

I am tracking the loss, and it is decreasing very slowly. Might this be a problem with the architecture of the MLP? Because I tried easier datasets (like a circle in 2D), and in those cases it sometimes works with hidden_dim = 32, but I needed a higher hidden dimension in the MLP to get some consistency; with 32 as the hidden dimension, I sometimes got

[Screenshot, 2024-06-27, 9:24 AM: latent embedding on the 2D circle dataset]

I tried also with the LinearAE, but the problem still remains.

nzilberstein commented 3 days ago

I think something missing in the code above is the normalization of the topological loss by the batch size. I added that, and it works better, but I am still not able to reproduce the same results.
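Concretely, what I added looks roughly like this (a sketch only; `topo_loss` and `normalise_topo_loss` are illustrative names, not part of the library):

```python
# Sketch of the fix: divide the (summed) topological loss term by the
# batch size, so its scale does not depend on the number of samples per
# batch. Without this, a larger batch inflates the topological term
# relative to the reconstruction term.
def normalise_topo_loss(topo_loss, batch_size):
    """Return the per-sample topological loss."""
    return topo_loss / batch_size
```

The normalised term would then enter the total loss as something like `loss = recon_loss + lam * normalise_topo_loss(topo_loss, batch_size)`.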

Pseudomanifold commented 2 hours ago

Interesting! Can you check what happens with a slightly different optimiser (as in the example code)? There are also some minor differences in the way we normalise features (see here for the original data set). These influence the choice of learning rate quite a lot.
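For illustration, per-feature min-max scaling is one common form of such a normalisation; this is a minimal sketch, and whether it matches the original data set's preprocessing exactly is an assumption:

```python
import numpy as np

def minmax_scale(x):
    """Scale each feature (column) of x into the range [0, 1].

    The small epsilon guards against division by zero for
    constant features.
    """
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=0)
    x_max = x.max(axis=0)
    return (x - x_min) / (x_max - x_min + 1e-12)
```

Because the topological loss operates on pairwise distances, rescaling the input features like this changes the loss surface, which is why the learning rate that worked in one setup may not transfer to another.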