fzaiser / nonparametric-hmc

Implementation of Nonparametric Hamiltonian Monte Carlo
MIT License

Cannot reproduce results #4

Open idontgetoutmuch opened 1 year ago

idontgetoutmuch commented 1 year ago

Just running nonparametric-hmc alone on the Gaussian Mixture Model gives:

[histogram image]

In the paper, the correct value is identified much more distinctly:

[histogram image from the paper]

Maybe it's a different version of Torch?

idontgetoutmuch commented 1 year ago

FWIW

>>> print(torch.__version__)
1.13.0
idontgetoutmuch commented 1 year ago

I tried a different set of seeds and got

[histogram image]

I will try running the chains for longer.

fzaiser commented 1 year ago

Thanks for the report. I hope I'll have time to look into this in a week or two. In the meantime, you could check out commit 9b7f65c8e311df81fdb2f1a45a0051d42c037c0b and use torch 1.6.0, since that is the version used to generate the plots in the paper. The versions of the other dependencies are also listed in the README.

idontgetoutmuch commented 1 year ago

@fzaiser no rush - I will keep experimenting and try to see if I can use the exact versions you have given.

idontgetoutmuch commented 1 year ago
dom@Wandle nonparametric-hmc % pip install torch==1.6.0
pip install torch==1.6.0
DEPRECATION: Configuring installation scheme with distutils config files is deprecated and will no longer work in the near future. If you are using a Homebrew or Linuxbrew Python, please see discussion at https://github.com/Homebrew/homebrew-core/issues/76621
ERROR: Could not find a version that satisfies the requirement torch==1.6.0 (from versions: 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1)
ERROR: No matching distribution found for torch==1.6.0
idontgetoutmuch commented 1 year ago

I am not sure how you are using 1.6.0

fzaiser commented 1 year ago

I was using 1.6.0 back in 2020, when it was the most recent version. I haven't tried that version recently. Your findings in #3 are quite surprising, especially this:

Log posterior predictive densities:
True LPPD:  -674.81 +- nan (standard deviation)
hmc:  -731.08 +- 0.00 (standard deviation)
is:  -725.98 +- 0.00 (standard deviation)

Here NP-HMC seems to be doing worse than IS, which is unexpected. This probably means that something is going wrong with the NP-HMC code in newer pytorch versions. I'll continue to investigate.
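(For readers unfamiliar with the metric: the log posterior predictive density is usually computed as the sum, over held-out observations, of the log of the average predictive density across posterior samples. Below is a generic sketch of that computation, not necessarily the exact code used in this repository.)

import math
import torch

def lppd(log_liks: torch.Tensor) -> torch.Tensor:
    # log_liks[s, i] = log-likelihood of held-out observation i under posterior sample s.
    # For each observation, average the predictive density over samples (done in log space
    # via logsumexp for numerical stability), then sum over observations.
    num_samples = log_liks.shape[0]
    log_mean_per_obs = torch.logsumexp(log_liks, dim=0) - math.log(num_samples)
    return log_mean_per_obs.sum()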

fzaiser commented 1 year ago

I ran the GMM experiment again (with the most up-to-date version of the dependencies) and I get the following result:

Log posterior predictive densities:
True LPPD:  -674.81 +- nan (standard deviation)
hmc:  -681.16 +- 17.59 (standard deviation)
is:  -728.79 +- 9.98 (standard deviation)

So NP-HMC is doing much better than in your runs, but somehow worse than what is reported in the paper and what I got when I ran the experiments two years ago. I'm not sure why this is happening, but since we haven't changed our code since then, one of the dependencies must have changed in a way our code doesn't expect.

@idontgetoutmuch Another approach to reproduce our results would be to recreate the setup I used two years ago more precisely, e.g. running the experiments in a Docker container on Ubuntu 20.04, installing exactly the versions of the dependencies that I used etc. ~If Docker is too heavy for you, Python's virtualenv might already be enough in this case, but this takes care only of the Python package versions, not the rest of the system.~ I tried virtualenv and installing torch-1.6.0 but it led to errors with C dependencies. I think the whole system needs to be older for this to work.

fzaiser commented 1 year ago

I also ran the other experiments from the paper again with the new dependency versions, and they show very similar results to what we reported in the paper. It is very strange that this particular experiment (the GMM) yields such weird results now.

@idontgetoutmuch Did you run into similar problems with the other three experiments (example_geometric.py, example_walk.py and example_dirichlet.py)? These work fine for me and I just want to confirm that there are no other issues.

fzaiser commented 1 year ago

I ran the GMM experiment again (with the most up-to-date version of the dependencies) and I get the following result:

Log posterior predictive densities:
True LPPD:  -674.81 +- nan (standard deviation)
hmc:  -681.16 +- 17.59 (standard deviation)
is:  -728.79 +- 9.98 (standard deviation)

So NP-HMC is doing much better than in your runs, but somehow worse than what is reported in the paper and what I got when I ran the experiments two years ago.

I've looked into this further and it seems that the bad LPPD stems from one unlucky run. The LPPDs for the ten runs are: [-675.7832641602, -731.2267456055, -675.5260009766, -675.4831542969, -675.5700683594, -675.8757324219, -675.6724853516, -675.4584350586, -675.6270751953, -675.3497314453].
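For reference (this is not repository code, just a quick sanity check on the numbers quoted above), averaging these ten values reproduces the -681.16 +- 17.59 figure, and dropping the second run brings the mean back to around -675.6:

import statistics

# The ten per-run LPPDs quoted above.
lppds = [-675.7832641602, -731.2267456055, -675.5260009766, -675.4831542969,
         -675.5700683594, -675.8757324219, -675.6724853516, -675.4584350586,
         -675.6270751953, -675.3497314453]

print(f"all runs:        {statistics.mean(lppds):.2f} +- {statistics.stdev(lppds):.2f}")

# Exclude the second (unlucky) run.
rest = lppds[:1] + lppds[2:]
print(f"without run two: {statistics.mean(rest):.2f} +- {statistics.stdev(rest):.2f}")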

My current hypothesis is that this has to do with the random number generation in the new versions of torch: the second run now gets a very "unlucky" random stream in which HMC gets stuck in a bad local optimum. The changed random streams would also explain why your histogram looks different from what we got two years ago, and why the other experiments are unaffected.

fzaiser commented 1 year ago

Indeed, if you add 2 to the random seeds in example_gmm.py to avoid the unlucky run, i.e. change the lines seed=rep to seed=rep + 2, the LPPD becomes hmc: -675.82 +- 0.81 (standard deviation). That is slightly worse than what I got two years ago, but at least on par with LMH and better than the other methods. After the change, the histogram looks like this (similar to your histogram at the top of this thread):

[histogram image]

This histogram places less mass on the correct number of mixture components (9) and more on the wrong numbers (14, 15, and 16), which seems similar to what you observed. I believe the random initialization simply has a bigger effect than we assumed, which causes this behavior.
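For reference, the change amounts to something like this (a hypothetical sketch: run_gmm_experiment and NUM_REPETITIONS are stand-ins, not the actual names in example_gmm.py):

def run_gmm_experiment(seed: int) -> None:
    # Placeholder for the actual GMM experiment code.
    print(f"running GMM experiment with seed={seed}")

NUM_REPETITIONS = 10  # assumed number of repetitions

for rep in range(NUM_REPETITIONS):
    # before the change: run_gmm_experiment(seed=rep)
    run_gmm_experiment(seed=rep + 2)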

This is an interesting observation, thank you for bringing it to my attention! Unfortunately, we didn't notice this dependence on the random initialization at the time – even though we ran the experiments with ten different random seeds.

I still think the main results of the paper are reproducible even with newer versions of the dependencies.

idontgetoutmuch commented 1 year ago

@idontgetoutmuch Another approach to reproduce our results would be to recreate the setup I used two years ago more precisely, e.g. running the experiments in a Docker container on Ubuntu 20.04, installing exactly the versions of the dependencies that I used etc. ~If Docker is too heavy for you, Python's virtualenv might already be enough in this case, but this takes care only of the Python package versions, not the rest of the system.~ I tried virtualenv and installing torch-1.6.0 but it led to errors with C dependencies. I think the whole system needs to be older for this to work.

I think I can try this using nixpkgs rather than docker. It's on my todo list.

idontgetoutmuch commented 1 year ago

I also ran the other experiments from the paper again with the new dependency versions, and they show very similar results to what we reported in the paper. It is very strange that this particular experiment (the GMM) yields such weird results now.

@idontgetoutmuch Did you run into similar problems with the other three experiments (example_geometric.py, example_walk.py and example_dirichlet.py)? These work fine for me and I just want to confirm that there are no other issues.

I haven't tried them but I will put them on my todo list.

idontgetoutmuch commented 1 year ago

I still think the main results of the paper are reproducible even with newer versions of the dependencies.

I agree the results ought to be reproducible and will try to do so. Something else for my todo list ;-)

idontgetoutmuch commented 1 year ago

Well I tried a shell.nix with


let
  p38Packages = nixpkgs.python38Packages;

  numpi = p38Packages.numpy.overridePythonAttrs (old: {
    src = nixpkgs.fetchurl {
      url =
        "https://files.pythonhosted.org/packages/2c/2f/7b4d0b639a42636362827e611cfeba67975ec875ae036dd846d459d52652/numpy-1.19.1.zip";
      sha256 = "uEVph7Y3IyYCzrTWY8s0EG9+t4DiR9UaJguEdg/Y9JE=";
    };
  });

  torci = p38Packages.pytorch.overridePythonAttrs (old: {
    src = nixpkgs.fetchurl {
      url ="https://files.pythonhosted.org/packages/8e/18/93b190226d09958be96919fd50c55d28f83f1a1b9260a2b33499f9d86728/torch_geometric-1.6.0.tar.gz";
      sha256 = "fbf43fe15421c9affc4fb361ba4db55cb9d3c64d0c29576bb58d332bf6d27fef";
    };
  });

in nixpkgs.mkShell {

  buildInputs = with nixpkgs; [ (python38.withPackages (ps: [ numpi torci ])) ];
}

and then

nix shell --system x86_64-darwin nixpkgs#nix -c nix-shell shell.nix

which sadly gave

nix shell --system x86_64-darwin nixpkgs#nix -c nix-shell shell2.nix
these 2 derivations will be built:
  /nix/store/8h2pcn3lj7xk5rzxwvgmd4hjzr1azgab-python3.8-pytorch-1.6.0.drv
  /nix/store/sn7bi5fabwhsdy3zi8khd1gnil0q6050-python3-3.8.8-env.drv
building '/nix/store/8h2pcn3lj7xk5rzxwvgmd4hjzr1azgab-python3.8-pytorch-1.6.0.drv'...
Sourcing python-recompile-bytecode-hook.sh
Sourcing python-remove-tests-dir-hook
Sourcing python-catch-conflicts-hook.sh
Sourcing python-remove-bin-bytecode-hook.sh
Sourcing setuptools-build-hook
Using setuptoolsBuildPhase
Using setuptoolsShellHook
Sourcing pip-install-hook
Using pipInstallPhase
Sourcing python-imports-check-hook.sh
Using pythonImportsCheckPhase
Sourcing python-namespaces-hook
unpacking sources
unpacking source archive /nix/store/kaif0y9sn04z7izip1i2bf5cihfj9yz2-torch_geometric-1.6.0.tar.gz
source root is torch_geometric-1.6.0
setting SOURCE_DATE_EPOCH to timestamp 1594103286 of file torch_geometric-1.6.0/setup.cfg
patching sources
applying patch /nix/store/v9sbzzdfnjxx67xn7lk0ij1qw6g1phfc-pthreadpool-disable-gcd.diff
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/third_party/pthreadpool/CMakeLists.txt b/third_party/pthreadpool/CMakeLists.txt
|index 0db3264..1ba91c4 100644
|--- a/third_party/pthreadpool/CMakeLists.txt
|+++ b/third_party/pthreadpool/CMakeLists.txt
--------------------------
File to patch: 
Skip this patch? [y] 
Skipping patch.
2 out of 2 hunks ignored
can't find file to patch at input line 31
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/third_party/pthreadpool/src/threadpool-common.h b/third_party/pthreadpool/src/threadpool-common.h
|index ca84744..244d0ca 100644
|--- a/third_party/pthreadpool/src/threadpool-common.h
|+++ b/third_party/pthreadpool/src/threadpool-common.h
--------------------------
File to patch: 
Skip this patch? [y] 
Skipping patch.
1 out of 1 hunk ignored
error: builder for '/nix/store/8h2pcn3lj7xk5rzxwvgmd4hjzr1azgab-python3.8-pytorch-1.6.0.drv' failed with exit code 1;
       last 10 log lines:
       > --------------------------
       > |diff --git a/third_party/pthreadpool/src/threadpool-common.h b/third_party/pthreadpool/src/threadpool-common.h
       > |index ca84744..244d0ca 100644
       > |--- a/third_party/pthreadpool/src/threadpool-common.h
       > |+++ b/third_party/pthreadpool/src/threadpool-common.h
       > --------------------------
       > File to patch:
       > Skip this patch? [y]
       > Skipping patch.
       > 1 out of 1 hunk ignored
       For full logs, run 'nix log /nix/store/8h2pcn3lj7xk5rzxwvgmd4hjzr1azgab-python3.8-pytorch-1.6.0.drv'.
error: 1 dependencies of derivation '/nix/store/sn7bi5fabwhsdy3zi8khd1gnil0q6050-python3-3.8.8-env.drv' failed to build

I believe that the nixpkgs I am using had some patches for pytorch, but the version of pytorch I am trying to build isn't compatible with the nixpkgs patch - sigh.

fzaiser commented 1 year ago

I don't know anything about nix, so I'm afraid I can't help there.

Is there a specific reason you want to reproduce the exact results (including the randomization behavior of the dependencies from two years ago), especially given that I can reproduce the main results with up-to-date versions of the dependencies (up to the small discrepancies described here: https://github.com/fzaiser/nonparametric-hmc/issues/4#issuecomment-1429172440)? Trying to reproduce the exact values from the paper seems like a lot of effort for little benefit to me.

(I generally agree that being able to reproduce exact values is beneficial. For future projects, I will provide a Docker container that allows one to reproduce everything exactly, but unfortunately I don't have the time to do the same for this project after the fact.)

idontgetoutmuch commented 1 year ago

I don't know anything about nix, so I'm afraid I can't help there.

I wasn't expecting you to help with nix. I just wanted to record what had been done.

Is there a specific reason you want to reproduce the exact results (including the randomization behavior of the dependencies from two years ago), especially given that I can reproduce the main results with up-to-date versions of the dependencies (up to the small discrepancies described here: #4 (comment))? Trying to reproduce the exact values from the paper seems like a lot of effort for little benefit to me.

My misunderstanding. From what you had written (quoted below for reference), I thought you were suggesting I should do this. I am very happy not to do this.

@idontgetoutmuch Another approach to reproduce our results would be to recreate the setup I used two years ago more precisely, e.g. running the experiments in a Docker container on Ubuntu 20.04, installing exactly the versions of the dependencies that I used etc. ~If Docker is too heavy for you, Python's virtualenv might already be enough in this case, but this takes care only of the Python package versions, not the rest of the system.~ I tried virtualenv and installing torch-1.6.0 but it led to errors with C dependencies. I think the whole system needs to be older for this to work.

(I generally agree that being able to reproduce exact values is beneficial. For future projects, I will provide a Docker container that allows one to reproduce everything exactly, but unfortunately I don't have the time to do the same for this project after the fact.)

Not a problem and thank you for your time.

I think all that remains for me to do is to try reproducing the results from example_geometric.py, example_walk.py and example_dirichlet.py and then I can close the ticket.

idontgetoutmuch commented 1 year ago

By eye, example_walk.py looks very similar to the original:

[plot image]
idontgetoutmuch commented 1 year ago

I am not going to run the other examples just now, and if I have to (I hope I won't), I will open a new ticket.

fzaiser commented 2 days ago

@mariacraciun solved the mystery! Apparently, older versions of torch did not check whether the argument to Poisson.log_prob is an integer and simply evaluated the log-probability at whatever value was passed. Newer versions of torch do check and throw an exception if the argument is not an integer. This is a problem for NP-DHMC, because values during sampling may not be integers, but we'd still like to evaluate log_prob as though they were. I'll think about how best to fix this.

fzaiser commented 2 days ago

The problem seems to have been introduced here: https://github.com/pytorch/pytorch/pull/5358. A possible fix would be to pass validate_args=False to the distribution initializer.
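For illustration, the difference in behavior and the proposed workaround look roughly like this (a sketch against recent torch versions, not a patch to the NP-DHMC code itself):

import torch
from torch.distributions import Poisson

value = torch.tensor(2.5)  # non-integer value, as can occur during NP-DHMC sampling

# With argument validation (enabled by default in recent torch versions),
# log_prob at a non-integer value raises a ValueError.
try:
    Poisson(rate=torch.tensor(3.0)).log_prob(value)
except ValueError as err:
    print("validation error:", err)

# With validate_args=False, torch just evaluates the Poisson log-density formula
# at the non-integer value, matching the old (pre-check) behavior.
unchecked = Poisson(rate=torch.tensor(3.0), validate_args=False)
print(unchecked.log_prob(value))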