awslabs / gluonts

Probabilistic time series modeling in Python
https://ts.gluon.ai
Apache License 2.0

Document how to make reproducible runs with GluonTS #2853

Status: Open. StatMixedML opened this issue 1 year ago.

StatMixedML commented 1 year ago

Description

Hello, I am working with PyTorch models in GluonTS and trying to make results reproducible. In particular, I would like to set a global seed so that the batchify and evaluation steps (i.e., drawing samples from the predicted distribution) produce the same results each time I run the code. I have tried `torch.manual_seed(123)`, as suggested in some documentation, but it does not seem to have any effect.

Could you please help me understand how to set a global seed for the PyTorch models in GluonTS?

Thank you for your time and help.

lostella commented 1 year ago

@StatMixedML have you also set the NumPy seed? NumPy is used to sample instances from the training data to construct batches, so it plays a role as well.
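To illustrate the point about NumPy, here is a minimal sketch. It assumes a hypothetical `sample_batch` function that picks training windows with NumPy's global RNG. This is not GluonTS's actual instance sampler, only the same mechanism. Because the draws come from NumPy, only `np.random.seed` controls them, and calling `torch.manual_seed` alone leaves them unchanged.

```python
import numpy as np


def sample_batch(series: np.ndarray, batch_size: int, window: int) -> np.ndarray:
    """Hypothetical instance sampler: draws random window start points
    with NumPy's *global* RNG, the way a batching step might."""
    starts = np.random.randint(0, len(series) - window + 1, size=batch_size)
    return np.stack([series[s : s + window] for s in starts])


series = np.arange(100, dtype=float)

np.random.seed(123)  # torch.manual_seed would not affect these draws
batch_a = sample_batch(series, batch_size=4, window=10)

np.random.seed(123)  # same NumPy seed -> same windows
batch_b = sample_batch(series, batch_size=4, window=10)

print(np.array_equal(batch_a, batch_b))  # True
```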

lostella commented 1 year ago

@StatMixedML you can also do that via PyTorch Lightning:

```python
import pytorch_lightning as pl
pl.seed_everything(0)  # seeds Python's random, NumPy and PyTorch
```

Docstring:

```
Signature:
pl.seed_everything(
    seed: Union[int, NoneType] = None,
    workers: bool = False,
) -> int
Docstring:
Function that sets seed for pseudo-random number generators in: pytorch, numpy, python.random In addition,
sets the following environment variables:

- `PL_GLOBAL_SEED`: will be passed to spawned subprocesses (e.g. ddp_spawn backend).
- `PL_SEED_WORKERS`: (optional) is set to 1 if ``workers=True``.

Args:
    seed: the integer value seed for global random state in Lightning.
        If `None`, will read seed from `PL_GLOBAL_SEED` env variable
        or select it randomly.
    workers: if set to ``True``, will properly configure all dataloaders passed to the
        Trainer with a ``worker_init_fn``. If the user already provides such a function
        for their dataloaders, setting this argument will have no influence. See also:
        :func:`~lightning_fabric.utilities.seed.pl_worker_init_function`.
File:      ~/.pyenv/versions/3.8.13/lib/python3.8/site-packages/lightning_fabric/utilities/seed.py
Type:      function
```

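For readers without Lightning installed, the seeding part of what the docstring describes can be approximated by hand. The following is a minimal sketch that assumes a hypothetical `seed_all` helper. It seeds Python's `random`, NumPy's global RNG, and PyTorch (only if PyTorch is importable). It is not Lightning's implementation and omits the environment variables and the `workers=True` dataloader handling that the real function provides.

```python
import random

import numpy as np

try:
    import torch
except ImportError:  # keep the sketch runnable without PyTorch installed
    torch = None


def seed_all(seed: int) -> int:
    """Hypothetical stand-in for pl.seed_everything: seeds the three
    generators its docstring lists (not Lightning's implementation)."""
    random.seed(seed)         # Python's built-in `random` module
    np.random.seed(seed)      # NumPy's global RNG
    if torch is not None:
        torch.manual_seed(seed)  # PyTorch's default generator(s)
    return seed


seed_all(0)
first = (random.random(), float(np.random.rand()))
seed_all(0)
second = (random.random(), float(np.random.rand()))
print(first == second)  # True
```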
StatMixedML commented 1 year ago

@lostella Thanks for your reply and for the hint about `pl.seed_everything`. Using the following reduces the run-to-run variability of the results, but does not make them completely reproducible:

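```python
import numpy as np
import pytorch_lightning as pl
import torch
torch.manual_seed(123)  # also covered by pl.seed_everything below
np.random.seed(123)     # also covered by pl.seed_everything below
pl.seed_everything(seed=123, workers=True)
```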

Still, it is good enough for now, so I'm closing the issue.

Thanks again!
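Two common sources of leftover variation are worth ruling out. The first is GPU kernels that are nondeterministic unless PyTorch is told otherwise. The second is RNG state that training has already advanced by the time forecast samples are drawn. Reseeding immediately before sampling makes the draw independent of that history. Below is a minimal sketch under assumptions: `enable_determinism` and `draw_samples` are hypothetical helper names, and `np.random.normal` stands in for sampling from a model's predicted distribution.

`torch.use_deterministic_algorithms(True)` makes PyTorch raise an error for ops that lack a deterministic implementation, so it may need relaxing for some models. On the Lightning side, `pl.Trainer(deterministic=True)` requests similar behavior. Passing `trainer_kwargs={"deterministic": True}` to a GluonTS PyTorch estimator is one assumed way to forward it, assuming your version accepts `trainer_kwargs`.

```python
import os
import random

import numpy as np

try:
    import torch
except ImportError:  # keep the sketch runnable without PyTorch installed
    torch = None


def enable_determinism(seed: int) -> None:
    """Hypothetical helper: seed all RNGs and request deterministic kernels."""
    random.seed(seed)
    np.random.seed(seed)
    if torch is not None:
        # Needed by some cuBLAS ops in deterministic mode; only takes effect
        # if set before CUDA is first used in the process.
        os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
        torch.manual_seed(seed)
        torch.use_deterministic_algorithms(True)
        torch.backends.cudnn.benchmark = False  # avoid autotuned kernel choice


def draw_samples(num_samples: int, seed: int) -> np.ndarray:
    """Reseed right before sampling so the draw does not depend on how
    many random numbers earlier steps (e.g. training) consumed."""
    enable_determinism(seed)
    # Hypothetical stand-in for sampling from a predicted distribution.
    return np.random.normal(loc=0.0, scale=1.0, size=num_samples)


a = draw_samples(5, seed=123)
_ = np.random.rand(1000)  # simulate RNG consumption by a training loop
b = draw_samples(5, seed=123)
print(np.array_equal(a, b))  # True
```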

lostella commented 1 year ago

I'll keep it open, since it doesn't hurt to track this as something we should document.