LarsKue / lightning-trainable

A default trainable module for pytorch lightning.
MIT License

Make toy models deterministic #27

Closed: thelostscout closed this issue 4 months ago

thelostscout commented 4 months ago

Maybe it would be a good idea to give users the option to make toy models deterministic. Unfortunately, I don't quite understand how the distribution dataset works, but I suspect one would have to set a maximum size, sample before training, and create a fixed dataset. a) Do you think this is necessary? b) Do you think this is achievable?

LarsKue commented 4 months ago

The point of the distribution datasets is that they are infinite, i.e., never repeating. However, what you describe can be achieved with something along these lines:

from lightning import seed_everything
from torch.utils.data import TensorDataset

from lightning_trainable.datasets import WhateverDistributionDataset

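# seed all RNGs so the samples drawn below are reproducible
seed_everything(0)
infinite_dataset = WhateverDistributionDataset(...)
# number of samples to freeze (arbitrary example value)
sample_size = 10_000
# you can also sample in batches and save to disk first
samples = infinite_dataset.distribution.sample((sample_size,))
finite_dataset = TensorDataset(samples)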
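
For anyone who wants to try the idea without a lightning-trainable dataset at hand, here is a minimal self-contained sketch of the same approach. It assumes a 2D standard normal from `torch.distributions` as a stand-in for `infinite_dataset.distribution`; the helper name `make_fixed_dataset` and the sizes are hypothetical and not part of the library.

```python
import torch
from torch.distributions import Normal
from torch.utils.data import DataLoader, TensorDataset


def make_fixed_dataset(seed: int, sample_size: int) -> TensorDataset:
    """Draw `sample_size` points once and freeze them in a TensorDataset."""
    # Seed torch's global RNG so the draw below is reproducible.
    torch.manual_seed(seed)
    # Assumption: a 2D standard normal stands in for the dataset's distribution.
    distribution = Normal(torch.zeros(2), torch.ones(2))
    samples = distribution.sample((sample_size,))  # shape: (sample_size, 2)
    return TensorDataset(samples)


# Two calls with the same seed should yield identical datasets.
first = make_fixed_dataset(seed=0, sample_size=1000)
second = make_fixed_dataset(seed=0, sample_size=1000)

# shuffle=False keeps the batch order fixed between runs as well.
loader = DataLoader(first, batch_size=100, shuffle=False)
```

Note that this gives up the never-repeating property: the model sees the same finite sample every epoch, so the sample size should be large enough for the toy problem.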