dgcnz / dl2

Code for "Effect of equivariance on training dynamics"

Port SmokePlume dataset generator to avoid test leakage #46

Closed: dgcnz closed this issue 1 month ago

dgcnz commented 1 month ago

Description

The current dataset contains 40 timesteps: 30 for training and 10 for validation. The validation set is also reused as the test set. This is problematic because early stopping relies on validation metrics, so the test data leaks into model selection.
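To illustrate the kind of split that avoids this leakage, here is a minimal sketch that partitions timestep indices into three disjoint sets, so early stopping never sees the test timesteps. The function name `split_timesteps` and the 50/30/10 counts in the usage note are hypothetical; they assume the ported generator produces more timesteps than the current 40 and are not taken from the repository.

```python
def split_timesteps(n_timesteps: int, n_train: int, n_val: int):
    """Hypothetical helper: return disjoint (train, val, test) timestep indices.

    Timesteps are split chronologically, so the test set is the final block
    and is never used for early stopping or model selection.
    """
    if n_train + n_val >= n_timesteps:
        # Without leftover timesteps, the only option would be reusing
        # validation as test, which is exactly the leakage in this issue.
        raise ValueError("no timesteps left for a held-out test set")
    indices = list(range(n_timesteps))
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_timesteps(50, 30, 10)
    # Sanity check: validation and test must not overlap.
    assert set(val).isdisjoint(test)
    print(len(train), len(val), len(test))
```

With the current 40 timesteps and a 30/10 split, `split_timesteps(40, 30, 10)` raises, which reflects why regenerating the data (rather than just re-slicing it) is needed.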

Although this task is important for a faithful reproduction, we can use the originally generated data in the meantime, so this is not a P0 task.

Tasks

You can base your script on jhtdb.generate, but writing a Hugging Face loading script is not required and, given the time constraints, probably not advisable.

Resources

dgcnz commented 1 month ago

@Nesta-gitU I think it would be useful to brief @MeneerTS on this dataset generation at some point; at least one of us should understand what the data really represents (ref: https://github.com/dgcnz/dl2/issues/36#issuecomment-2105706045)

Maybe this helps in the meantime (https://www.youtube.com/watch?v=KMfcF9XvVio)

Nesta-gitU commented 1 month ago

I think it's done.