ReaLLMASIC / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.
MIT License

Add nan handling strategy for run_vizier.py #193

Open klei22 opened 2 months ago

klei22 commented 2 months ago

Maybe we could first find the largest possible (finite) validation loss and save that value to the ckpt.pt file and the best-val-loss files?
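One way to read this suggestion is to take "largest possible validation loss" as the worst finite loss observed so far, and use it instead of the 1e9 sentinel when a run produces NaN or inf. The sketch below assumes that interpretation. The helper names `worst_finite_loss` and `sanitize_val_loss` are hypothetical and are not part of run_vizier.py.

```python
import math
from typing import Iterable, List, Optional


def worst_finite_loss(history: Iterable[float]) -> Optional[float]:
    """Return the largest finite validation loss seen so far, or None if there is none.

    Hypothetical helper: NaN and +/-inf entries are ignored.
    """
    finite = [x for x in history if math.isfinite(x)]
    return max(finite) if finite else None


def sanitize_val_loss(loss: float, history: List[float], default: float = 1e9) -> float:
    """Replace a NaN/inf loss with the worst finite loss observed so far.

    Falls back to `default` (the current 1e9 sentinel) only when no finite
    loss has been recorded yet, e.g. when the very first trial diverges.
    """
    if math.isfinite(loss):
        return loss
    worst = worst_finite_loss(history)
    return worst if worst is not None else default
```

For example, with a history of `[3.2, 2.9, nan, 3.5]`, a diverged run would be recorded as 3.5 rather than 1e9. The optimizer then sees it as bad but not as an extreme outlier. The value written to ckpt.pt and the best-val-loss files would be whatever `sanitize_val_loss` returns.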

The default value of 10^9 will be over-weighted by the optimization algorithm. Maybe we could have Vizier skip results that are NaN and only report valid values?
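The skip-NaN alternative can be sketched as a filtering step that runs before results are reported to the optimizer. Trials with a finite loss are reported normally. Trials whose loss is NaN or inf are set aside instead of receiving a sentinel value. This is a minimal, library-free sketch: `TrialReport` and `partition_results` are hypothetical names. How skipped trials are actually communicated to Vizier (for example, by marking them infeasible, if the client in use supports that) is left as an assumption to check against the Vizier client documentation.

```python
import math
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class TrialReport:
    """Hypothetical container for what run_vizier.py would send back."""
    # (trial_id, loss) pairs whose loss is finite and can be reported as-is
    completed: List[Tuple[int, float]] = field(default_factory=list)
    # trial ids whose loss was NaN/inf; these are not given a numeric objective
    skipped: List[int] = field(default_factory=list)


def partition_results(results: Iterable[Tuple[int, float]]) -> TrialReport:
    """Split (trial_id, val_loss) results into reportable and skipped trials."""
    report = TrialReport()
    for trial_id, loss in results:
        if math.isfinite(loss):
            report.completed.append((trial_id, loss))
        else:
            report.skipped.append(trial_id)
    return report
```

The optimizer's model is then fitted only on real losses, so no sentinel can distort it. The trade-off is that the optimizer gets no direct signal that a region of the search space diverges, unless skipped trials are reported through some separate mechanism.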