Lightning-AI / pytorch-lightning

`NeptuneCallback` produces lots of `X-coordinates (step) must be strictly increasing` errors #20281

Open iirekm opened 3 weeks ago

iirekm commented 3 weeks ago

Bug description

When Optuna is run in parallel mode (`n_jobs=-1`) with `NeptuneCallback`, I get:

[neptune] [error ] Error occurred during asynchronous operation processing: X-coordinates (step) must be strictly increasing for series attribute: trials/values. Invalid point: 0.0

It is normal for information to arrive out of order during parallel or distributed hyperparameter optimization. Either Neptune should support adding steps out of order, or `NeptuneCallback` should handle it somehow (e.g. by using an artificial step number).
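One way the "artificial step number" idea could look, sketched with the base Neptune client API instead of the integration's `NeptuneCallback` (the `make_thread_safe_callback` helper, the `log_trial` name, and the `trials/values` field are all illustrative, not part of any existing API):

```python
import itertools
import threading

import neptune
import optuna


def make_thread_safe_callback(run: neptune.Run):
    """Build an Optuna callback that logs trial values with its own step counter."""
    counter = itertools.count()
    lock = threading.Lock()

    def log_trial(study: optuna.Study, trial: optuna.trial.FrozenTrial) -> None:
        if trial.value is None:  # skip failed/pruned trials
            return
        with lock:
            # Steps follow logging order, so they stay strictly increasing
            # even when parallel trials finish out of order.
            run["trials/values"].append(trial.value, step=next(counter))

    return log_trial
```

Such a callback could then be passed via `study.optimize(..., callbacks=[make_thread_safe_callback(run)], n_jobs=-1)` until ordering is handled by the integration itself.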

What version are you seeing the problem on?

v1.x

How to reproduce the bug

study.optimize(..., callbacks=[NeptuneCallback(run)], n_jobs=-1)
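For context, a fuller reproduction sketch along these lines, assuming `NeptuneCallback` comes from the `neptune-optuna` integration (the objective function and trial count are placeholders):

```python
# Credentials are expected via the NEPTUNE_API_TOKEN / NEPTUNE_PROJECT env vars.
import neptune
import neptune.integrations.optuna as npt_utils
import optuna


def objective(trial: optuna.Trial) -> float:
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2


run = neptune.init_run()
study = optuna.create_study(direction="minimize")

# n_jobs=-1 runs trials in parallel threads, so the callback can report a trial
# whose step is lower than one already logged, triggering the
# "strictly increasing" error.
study.optimize(
    objective,
    n_trials=50,
    callbacks=[npt_utils.NeptuneCallback(run)],
    n_jobs=-1,
)
```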

Error messages and logs

[neptune] [error ] Error occurred during asynchronous operation processing: X-coordinates (step) must be strictly increasing for series attribute: trials/values. Invalid point: 0.0

Environment

Any multi-threaded environment.

More info

No response

guttikondaV commented 1 week ago

Hi @iirekm. I had the same problem when working with Neptune. I was logging metrics during the train, val, and test phases, and later realized I was using the same names for the metrics in the metrics dictionary. Sometimes I was even using the same Torchmetrics instance in all the phases. Perhaps you're doing the same? Could you check again? I'm not a pro at this, just hoping it's the same gotcha as mine. Sorry if it doesn't help.
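A rough sketch of what keeping the phases separate can look like in a LightningModule (class name, metric choice, and field names are made up; the import assumes the unified `lightning` package):

```python
import torch
import torchmetrics
from lightning.pytorch import LightningModule


class LitClassifier(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)
        # One metric object per phase: sharing a single instance across
        # training and validation mixes internal state and logged steps.
        self.train_acc = torchmetrics.classification.MulticlassAccuracy(num_classes=2)
        self.val_acc = torchmetrics.classification.MulticlassAccuracy(num_classes=2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self.layer(x)
        loss = torch.nn.functional.cross_entropy(logits, y)
        self.train_acc.update(logits, y)
        # Distinct metric names per phase keep the logged series separate.
        self.log("train/acc", self.train_acc, on_step=False, on_epoch=True)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        logits = self.layer(x)
        self.val_acc.update(logits, y)
        self.log("val/acc", self.val_acc, on_step=False, on_epoch=True)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())
```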