aimhubio / aim

Aim 💫 — An easy-to-use & supercharged open-source experiment tracker.

https://aimstack.io

Apache License 2.0


Pytorch Lightning run is marked as finished after .fit loop #3132

Closed. Michael-Tanzer closed this issue 2 weeks ago

Michael-Tanzer commented 2 months ago

🐛 Bug

When using the PyTorch Lightning Aim logger, the run is marked as finished after the fit loop, ignoring the test loop and any metrics logged there.

Expected behavior

The logger should mark the run as finished only on exit, after the test loop and any additional logging.

Environment

Aim Version: 3.19.2
Python version: 3.10.8
Lightning version: 2.0.1
pip version: 22.3.1
OS: Linux

mihran113 commented 2 months ago

Hey @Michael-Tanzer! Thanks for the report. The run is closed because PyTorch Lightning calls the .finalize() method on the logger. However, when the test loop starts and the trainer logs additional metrics, the aim.Run is reopened. I've tested this on our example (https://github.com/aimhubio/aim/blob/main/examples/pytorch_lightning_track.py) and the test loss is tracked successfully. Could you please double-check whether the test metrics are tracked?

Michael-Tanzer commented 2 months ago

Hi, I'm glad it's working on this example, but there is also another ticket with pretty much the same issue. Could it be related to the fact that I am using a remote server? My current fix is to disable finalize and later finalize the run manually.
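The workaround described above (ignore the finalize call that the trainer makes after fit, then finalize manually once testing is done) can be sketched roughly as follows. This is only an illustration of the pattern: `StubRun`, `StubLogger`, `DeferredFinalizeLogger`, `finalize_manually`, and `fit_then_test` are hypothetical stand-ins written for this sketch, not Aim or Lightning APIs, and the stub does not model what remote tracking does when metrics arrive after the run has been closed.

```python
class StubRun:
    """Hypothetical stand-in for a tracked run (not the aim.Run API)."""

    def __init__(self):
        self.metrics = []
        self.closed = False

    def track(self, name, value):
        self.metrics.append((name, value))

    def close(self):
        self.closed = True


class StubLogger:
    """Hypothetical logger whose finalize() closes the run."""

    def __init__(self):
        self.run = StubRun()

    def log_metrics(self, metrics):
        for name, value in metrics.items():
            self.run.track(name, value)

    def finalize(self, status=""):
        self.run.close()


class DeferredFinalizeLogger(StubLogger):
    """Ignores the automatic finalize; the user closes the run explicitly."""

    def finalize(self, status=""):
        # Intentionally a no-op: the call made after fit is ignored.
        pass

    def finalize_manually(self, status="success"):
        # Run the original finalize logic once all logging is done.
        super().finalize(status)


def fit_then_test(logger):
    """Mimic the call order from the report: fit, finalize, then test."""
    logger.log_metrics({"train_loss": 0.5})
    logger.finalize("success")  # what the trainer triggers after fit
    finished_after_fit = logger.run.closed
    logger.log_metrics({"test_loss": 0.4})
    return finished_after_fit


if __name__ == "__main__":
    print(fit_then_test(StubLogger()))   # run already finished before the test loop
    deferred = DeferredFinalizeLogger()
    print(fit_then_test(deferred))       # run still open during the test loop
    deferred.finalize_manually()
    print(deferred.run.closed)
```

With a real logger the same idea would presumably mean subclassing the logger class and overriding its finalize method, but the exact attributes to close are an assumption that should be checked against the installed Aim version.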

Michael-Tanzer commented 2 months ago

#3097

mihran113 commented 2 months ago

Oh, yeah, remote tracking is actually causing this. I've just opened a PR that should address it: https://github.com/aimhubio/aim/pull/3134. We'll release a patch version today or tomorrow that will include the fix for this issue.

Michael-Tanzer commented 2 months ago

Thank you! This is awesome news! I will close this issue then.