aimhubio / aim

Aim 💫 — An easy-to-use & supercharged open-source experiment tracker.
https://aimstack.io
Apache License 2.0
5.21k stars 320 forks

Too long to upload simple metrics #2875

Open NotAnyMike opened 1 year ago

NotAnyMike commented 1 year ago

🐛 Bug

Uploading only single-value metrics from a simple, naive example takes 2 minutes 32 seconds. The example itself runs almost instantly, but the upload that follows print("loading to aim...") takes more than 2 minutes.

from aim import Run

aim_run = Run(
    repo='aim://x.x.x.x:53800',
    experiment="Example_model",
    capture_terminal_logs=True)  

aim_run.name = "testing new loss"
aim_run.description = "Example description of new run"
aim_run.add_tag("tag1")

# Log run parameters
aim_run['params'] = {
    'learning_rate': 0.001,
    'batch_size': 32,
}

for epoch in range(10):
    for i in range(100):
        # Log the training loss at every iteration
        aim_run.track(i, name='loss', epoch=epoch,
                      context={'subset': 'train'})
        print('Epoch: {}, Iteration: {}'.format(epoch, i))

        # Log validation metrics once per epoch (at iteration 50)
        if i % 100 == 50:
            aim_run.track(i, name='loss', epoch=epoch, context={'subset': 'val'})
            aim_run.track(i, name='accuracy', epoch=epoch, context={'subset': 'val'})
print("loading to aim...")

I have a local server started with aim server --repo /data --host 0.0.0.0 --workers 6 and a UI started with aim up --repo /data --host 0.0.0.0 --workers 6.

To reproduce

  1. Launch your server and UI
  2. Run the example above

Expected behavior

I would expect everything to upload in under 10 seconds.

Environment

mihran113 commented 1 year ago

Hey @NotAnyMike! Thanks for reporting! Could you try it with a local repo, to see whether it completes any faster? On a local network with no latency this shouldn't take that long. For example, on my Intel Mac it runs in around 10s. Generally, tracking on a server is around 10x slower than tracking locally. In the upcoming major version, we've made a lot of changes to address this.
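For anyone wanting to make the local-versus-server comparison suggested above, here is a minimal timing sketch. It is not an official benchmark: the helper names time_tracking and compare_repos are hypothetical, the local repo path '.' is an assumption, and the sketch assumes that Run.close() waits for queued data to be written, so the time spent in close() approximates the "loading to aim..." delay from the report.

```python
import time


def time_tracking(make_run, epochs=10, steps=100):
    """Return (track_seconds, close_seconds) for a run built by make_run().

    make_run is any zero-argument callable returning an object with
    track(...) and close() methods, e.g. a lambda that builds an aim Run.
    """
    run = make_run()
    start = time.perf_counter()
    for epoch in range(epochs):
        for i in range(steps):
            run.track(i, name='loss', epoch=epoch, context={'subset': 'train'})
    tracked = time.perf_counter()
    # Assumption: close() returns only after pending data has been written.
    run.close()
    done = time.perf_counter()
    return tracked - start, done - tracked


def compare_repos(repos):
    """Print timings for each repo address (hypothetical helper)."""
    from aim import Run  # imported lazily so time_tracking works without aim

    for repo in repos:
        track_s, close_s = time_tracking(
            lambda: Run(repo=repo, experiment="Example_model"))
        print(f"{repo}: track {track_s:.1f}s, close {close_s:.1f}s")
```

Calling compare_repos(['.', 'aim://x.x.x.x:53800']) would then print one line per repo, which makes it easier to see whether the time goes into the track() calls or into the final flush.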

NotAnyMike commented 1 year ago

@mihran113 locally it runs in 10 seconds too. So basically this is expected behaviour, then. Are there big improvements regarding this in version 4.0? Could you tell me how long this script takes in version 4.0?

diogo-sr commented 3 months ago

Are there any updates on this topic? I have the exact same issue.