
Aim UI does not scale well when logging many metrics (multitask) #3215

Open jorenretel opened 2 months ago

jorenretel commented 2 months ago

šŸ› Bug

Hi, first of all, thanks for this great tool. It is a pleasure to use.

I have a specific project, though, where I am training a network on very many tasks (thousands). What works really well: logging a distribution of metrics, say a distribution of correlation coefficients over the different tasks, as in the sketch below.
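
For example, something along these lines (with dummy values; assuming aim.Distribution accepts an array of samples):

import aim
import numpy as np

run = aim.Run()
# dummy per-task correlation coefficients for one epoch
task_correlations = np.random.uniform(-1.0, 1.0, size=5000)
# a single tracked Distribution summarizes all 5000 tasks in one value
run.track(aim.Distribution(task_correlations), name='task_corr_dist', epoch=0)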

Now, what does NOT work well: I also want to log individual metrics for these tasks so that I can easily find out which tasks train well and which have more problems; I don't get that information from the distribution plot. I don't necessarily want to look at every individual metric curve, but storing the numbers in Aim is really useful because they are also easy to access programmatically, as sketched below.
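
For example, something like this should work to read them back (assuming Aim's Repo.query_metrics query syntax and the column names returned by a metric's dataframe()):

from aim import Repo

repo = Repo('.')  # directory containing the .aim repo
# select all per-task sequences by name (matches the repro script below)
for metric in repo.query_metrics('metric.name.startswith("metric_task_")'):
    df = metric.dataframe()  # one row per tracked step
    print(metric.name, df['value'].iloc[-1])  # last logged value for this task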

Logging many metrics to Aim does not seem to be a problem in itself. But when I open the UI, it completely stops working (especially if I "accidentally" click the Metrics tab). There are of course some metrics (like the aggregated metrics and the loss curve) that I am interested in seeing, so clicking that tab is not entirely "accidental".

I did see the existing issues about performance in the case of very many runs:

But I think this problem is orthogonal to those in some sense, hence this separate issue.

To reproduce

Create a new Aim repo:

aim init

Run this Python script, which simulates logging 5 epochs of 5000 per-task metrics:

import aim
import math

n_tasks = 5000  # one metric sequence per task

run = aim.Run()
for epoch in range(5):
    for i in range(n_tasks):
        # dummy value; in the real project this is a per-task metric
        run.track(math.sin(i), name=f'metric_task_{i}', epoch=epoch)

Spin up the UI and click around a bit:

aim up

The Runs page loads very slowly and has problems displaying the table (5000 metric sequences of 5 points each). The Metrics tab essentially freezes completely.

Expected behavior

either:

  1. the UI would somehow be able to deal with this number of metrics (by lazy loading or something similar, which the Aim UI actually already seems to do in large parts, but apparently not enough here).

or:

  2. let me declare while logging that some metrics are aggregates (just normal metrics) and that others are part of a collection in which each individual scalar belongs to one task, basically a vector of metrics. This could help the UI decide how to treat them. Note that, at least in my use case, the tasks have names, not just positions in a vector, so it is really more like a dict (see the hypothetical sketch below).
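
Purely to illustrate option 2, a hypothetical sketch of what that could look like. Nothing here exists in Aim today; aim.MetricDict is invented for this example:

import aim
import math

run = aim.Run()
for epoch in range(5):
    # HYPOTHETICAL: one tracked value holding a dict of per-task scalars,
    # which the UI could render lazily as a single collapsible entity
    per_task = {f'task_{i}': math.sin(i) for i in range(5000)}
    run.track(aim.MetricDict(per_task), name='task_metrics', epoch=epoch)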

I realize that option 2 is a feature request rather than a bug report; in that case, my apologies.

Environment