geyang / ml_logger

A logger, server and visualization dashboard for ML projects

Getting Started with ML-Dash #36

Closed · hugoleodev closed this 1 year ago

hugoleodev commented 3 years ago

I have considered using ml-dash with ml-logger, but the ml-dash tutorial is not working for me. Is there a fresh example of how to use ml-logger with ml-dash?

What I need is to run an ml_dash.server instance on a remote server, with all applications logging to that main remote ml_dash.server.

geyang commented 3 years ago

@hugoleodev Let me whip up one real quick today! Will get back to you tomorrow.

hugoleodev commented 3 years ago

@geyang nice!

hugoleodev commented 3 years ago

@geyang Any news about it?

geyang commented 3 years ago

Yes! Should be ready soon! Taking a final look before pushing.

-- which page/URL are you looking at? This way I can better prioritize which one to update first. (github README / readthedocs etc.)

geyang commented 3 years ago

> I have considered using ml-dash with ml-logger, but the ml-dash tutorial is not working for me. Is there a fresh example of how to use ml-logger with ml-dash?
>
> What I need is to run an ml_dash.server instance on a remote server, with all applications logging to that main remote ml_dash.server.

So ml_dash.server is separate from ml_logger.server. The latter is used for logging, whereas the former is a visualization backend that processes the data and serves it to the dashboard front end.

So to set up remote logging, you need to launch both servers on the instrumentation server, in separate processes:

#!/usr/bin/env bash
# launch-logger:
python -m ml_logger.server --data-dir ~/runs --port 8080 --host 0.0.0.0 --workers 4

Then in a separate process,

# launch-dash-server (pointed at the same folder as --data-dir above):
python -m ml_dash.server --logdir ~/runs --port 8090 --host 0.0.0.0
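
If you prefer a single entry point, here is a minimal sketch that launches both servers with the flags above; the only requirement is that --data-dir and --logdir resolve to the same folder:

# sketch: start both servers from one script, sharing one data folder
import subprocess
from pathlib import Path

DATA_DIR = str(Path.home() / "runs")  # shared by ml_logger.server and ml_dash.server

procs = [
    subprocess.Popen(["python", "-m", "ml_logger.server",
                      "--data-dir", DATA_DIR, "--port", "8080",
                      "--host", "0.0.0.0", "--workers", "4"]),
    subprocess.Popen(["python", "-m", "ml_dash.server",
                      "--logdir", DATA_DIR, "--port", "8090",
                      "--host", "0.0.0.0"]),
]
for p in procs:
    p.wait()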

Configuring the Dashboard Client

Inside your client, you should put in the URL for the dashboard server as follows:

[Screenshot: dashboard client settings showing the URL of the ml_dash.server (port 8090 in the commands above)]

Logging using ML-Logger

A key issue to keep in mind is that inside your ~/runs folder, the experiments need to sit under two layers of directories:

$ tree -L 3 ~/runs
~/runs
└── <your-username>
    └── <your-project-name>
        ├── <experiment-1>
        └── <experiment-2>

This means that with ML-Logger, you need to configure the prefix so that all logging lands under {your-username}/{your-project-name}.

For example, to log a metric:

import numpy as np
from ml_logger import logger

experiment_id = "experiment-1"

# the prefix follows the {username}/{project}/{experiment} layout described above
logger.configure(root_dir="http://localhost:8080", prefix=f"hugo/getting-started/{experiment_id}")

# the .charts file tells the dashboard which line plots to render
logger.log_line("""
                charts:
                - yKey: loss
                  xKey: epoch
                - yKey: loss
                  xKey: __timestamp
                """, file=".charts", dedent=True)

for i in range(100):
    loss = np.exp(-i) + np.random.uniform(0, 0.05)
    logger.log(epoch=i, loss=loss, flush=True)

hugoleodev commented 3 years ago

@geyang very nice! Is there any way to store metrics in a database?

I would like to create a centralized way to maintain all experiment data.

geyang commented 3 years ago

Hey Hugo, I currently save all data as individual pkl files, because the metrics reported by each project can be drastically different. So to collect all the data, I just glob for the metrics.pkl files and read the contents into a pandas DataFrame.

ML-Logger offers an easy way to do this:

from ml_logger import logger

all_metrics = logger.read_metrics('your-prefix/project/experiment-*')

This has the additional benefit of making it easy to delete data by removing entire folders.
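
For reference, here is a hand-rolled sketch of the same idea, without the helper. It assumes the directory layout from above and that each metrics.pkl is a stream of pickled dicts, one record per logger.log call; both are assumptions about the on-disk format:

import pickle
from glob import glob
from pathlib import Path

import pandas as pd

def read_records(path):
    """Unpickle every record appended to a metrics.pkl file."""
    records = []
    with open(path, "rb") as f:
        while True:
            try:
                records.append(pickle.load(f))
            except EOFError:
                break
    return records

frames = []
for path in glob(str(Path.home() / "runs/hugo/getting-started/experiment-*/metrics.pkl")):
    df = pd.DataFrame(read_records(path))
    df["experiment"] = Path(path).parent.name  # tag rows by experiment folder
    frames.append(df)

all_metrics = pd.concat(frames, ignore_index=True)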

hugoleodev commented 3 years ago

Hi @geyang,

I have tested your example and, as far as I can see, the current architecture does not fit our needs.

We will deploy ml-logger / ml-dash within a tsuru PaaS, much like the Heroku PaaS.

To work well in tsuru, we need two apps: one running ml_logger.server and one running ml_dash.server.

But as you mentioned, ml_dash.server and ml_logger.server need to watch the same directory in the filesystem.

And as I have listed above, the two apps do not share the same filesystem.

I think that if ml_dash.server could connect to ml_logger.server through HTTP, filesystem access would live only in ml_logger.server.

What do you think about it?

hugoleodev commented 3 years ago

@geyang

[Diagrams: FROM the current architecture, where ml_logger.server and ml_dash.server both read the same filesystem, TO a proposed architecture where ml_dash.server talks to ml_logger.server over HTTP and only ml_logger.server touches the filesystem]

geyang commented 3 years ago

@hugoleodev Thanks for your comment!

When logging experiments, the amount of storage can grow very quickly if you are saving checkpoints. For this reason, when I deploy on AWS, I attach an EBS volume as persistent storage that is life-cycled independently from the EC2 instances. When you run tsuru processes, what kind of strategy do you adopt for persistent storage?

If I were using docker, I would mount the same volume into the two docker processes. I expect tsuru processes to expose similar APIs for physical volumes or EBS, which you can configure to point to the same physical drive. I can also see how the second architecture is more convenient :) The way I would go about it would be to create an AsyncFile API that abstracts away file access :)
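
To make that idea concrete, here is a rough sketch of what such an abstraction might look like. None of these names exist in ml_logger today, and the /files and /glob routes are hypothetical; this only shows where the seam would go:

from abc import ABC, abstractmethod
from glob import glob as local_glob
from pathlib import Path

import requests

class FileStore(ABC):
    """Hypothetical seam between ml_dash.server and its data source."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def glob(self, pattern: str) -> list: ...

class LocalFileStore(FileStore):
    """Today's behavior: read directly from the shared run directory."""

    def __init__(self, root: str):
        self.root = Path(root).expanduser()

    def read(self, path: str) -> bytes:
        return (self.root / path).read_bytes()

    def glob(self, pattern: str) -> list:
        return [str(Path(p).relative_to(self.root))
                for p in local_glob(str(self.root / pattern))]

class HTTPFileStore(FileStore):
    """Proposed behavior: fetch files through ml_logger.server over HTTP.

    The /files and /glob routes are made up for this sketch; the real
    server would need to expose equivalent endpoints.
    """

    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip("/")

    def read(self, path: str) -> bytes:
        res = requests.get(f"{self.base_url}/files/{path}")
        res.raise_for_status()
        return res.content

    def glob(self, pattern: str) -> list:
        res = requests.get(f"{self.base_url}/glob", params={"pattern": pattern})
        res.raise_for_status()
        return res.json()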

geyang commented 3 years ago

I did some searching; do you think the configurations in this thread would help?

https://github.com/tsuru/tsuru/issues/1294

hugoleodev commented 3 years ago

Hi @geyang, me again.

I have set up our tsuru app to work with a shared filer mounted on the $HOME/runs folder.

And now it works!!!!

But no charts show up when I run the example you posted above.

Can you help me with it?

hugoleodev commented 3 years ago

@geyang Here is the error in the ml-dash front end:

[Screenshot: error message shown in the ml-dash front end]

geyang commented 3 years ago

@hugoleodev Great job getting the example to run! The error message is likely benign (it does not really affect anything). To configure the line plot, you need to put the following into the .charts file:

charts:
- yKey: loss
  xKey: epoch
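
If you would rather write the file directly instead of going through logger.log_line, here is a small sketch, assuming the <username>/<project>/<experiment> layout on the server's filesystem:

from pathlib import Path

CHARTS = """\
charts:
- yKey: loss
  xKey: epoch
"""

# the .charts file sits next to metrics.pkl inside the experiment folder
exp_dir = Path.home() / "runs/hugo/getting-started/experiment-1"
(exp_dir / ".charts").write_text(CHARTS)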