Closed — @hugoleodev closed this issue 1 year ago
@hugoleodev Let me whip up one real-quick today! Will get back to you tomorrow.
@geyang nice!
@geyang Any news about it?
Yes! Should be ready soon! Taking a final look before pushing.
-- which page/URL are you looking at? This way I can better prioritize which one to update first. (github README/readthedoc etc)
I have considered using ml-dash with ml-logger, but the ml-dash tutorial is not working for me. Is there a fresh example of how to use ml-logger with ml-dash?
My need is to run an ml_dash.server instance on a remote server, and have all applications log to that main remote ml_dash.server.
So the ml_dash.server is separate from the ml_logger.server. The latter is used for logging, whereas the former is the visualization backend that processes the data served to the dashboard front end.
So to set up remote logging, inside the instrumentation server you need to launch both servers in separate processes:
```shell
#!/usr/bin/bash
# launch-logger: the logging server that experiments write to
python -m ml_logger.server --data-dir ~/runs --port 8080 --host 0.0.0.0 --workers 4
```
Then in a separate process,
```shell
# launch-dash-server: the visualization backend for the dashboard
python -m ml_dash.server --logdir runs --port 8090 --host 0.0.0.0
```
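If you prefer to manage both processes from a single entry point, here is a minimal sketch using Python's subprocess module. The module names, ports, and flags are carried over from the two commands in this thread; adjust them to your deployment.

```python
import subprocess

# Commands mirroring the two launch snippets above; the ports and flags
# are assumptions carried over from this thread, not required values.
LOGGER_CMD = [
    "python", "-m", "ml_logger.server",
    "--data-dir", "~/runs", "--port", "8080",
    "--host", "0.0.0.0", "--workers", "4",
]
DASH_CMD = [
    "python", "-m", "ml_dash.server",
    "--logdir", "runs", "--port", "8090", "--host", "0.0.0.0",
]


def launch():
    """Start both servers as separate long-running child processes."""
    return [subprocess.Popen(cmd) for cmd in (LOGGER_CMD, DASH_CMD)]


if __name__ == "__main__":
    for proc in launch():
        proc.wait()
```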
Inside your client, you point ML-Logger at the logging server's URL (see the example below).
A key thing to keep in mind is that inside your ~/runs folder, the experiments need to sit under two layers of directories:
```shell
$ tree -L 3 ~/runs
<your-username>
└── <your-project-name>
    ├── <experiment-1>
    └── <experiment-2>
```
This means that with ML-Logger, you need to configure the prefix so that the logging is prefixed by {your-username}/{your-project-name}.
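To make the two-level layout concrete, the prefix can be assembled like this (the username and project name are placeholders taken from the example below):

```python
# Assemble the ML-Logger prefix from the two required directory levels
# plus the experiment name. "hugo" and "getting-started" are placeholders.
username = "hugo"
project = "getting-started"
experiment_id = "experiment-1"

prefix = f"{username}/{project}/{experiment_id}"
print(prefix)  # hugo/getting-started/experiment-1
```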
For example, to log a metric:
```python
import numpy as np
from ml_logger import logger

experiment_id = "experiment-1"
logger.configure(root_dir="http://localhost:8080",
                 prefix=f"hugo/getting-started/{experiment_id}")

# Chart spec read by the dashboard front end
logger.log_line("""
charts:
- yKey: loss
  xKey: epoch
- yKey: loss
  xKey: __timestamp
""", file=".charts", dedent=True)

for i in range(100):
    loss = np.exp(-i) + np.random.uniform(0, 0.05)
    logger.log(epoch=i, loss=loss, flush=True)
```
@geyang very nice! Is there any way to store metrics in a database?
I would like to create a centralized way to maintain all experiments data.
Hey Hugo, I currently save all data as pkl files individually, because the metrics reported by each project can be drastically different. So to collect all the data, I just glob for the metrics.pkl files and read their content into a pandas.DataFrame.
ML-Logger offers an easy way to do this:
```python
from ml_logger import logger

all_metrics = logger.read_metrics('your-prefix/project/experiment-*')
```
This has the additional benefit of making it easy to delete data by removing entire folders.
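If you want to do the globbing yourself instead of going through logger.read_metrics, here is a hedged sketch. It assumes each metrics.pkl holds a sequence of pickled dicts, one per logger.log(...) call — check that against your ml_logger version, as the on-disk format may differ.

```python
import glob
import os
import pickle


def read_all_metrics(pattern):
    """Collect records from every metrics.pkl matching the glob pattern.

    Assumes each file contains pickled dicts appended one after another
    (one per log call). Returns a list of dicts, which can be passed
    straight to pandas.DataFrame(records) if you use pandas.
    """
    records = []
    for path in sorted(glob.glob(pattern)):
        # Tag each row with the experiment folder it came from.
        experiment = os.path.basename(os.path.dirname(path))
        with open(path, "rb") as f:
            while True:
                try:
                    row = pickle.load(f)
                except EOFError:
                    break
                row["experiment"] = experiment
                records.append(row)
    return records
```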
Hi @geyang,
I have tested your example and, as far as I can see, the current architecture does not fit our needs.
We will deploy ml-logger / ml-dash within a tsuru PaaS, much like the Heroku PaaS.
To work well in tsuru, we need two apps:
But as you mentioned, ml_dash.server and ml_logger.server need to watch the same directory in the filesystem.
And as I have listed above, the two apps do not share a filesystem.
I think that if ml_dash.server could connect to ml_logger.server over HTTP, filesystem access would be confined to ml_logger.server.
What do you think about it?
@geyang
[Diagrams: FROM (current shared-filesystem architecture) / TO (proposed architecture, ml_dash.server connecting to ml_logger.server over HTTP)]
@hugoleodev Thanks for your comment!
When logging experiments, the amount of storage can grow very quickly if you are saving checkpoints. For this reason, when I deploy on AWS, I attach an EBS volume as persistent storage that is life-cycled independently from the EC2 instances. When you run a tsuru process, what kind of strategy do you adopt for persistent storage?
If I were using Docker, I would mount the same volume into the two Docker processes. I would expect tsuru to expose similar APIs for physical volumes or EBS that you can configure to point at the same physical drive. I can also see how the second architecture is more convenient :) The way I would go about it would be to create an AsyncFile API that abstracts away file access :)
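That file-access abstraction could look something like this minimal sketch: an interface with a local-disk backend, where an HTTP backend talking to ml_logger.server could be dropped in later. All class and method names here are hypothetical, not part of either library.

```python
import os
from abc import ABC, abstractmethod


class FileBackend(ABC):
    """Hypothetical file-access abstraction (not part of ml_logger/ml_dash).

    If ml_dash.server read through an interface like this, the concrete
    backend would decide whether bytes come from local disk or over HTTP.
    """

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...


class LocalBackend(FileBackend):
    """Backend for when both servers share a mounted volume."""

    def __init__(self, root: str):
        self.root = root

    def read(self, path: str) -> bytes:
        with open(os.path.join(self.root, path), "rb") as f:
            return f.read()

    def write(self, path: str, data: bytes) -> None:
        full = os.path.join(self.root, path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as f:
            f.write(data)


# A hypothetical HTTPBackend hitting ml_logger.server's endpoints would
# implement the same two methods, so the dashboard process never touches
# the filesystem directly.
```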
I did some searching; do you think the configurations in this thread would help?
Hi @geyang, me again.
Right,
I have set up our tsuru app to work with a shared filer mounted at the $HOME/runs folder.
And now it works!!!!
But no chart shows up with the example that you posted above.
Can you help me with it?
@geyang Here is the error in the ml-dash front end:
@hugoleodev Great job getting the example to run! The error message might be benign (it does not really do much). To configure the line plot, you need to put the following into the .charts file:
```yaml
charts:
- yKey: loss
  xKey: epoch
```