Publish an AgentRun to a WebRegistry

nickjalbert commented 2 years ago

This PR allows us to publish an AgentRun to the WebRegistry. How I ended up here:

I was able to publish the Output of the SB3 agent to the WebRegistry. However, we store all the goodies (like environment and agent IDs) on the child AgentRun
I updated agentos run to print more info about the Output and its children
I tried to publish a child AgentRun, but kept running into inconsistencies in how AOS/PCS objects are generated from existing MLflow runs (e.g. sometimes we allowed passing existing_run_id to the object's __init__(), sometimes not)
Now each run type (MLflowRun, Output, AgentRun) essentially implements two methods: one to initialize a new run (__init__()) and another to initialize the object from an existing MLflow run in mlruns/ (from_existing_mlflow_run()). They share some creation and validation logic.
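
For readers skimming the thread, here is a minimal sketch of the two-path pattern described above. It is not the code in this PR: the FAKE_MLRUNS dict is a hypothetical in-memory stand-in for mlruns/, and RUN_TYPE, _finish_setup() and _validate() are illustrative names. Only __init__(), from_existing_mlflow_run(), and the class names come from the description.

```python
import uuid

# Hypothetical in-memory stand-in for the on-disk mlruns/ directory.
FAKE_MLRUNS = {}


class MLflowRun:
    RUN_TYPE = "mlflow_run"  # illustrative tag, not from the PR

    def __init__(self, tags=None):
        # Path 1: create a brand-new run and record it in the store.
        run_id = uuid.uuid4().hex
        FAKE_MLRUNS[run_id] = {"run_type": self.RUN_TYPE, "tags": dict(tags or {})}
        self._finish_setup(run_id)

    @classmethod
    def from_existing_mlflow_run(cls, run_id):
        # Path 2: wrap a run that already exists, bypassing __init__().
        if run_id not in FAKE_MLRUNS:
            raise KeyError(f"no run with ID {run_id}")
        obj = cls.__new__(cls)
        obj._finish_setup(run_id)
        return obj

    def _finish_setup(self, run_id):
        # Creation logic shared by both paths.
        self.run_id = run_id
        self.tags = FAKE_MLRUNS[run_id]["tags"]
        self._validate()

    def _validate(self):
        # Validation logic shared by both paths: the stored type must match.
        stored = FAKE_MLRUNS[self.run_id]["run_type"]
        if stored != self.RUN_TYPE:
            raise ValueError(f"run {self.run_id} is a {stored}, not a {self.RUN_TYPE}")


class AgentRun(MLflowRun):
    RUN_TYPE = "agent_run"
```

With this shape, AgentRun(tags=...) creates a run, and AgentRun.from_existing_mlflow_run(run_id) rebuilds an equivalent object later while rejecting IDs that belong to a different run type.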

In this PR, the following script works and pushes the AgentRun as well as the Output and its children recursively:

# Run a web server in a tab, empty the DB
# In a new tab
cd example_agents/sb3_agent
rm -rf mlruns/
agentos run sb3_agent
USE_LOCAL_SERVER=True agentos publish <AgentRun ID from previous command>

Questions and TODOs:

Maybe we should instead just have an __init__() that handles both cases (new MLflow run, MLflow run from disk) and have a switch based on the scenario we're in
The AgentRun doesn't have a spec:link to its parent Output (you can trace it via the tags, but we probably want this stored in the DAG itself)
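
To make the first question concrete, a single __init__() with a switch might look roughly like the sketch below. It assumes the existing_run_id parameter mentioned earlier is the switch. FAKE_MLRUNS and loaded_from_disk are hypothetical names used only for illustration.

```python
import uuid

# Hypothetical in-memory stand-in for the on-disk mlruns/ directory.
FAKE_MLRUNS = {}


class MLflowRun:
    def __init__(self, existing_run_id=None, tags=None):
        if existing_run_id is None:
            # Scenario 1: start a new run.
            self.run_id = uuid.uuid4().hex
            FAKE_MLRUNS[self.run_id] = dict(tags or {})
            self.loaded_from_disk = False
        else:
            # Scenario 2: wrap a run that is already stored.
            if tags is not None:
                raise ValueError("tags cannot be set when loading an existing run")
            if existing_run_id not in FAKE_MLRUNS:
                raise KeyError(f"no run with ID {existing_run_id}")
            self.run_id = existing_run_id
            self.loaded_from_disk = True
        self.tags = FAKE_MLRUNS[self.run_id]
```

The tradeoff is one entry point at the cost of a signature where some argument combinations are invalid and have to be rejected at runtime.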

nickjalbert commented 2 years ago

@andyk tests are green now, but maybe hold off because, upon re-inspection, I think I want to take another swing at this and move the registration and validation methods back into the classes. Originally, I thought it wouldn't work because of complications with inheritance and overriding methods on subclasses. Looking again, however, I think with enough calls to super() it should work.
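
A rough sketch of what I mean by chaining super(), with made-up tag names (run_type, agent_id, environment_id) and a hypothetical required_tags() hook. The real registration and validation methods will differ.

```python
class MLflowRun:
    def __init__(self, tags):
        self.tags = dict(tags)
        self.validate()

    def required_tags(self):
        # Base requirements that every run type shares.
        return {"run_type"}

    def validate(self):
        # Defined once; subclasses only extend required_tags().
        missing = self.required_tags() - set(self.tags)
        if missing:
            raise ValueError(f"missing tags: {sorted(missing)}")


class AgentRun(MLflowRun):
    def required_tags(self):
        # Extend, rather than replace, the parent's requirements.
        return super().required_tags() | {"agent_id", "environment_id"}
```

Each subclass adds its own requirements on top of whatever its parent demands, so the validation itself lives in one place.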

Overall, this won't change the larger strategy of having two paths: one that creates a new in-memory MLflowRun-subclass object (__init__()) and one that creates an MLflowRun-subclass object from an on-disk run in mlruns/ (from_existing_mlflow_run()). If you prefer a different way to tackle this problem, let's discuss!

Will ping when this is ready for a review.

andyk commented 2 years ago

Would it simplify things if we had the mlflow_run be a member of an AgentRun rather than use inheritance?
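
Something like this sketch, where MLflowRunHandle is a hypothetical wrapper type and the constructor arguments are only illustrative:

```python
from dataclasses import dataclass, field


@dataclass
class MLflowRunHandle:
    # Hypothetical thin wrapper around whatever identifies the MLflow run.
    run_id: str
    tags: dict = field(default_factory=dict)


class AgentRun:
    def __init__(self, mlflow_run, agent_id, environment_id):
        # The MLflow run is a member, so AgentRun has no parent __init__()
        # to coordinate with, whether the run is new or loaded from disk.
        self.mlflow_run = mlflow_run
        self.agent_id = agent_id
        self.environment_id = environment_id

    @property
    def run_id(self):
        # Delegate to the member instead of inheriting the attribute.
        return self.mlflow_run.run_id
```

The caller decides how the handle is obtained (fresh run or existing one), and AgentRun itself stays the same in both cases.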

andyk commented 2 years ago

I've started to think that we should treat MLflow less as an integral part of PCS and more as a pluggable storage backend + visualization tool that PCS can be configured to write its registries to and read them from.

If we thought of it more like that, then maybe we could basically support Component.to_mlflow() and .from_mlflow() methods. Or another option would be for the MLflowRun component to be a shallow pointer such as:

my_mlflow_run_spec:
    type: MLflowRun
    mlflow_tracker_url: http://localhost:5000
    mlflow_run_id: 28c7fe4c989d7f3f7fe489d7fe4374a8
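
In Python, a shallow pointer like that could be modeled roughly as below. MLflowRunPointer, from_spec() and to_spec() are hypothetical names, and the spec is written as a plain dict so the sketch does not depend on a YAML parser. Only the three field names and their values come from the spec above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MLflowRunPointer:
    # Holds only enough information to locate the run; no run data is copied.
    mlflow_tracker_url: str
    mlflow_run_id: str

    @classmethod
    def from_spec(cls, spec):
        if spec.get("type") != "MLflowRun":
            raise ValueError(f"expected type MLflowRun, got {spec.get('type')!r}")
        return cls(spec["mlflow_tracker_url"], spec["mlflow_run_id"])

    def to_spec(self):
        return {
            "type": "MLflowRun",
            "mlflow_tracker_url": self.mlflow_tracker_url,
            "mlflow_run_id": self.mlflow_run_id,
        }


# The same spec as above, expressed as a dict keyed by spec name.
registry = {
    "my_mlflow_run_spec": {
        "type": "MLflowRun",
        "mlflow_tracker_url": "http://localhost:5000",
        "mlflow_run_id": "28c7fe4c989d7f3f7fe489d7fe4374a8",
    }
}

pointer = MLflowRunPointer.from_spec(registry["my_mlflow_run_spec"])
```

The pointer round-trips to the same spec, and fetching the actual run from the tracker would be a separate step done only when needed.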

nickjalbert commented 2 years ago

Yeah, the pluggable MLflow backend sounds like a promising way to go! Moved the global functions into the classes and renamed agent_output.py -> agent_run.py. Will merge once tests are green.

Thanks for the review!

andyk commented 2 years ago

🤗

agentos-project / agentos

Publish an AgentRun to a WebRegistry #397