Closed WillEngler closed 1 year ago
This is a complex task and we don't know exactly how we're going to approach it yet. So I think we should approach it as a research/prototyping spike. We will call this ticket done when:
I like the "no dependencies" ideal - completely self-contained functions are awesome. Maybe one route to achieving it is to exploit the fact that FuncX's primary serialization mechanism is to serialize source code, and write out a fake function for it to serialize. This would kind of be like writing out the "skinny Garden" library inside each container:
An idealized Garden function could be like:

```python
def trojan_horse_for_good(*args, **kwargs):
    import mlflow

    model_name = {{filled in during publication}}  # or, read from an env variable set during container build
    model = mlflow.get_my_model()
    return model.do_my_magic(*args, **kwargs)
```
Publication could work by writing a temporary file to disk containing this function, importing that function, and then registering it with FuncX (knowing that FuncX will grab the source code you wrote).
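A minimal sketch of that publication flow, using only the standard library. Everything here is hypothetical (the template, `publish`, and the returned tuple are illustration only, not the real garden-ai or FuncX API); the point is just that after writing the filled-in source to disk and importing it, `inspect.getsource` can recover exactly the code we wrote, which is what a source-based serializer would grab:

```python
import importlib.util
import inspect
import tempfile
import textwrap
from pathlib import Path

# Hypothetical template; {model_name!r} would be filled in at publication
# time. None of these names come from the actual garden-ai code.
FUNC_TEMPLATE = textwrap.dedent("""
    def trojan_horse_for_good(*args, **kwargs):
        model_name = {model_name!r}  # baked in during "publication"
        return (model_name, args, kwargs)
""")

def publish(model_name: str):
    """Write the filled-in function to a temp file and import it back."""
    source = FUNC_TEMPLATE.format(model_name=model_name)
    path = Path(tempfile.mkdtemp()) / "published_fn.py"
    path.write_text(source)
    spec = importlib.util.spec_from_file_location("published_fn", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    fn = module.trojan_horse_for_good
    # A source-based serializer could now grab exactly the code we wrote out:
    assert "model_name" in inspect.getsource(fn)
    return fn

fn = publish("my-registered-model")
print(fn(1, x=2))  # → ('my-registered-model', (1,), {'x': 2})
```

In the real flow, the last step would be registering `fn` with FuncX instead of calling it locally.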
Cons:
This week I've been trying something similar to what Logan suggested and hit a lot of roadblocks. Basically I want to be able to take a user's pipeline, compose the steps in it, inject the env variables we need for MLFlow auth, and send that over to Globus Compute. I have not been able to get that working.
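To make the goal concrete, here is a toy sketch of the compose-and-inject idea (all names are mine, not the actual garden-ai implementation, and the env variable value is fake): flatten the pipeline's steps into one callable that sets the MLflow auth env variables before running, so that a single function is all that has to cross the serialization boundary to Globus Compute.

```python
import os

def compose_steps(steps, env):
    """Flatten a list of step functions into one callable that injects
    env vars (e.g. MLflow auth) before running the steps in order."""
    def composed(data):
        os.environ.update(env)  # e.g. MLflow tracking auth
        for step in steps:
            data = step(data)
        return data
    return composed

# Hypothetical two-step pipeline:
pipeline = compose_steps(
    [lambda x: x + 1, lambda x: x * 10],
    env={"MLFLOW_TRACKING_TOKEN": "fake-token-for-illustration"},
)
print(pipeline(2))  # → 30
```

Note that `composed` closes over `steps` and `env`; closures like this are exactly the kind of thing that code-based serializers can struggle with, which may be part of why this kept hitting roadblocks.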
Some things I learned ...

- `mlflow-skinny`, `pandas<3`, `numpy<2`
- Globus Compute uses the `CombinedCode` serialization method as opposed to `DillCode`, which I was trying to force. The GC team is about to release the ability to pick your serialization method this coming Tuesday. That could help. In the meantime I was mired in serialization/deserialization errors and hit the end of my time box.

So ... I'm declaring defeat on the No Dependencies option and pivoting to Skinny Garden. Next up is to scope out the skinny garden approach.
Closing in favor of #158
re-opening for the rush of closing it again when #190 is merged
Problem
I was trying to make a pipeline that uses MAST-ML. MAST-ML pins `pyyaml == 5.4.1` in its requirements. `garden-ai` requires `pyyaml >= 6`. Because we currently require garden-ai to be in the pip requirements sent to the container service, you can't include MAST-ML in a pipeline. The pyyaml versions conflict and registration fails at container build time.

There are tweaks we could do in garden or MAST-ML to get around this particular example. But this problem is more general. It will be unsustainable to keep garden-ai's requirements limited to the lowest common denominator of users' pipeline dependencies.
Imagine we want to include some common benchmarking or evaluation code at the garden level that does tabular data operations with Pandas. If we want to use pandas 2.x we will be preventing the registration of a ton of pipelines that use older pandas versions.
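The pyyaml conflict above can be illustrated with a toy specifier check (a real resolver like pip uses the `packaging` library and full PEP 440 rules; this stdlib-only sketch supports just `==` and `>=` for demonstration):

```python
def version_tuple(v):
    """Parse '5.4.1' into (5, 4, 1) for lexicographic comparison."""
    return tuple(int(part) for part in v.split("."))

def satisfies(version, spec):
    """Check a single '==' or '>=' specifier (toy subset of PEP 440)."""
    if spec.startswith(">="):
        return version_tuple(version) >= version_tuple(spec[2:])
    if spec.startswith("=="):
        return version_tuple(version) == version_tuple(spec[2:])
    raise ValueError(f"unsupported specifier: {spec}")

# MAST-ML pins pyyaml==5.4.1; garden-ai needs pyyaml>=6.
pinned = "5.4.1"
print(satisfies(pinned, "==5.4.1"))  # → True
print(satisfies(pinned, ">=6"))      # → False: no version satisfies both
```

Since no single pyyaml version can satisfy both specifiers, the container build's resolver has nothing to install and fails.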
Potential Approaches
In order from most kludgey to least ...
- The `home_run` approach that DLHub took. Maybe we still need something bundled in the container, but we can segment it into a separate package with a smaller number of dependencies so that conflicts are greatly reduced.

Acceptance Criteria
Given a user specifies a set of pip or conda dependencies that can be built by the container service, when they submit a pipeline with those requirements, then the container builds.