JuliaAI / MLFlowClient.jl

Julia client for MLFlow.
https://juliaai.github.io/MLFlowClient.jl/
MIT License
46 stars 9 forks source link

Artificact upload for remote instances #45

Open freemin7 opened 5 months ago

freemin7 commented 5 months ago

As per https://github.com/JuliaAI/MLFlowClient.jl/blob/a31b41a91dc32107fa7dcd6ad5a581c116a4ff9c/src/loggers.jl#L69-72 we currently can't upload artifacts to remote servers

    Assumes that artifact_uri is mapped to a local directory.
    At the moment, this only works if both MLFlow and the client are running on the same host or they map a directory that leads to the same location over NFS, for example.

Some (untested) code along the lines of:

function logartifact(mlf::GitLabMLFlow, run_id::AbstractString, basefilename::AbstractString, data)
    mlflowrun = getrun(mlf, run_id)
    artifact_uri = mlflowrun.info.artifact_uri
    filepath = joinpath(artifact_uri, basefilename)
    try
        HTTP.post(url, push!(copy(mlf.headers), "Content-Type" => "multipart/form-data" ),
            HTTP.Form(Dict(:file => data)))
    catch e
        error("Unable to upload $(filepath): $e")
    end
    filepath
end

Would probably work for this purpose and not introduce new dependencies. The question remains how to differentiate between local and remote file access, as they happen through different interfaces (the headers are irrelevant for file access for example).

I see three options to approach this and would be interested in making a pull request implementing one of these and testing them against a local mlflow instance and the gitlab end points to confirm they are working (for me).

  1. Use a different subtype of AbstractMLFlow to denote remote instances. (Probably not preferred)
  2. Keep using a String and just have an if checking for prefixes of "https://" and use HTTP then.
  3. Switch to using URIs and use file:// to denote file access and check the prefix in an if statement.

This is simply a question of taste i would leave to (some of) the maintainers.

pebeto commented 5 months ago

Hello,

I think the issue comes from the different forms of uploading to a mlflow instance. There is no dedicated REST endpoint to upload a run artifact (and that's awful).

We can take this page as a reference.

freemin7 commented 5 months ago

If it is credible to assume to upload procedures might even depend on the service in question and not just on the protocol we would have to go the AbstractMLFlow route.

Is there buy-in for that?

Common functionality depending on URI can be part of the library. Although to prevent pulling in extra dependencies for every protocol under the sun having a separate package for uploads might be a better idea if such a package taking URIs doesn't exist yet. (I am not aware of that existing)