allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0

Fileserver authentication #836

Open jerbob92 opened 1 year ago

jerbob92 commented 1 year ago

Proposal Summary

Add an authentication layer on the default fileserver (like/through the apiserver).

Motivation

While this is mentioned here: https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_security/#file-server-security I think it should be in a big red box somewhere else. It seems quite insecure to leave the fileserver open like this when it can integrate perfectly fine with the authentication system of the apiserver.

awoimbee commented 1 year ago

This issue should be moved to https://github.com/allegroai/clearml-server as the clearml client has logic for sending auth headers (for app.clear.ml). The open source fileserver is very bare bones...

jerbob92 commented 1 year ago

> This issue should be moved to https://github.com/allegroai/clearml-server as the clearml client has logic for sending auth headers (for app.clear.ml). The open source fileserver is very bare bones...

Sounds good to me. It is bare bones but ClearML has told me they have a version that has authentication in the paid version, so I don't know why they would leave that out here.

ainoam commented 1 year ago

Hi @jerbob92, apologies for taking a while to get back to you. As @awoimbee mentioned, yes, this is an issue for the server repo :slightly_smiling_face:

To the issue at hand: ClearML open-source is indeed bare-bones, and I would not expose it directly to a public network even though it has verification. Regarding the file server component, the paid version is fully authenticated (requests carry a JWT token, the same mechanism used for API authentication). The main caveat in pushing this into the open source is that the JWT verification mechanism has to cross the independent services, which means sharing the JWT secret between them. Unfortunately, this really complicates things.

Regardless, I would recommend configuring the default file server to point to an S3 bucket (or GCP / Azure) for maximum control over credentials and access. WDYT?

jerbob92 commented 1 year ago

No problem!

I don't see how this would complicate things? This is exactly how JWT is supposed to work, right? It should be possible to share the JWT secret between the different deployments and have it work.

And since you do have it in the paid version, it sounds like you have already figured out how it should work.

If you would accept a PR on this I think I would be able to implement this fairly easily as long as the ClearML CLI/SDK do send the JWT token to the fileserver.
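To make the shared-secret idea concrete, here is a minimal sketch of how two independent services could agree on a token when both hold the same secret. This is not ClearML's implementation: the function names (`issue_token`, `verify_token`), the HS256 algorithm, and the claim names are all assumptions chosen for illustration, and it only uses the Python standard library.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that was stripped when encoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(secret: bytes, claims: dict) -> str:
    """Hypothetical 'apiserver' side: sign the claims with the shared secret."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_token(secret: bytes, token: str) -> dict:
    """Hypothetical 'fileserver' side: check signature and expiry, return claims."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        signature = _b64url_decode(signature_b64)
    except ValueError as exc:
        raise ValueError("malformed token") from exc

    # Only accept the one algorithm this sketch supports.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise ValueError("unsupported algorithm")

    signing_input = (header_b64 + "." + payload_b64).encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(signature, expected):
        raise ValueError("bad signature")

    claims = json.loads(_b64url_decode(payload_b64))
    if "exp" in claims and claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims


if __name__ == "__main__":
    shared_secret = b"example-shared-secret"  # assumption: same value on both services
    token = issue_token(shared_secret, {"sub": "some-user", "exp": int(time.time()) + 60})
    print(verify_token(shared_secret, token))
```

The verifying side never has to call the issuing side. It only needs the same secret, which is also why distributing and rotating that secret across deployments becomes the part that needs care.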

ainoam commented 1 year ago

> And since you do have it in the paid version, it sounds like you have already figured out how it should work.

Yes, and unfortunately, properly supporting it is more complicated than just adding a few lines :(