Open mathematicalmichael opened 3 days ago
Hi @mathematicalmichael,
The new server version added built-in authentication for the fileserver. I assume that for some reason (perhaps due to the fileserver URL you're using?) the WebApp does not identify the fileserver and thus does not attach the cookie when trying to download (the SDK obviously does).
You can try disabling this feature in the fileserver (using the fileserver.conf file) by setting auth.enabled: false (you can also do that in the docker-compose file or in the docker compose override file with an environment variable) and see if it helps.
thanks. yes, I know the new version has auth, which is exactly what I want / need (in fact). So I do not want to disable it (though it's no better than downgrading, I know).
Could it be because the fileserver is not at a "files." subdomain?
(unfortunately I don't have control over subdomain names)
I do however, have some cycles to try and fix it, if it's possible. I just need guidance on the cause of the issue / feasibility of solutions.
In general, the server is configured to place the cookie with a specific domain. I assume the cookie is simply not propagated to the fileserver since it's hosted under a different domain name. If the two services are hosted under some parent domain name (like app.my-domain.com and files.my-domain.com), it's possible to set the cookie domain to the common domain name (e.g. .my-domain.com).
Can you share the pattern of the domains you're using?
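To make the cookie-domain point above concrete, here is a minimal Python sketch of the domain-matching rule browsers apply when deciding whether to send a cookie to a host. It is a simplification of the rule in RFC 6265 (it skips the IP-address check), and `domain_matches` is a name made up for this illustration, not part of ClearML.

```python
def domain_matches(host: str, cookie_domain: str) -> bool:
    """Return True if a cookie scoped to cookie_domain would be sent to host.

    Simplified from RFC 6265 domain matching: IP-address hosts are not handled.
    """
    host = host.lower().rstrip(".")
    # Browsers ignore a leading dot in the cookie's Domain attribute.
    domain = cookie_domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


# The example hosts from the comment above share the parent my-domain.com.
print(domain_matches("app.my-domain.com", ".my-domain.com"))
print(domain_matches("files.my-domain.com", ".my-domain.com"))
# A host under a different parent domain does not match.
print(domain_matches("files.other-domain.com", ".my-domain.com"))
```

With a cookie domain of .my-domain.com, both the app and files hosts match, which is why a shared parent domain lets one cookie reach both services.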
@jkhenning thank you! so it sounds like my suspicion might have been directionally correct and that the cookie's scope is missing our URLs.
The networking set up I am constrained to with this particular ClearML deployment has the following structure:
https://<port>-<hash tied to EC2 instance>.<domain>.<tld>
so my setup is https://8080-....site.com
https://8081-....site.com
https://8008-....site.com
Setting the cookie domain to .site.com would be a security concern: way too broad a scope. Each EC2 instance gets its own URL.
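A small Python sketch of why this URL scheme leaves no narrow cookie domain to pick: it computes the longest domain suffix shared by a set of hosts. The `instance-a` and `instance-b` labels are hypothetical stand-ins for the per-instance hash (which is elided above), and `common_parent_domain` is a helper invented for this illustration.

```python
def common_parent_domain(hosts: list[str]) -> str:
    """Return the longest run of trailing DNS labels shared by every host."""
    # Reverse each host's labels so comparison starts at the TLD.
    label_lists = [h.lower().split(".")[::-1] for h in hosts]
    shared = []
    for labels in zip(*label_lists):
        if len(set(labels)) != 1:
            break
        shared.append(labels[0])
    return ".".join(reversed(shared))


# Hypothetical hosts following the <port>-<hash>.<domain>.<tld> pattern.
mine = [
    "8080-instance-a.site.com",
    "8081-instance-a.site.com",
    "8008-instance-a.site.com",
]
parent = common_parent_domain(mine)
print(parent)

# Another tenant behind the same proxy falls under that same parent.
other = "8081-instance-b.site.com"
print(other.endswith("." + parent))
```

Because the port is part of the leftmost label, the three services share nothing narrower than site.com, and that suffix also covers every other instance behind the same proxy.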
I wrote this part of the ClearML docs:
so I very much remember dealing with this on an earlier deployment (but one where I had control over subdomain names)
I was surprised when the deployment "just worked" with this new domain mapping (for this deployment), but I realize now that was because the fileserver was totally insecure until 1.16.0, so the domain didn't matter. We've been using these urls for six months now, so I'm not sure the aforementioned docs are "exactly correct" anymore.
that all said... take a look at my logs again. Notice that the Debug Images load just fine from the web app, and they're served behind the same backend fileserver URL.
So... what does that tell us about that cookie's scope, when one tab in the ClearML Web UI is able to load assets from the fileserver but the neighboring tab is not???
Ah, this might be a WebApp issue. Some plots (which are too complicated to be stored as a plotly object) are stored as an image, but the link is embedded in the plot object, which means the WebApp has to parse it and decide whether to attach the cookie there. I think the WebApp only knows how to do that automatically for the standard port variants and the standard subdomains.
You should be able to explicitly specify the fileserver URL to the webapp by adding the following env var to the webapp service:
WEBSERVER__fileBaseUrl=https://8081-....site.com
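As a rough illustration of the "standard port variants and standard subdomains" idea, the Python sketch below guesses a fileserver origin from the web app URL. This is not the WebApp's actual logic; `guess_fileserver_origins` is a hypothetical helper, the 8080/8081 ports and app./files. prefixes are taken from the URLs mentioned in this thread, and `instance-a` stands in for the elided hash.

```python
from urllib.parse import urlsplit


def guess_fileserver_origins(web_url: str) -> list[str]:
    """Guess fileserver origins from a web app URL (illustrative heuristic only)."""
    parts = urlsplit(web_url)
    host = parts.hostname or ""
    guesses = []
    # Port variant: web app on :8080, fileserver on :8081 of the same host.
    if parts.port == 8080:
        guesses.append(f"{parts.scheme}://{host}:8081")
    # Subdomain variant: app.<domain> maps to files.<domain>.
    if host.startswith("app."):
        guesses.append(f"{parts.scheme}://files.{host[len('app.'):]}")
    return guesses


print(guess_fileserver_origins("http://localhost:8080"))
print(guess_fileserver_origins("https://app.my-domain.com"))
# The port is baked into the hostname here, so neither heuristic applies.
print(guess_fileserver_origins("https://8080-instance-a.site.com"))
```

Under these assumptions, a host of the form 8080-....site.com yields no guess at all, which would be consistent with needing to state the fileserver URL explicitly.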
ooh I'll try that env var! thank you!
but I'm not sure that explains why Debug Images work while Plotly image embeds do not. Is it because the two structure the urls differently?
(and I explicitly save some as images for better control over formatting - e.g. histograms. I send some to Debug and some to the Plots tab. Debug tab works, Plot does not. same underlying fileserver url structure, but console logs show 401 only on the latter)
is the scope of the cookie a problem given how the urls are structured? other customers (not us) using the same reverse proxy would have urls with the same domain name, and I don't want those to be valid against my instance...
environment:
  CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-}
  CLEARML_API_HOST: ${CLEARML_API_HOST:-}
  CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-}
  WEBSERVER__fileBaseUrl: ${CLEARML_FILES_HOST:-}
yields
Error parsing WEBSERVER__fileBaseUrl JSON value `https://8081-.....com/`: Expecting value: line 1 column 1 (char 0)
if I prepend CLEARML_ to the front of it... it does not complain.
Does work with that env var on 1.15.1
and upgrading to 1.16.0 with that prepended env var does not bring back the images (had to force refresh to avoid browser cache tricking me)
I guess you should put it in quotes?
one thing I noticed poking around the console: the requests that are getting the 401 from Plot tab do not have a cookie set in the request header. the requests that succeed from the Debug Samples tab do have a cookie set in the request header
> I guess you should put it in quotes?
tried that, both single and double quotes still throw the same message.
I'm pretty sure the problem is that the cookie isn't set by the template that renders out the plotly images.
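One way to check the cookie observation above more systematically is to export the browser's network log as a HAR file and count requests by response status and cookie presence. The Python sketch below does that on a tiny inline sample. The sample URLs and cookie value are placeholders, and `cookie_status_summary` is a helper written for this illustration. Note that some browsers strip cookie headers from exported HAR files unless asked to include sensitive data, in which case every request would look cookie-less.

```python
from collections import Counter


def cookie_status_summary(har: dict) -> Counter:
    """Count HAR entries by (response status, request carried a Cookie header)."""
    summary = Counter()
    for entry in har["log"]["entries"]:
        headers = entry["request"]["headers"]
        has_cookie = any(h["name"].lower() == "cookie" for h in headers)
        summary[(entry["response"]["status"], has_cookie)] += 1
    return summary


# Placeholder sample shaped like a HAR export; real data would come from
# json.load() on the exported file.
sample = {"log": {"entries": [
    {"request": {"url": "https://files.example.invalid/debug.png",
                 "headers": [{"name": "Cookie", "value": "example=1"}]},
     "response": {"status": 200}},
    {"request": {"url": "https://files.example.invalid/plot.png",
                 "headers": []},
     "response": {"status": 401}},
]}}

for (status, has_cookie), count in sorted(cookie_status_summary(sample).items()):
    print(status, "cookie sent" if has_cookie else "no cookie", count)
```

If every 401 lands in the "no cookie" bucket and every 200 in the "cookie sent" bucket, that would support the idea that the cookie is never attached to the Plots requests rather than being rejected by the fileserver.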
It's possible docker compose removes the quotes, can you perhaps try:
WEBSERVER__fileBaseUrl: \"${CLEARML_FILES_HOST:-}\"
@jkhenning unfortunately that also throws the same "Error parsing" error.
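The "Expecting value: line 1 column 1 (char 0)" text matches what Python's `json` module reports when the first character is not the start of a JSON value. Assuming the env var really is parsed as JSON (the error message suggests so, but this is not confirmed in the thread), the sketch below shows which raw strings would parse. The URL is a placeholder and `try_parse` is a helper made up for this illustration.

```python
import json


def try_parse(raw: str) -> str:
    """Parse raw as JSON and describe the outcome."""
    try:
        return f"ok: {json.loads(raw)!r}"
    except json.JSONDecodeError as exc:
        return f"error: {exc}"


url = "https://files.example.invalid/"  # placeholder URL

print(try_parse(url))                     # bare value, no quotes
print(try_parse("'" + url + "'"))         # literal single quotes
print(try_parse('\\"' + url + '\\"'))     # backslashes kept literally
print(try_parse('"' + url + '"'))         # literal double quotes
```

Only the last form is a valid JSON string; the first three fail at character 0 with the same message. So the value would need to reach the container with the double quotes intact. In YAML, a scalar written with double quotes inside single quotes keeps the inner double quotes, though whether that resolves the issue in this stack is untested here.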
to my comment about the browser Inspect tool showing a missing cookie (but valid artifact url) in the requests that are 401'ing... could this possibly explain the situation? (cookie not set in the first place)
I'm using the docker-compose stack. Basically everything to recreate my set up is here: https://github.com/ml-starter-packs/clearml-lightning except I bumped my version to 1.16.0
After upgrading my image tags to the latest release, I noticed that the clearml-fileserver emits "Error getting token" whenever I try to load images in the Plots tab. Data still works fine, and oddly enough so do Debug Samples, despite them also coming from the fileserver. Those load fine...
![image](https://github.com/allegroai/clearml-server/assets/40366263/4409c084-e62e-4470-9e50-3f161c8ee071)
(top: manual download from web UI in artifacts tab... works fine, seems to authenticate happily) (middle: errors when I load the Plots tab, first screenshot) (bottom: opening the Debug Samples tab)
downgrading to 1.15.1 (as is in the repo linked above) restores all images in Web UI.