materight closed this 11 months ago
Hi @materight,
If the bucket has a specific subdir defined in the configuration, wouldn't it make sense to check these permissions on the subdir? Or are you saying this is not supported in Google Storage?
Hi @jkhenning. The latter: this method works only on buckets: https://cloud.google.com/storage/docs/json_api/v1/buckets/testIamPermissions
@materight got it 👍
In that specific case, how did you set the default output destination? I'm not sure I get how removing the subdir will prevent that, since the line that basically causes the file to be mentioned in the request is the one setting the blob to the `test_obj`, and when setting a default `output_uri` you do not specify a file - so how did the file end up there?
So this is something I couldn't figure out, I just call `set_output_destination('gs://my-bucket')`. It only happens when I run a task remotely after cloning it, so it's hard to debug.
But the main problem here is that the subdir is passed as `bucket_url`, so the resulting `bucket` object is invalid anyway. So `blob.exists()` is false because the bucket is invalid, then `test_obj == bucket`, and then the request fails because the bucket also has the subdir in the URI.
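To make that flow concrete, here is a minimal, hypothetical sketch (the helper `parse_gs_url` is mine for illustration, not ClearML's actual code) of why leaving the subdir in the URL produces an invalid bucket:

```python
def parse_gs_url(url: str):
    """Split a gs:// URL into (bucket_name, subdir).

    If a caller forgets to strip the subdir, the "bucket name" it
    passes to the client still contains a slash, which no real GCS
    bucket name can contain, so every lookup on it fails.
    """
    path = url.removeprefix("gs://")
    bucket_name, _, subdir = path.partition("/")
    return bucket_name, subdir

# Passing the full path as the bucket name yields an invalid bucket:
bad_bucket = "gs://my-bucket/some/subdir".removeprefix("gs://")
print(bad_bucket)  # 'my-bucket/some/subdir' -> not a valid bucket name

# Stripping the subdir first gives a usable bucket name:
print(parse_gs_url("gs://my-bucket/some/subdir"))  # ('my-bucket', 'some/subdir')
```

With the invalid name, `blob.exists()` can only return false, which is what routes the permission check onto the wrong object.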
That makes sense, but it wouldn't yield the same error 😕
Sorry, what do you mean?
If `blob.exists()` is false because the bucket is invalid, and `test_obj == bucket`, then the resulting object does not reference the file, and the error you mentioned can't be raised, so I'm worried about the flow that generates this error (even if the fix you added is indeed required).
Ah got it. If you can give me some hint on how/where this `config.subdir` is set, I can take a deeper look. I'm able to reproduce it, just not locally.
It can basically be set in the configuration file's `google.storage` credentials section (see here) when configuring the credentials for the bucket. It's strange that you have it configured when running remotely but not when running locally - can you share how the agent running this remotely is configured (especially in these storage-related sections)?
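For reference, a per-bucket credentials entry with a `subdir` might look roughly like this in `clearml.conf` (values are illustrative, check your own file for the exact keys you use):

```
sdk.google.storage {
    credentials = [
        {
            bucket: "my-bucket"
            subdir: "path/in/bucket"  # the subdir in question
            project: "myproject"
            credentials_json: "/path/to/key.json"
        },
    ]
}
```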
I have this in the config file:
```
sdk {
    google.storage {
        project: "myproject"
        credentials_json: ${GOOGLE_APPLICATION_CREDENTIALS}
        pool_connections: 512
        pool_maxsize: 1024
    }
    development {
        default_output_uri: "gs://clearml-bucket"
    }
}
```
Also in the agent's logs I only see this:
```
sdk.google.storage.project = myproject
sdk.google.storage.credentials_json = /service_account_key.json
sdk.google.storage.pool_connections = 512
sdk.google.storage.pool_maxsize = 1024
sdk.development.default_output_uri = gs://clearml-bucket
```
Btw I'm using pipelines; the `result_TRAIN_config.pkl` artifact it tries to access is a parameter of the pipeline component.
And can you reproduce that error message you shared?
Yes, if I clone the failed task and re-enqueue it I get the same error. But if I run a new task it doesn't happen all the time.
The config file you attached is from your own workstation? If so, what's the config file used by the agent?
It's exactly the same for both
And can you share the task log of the remote run where it happens?
Sure, here: task_ff8c8917072641bca89f27ddd488d74c.log
Hi @jkhenning, any update on this? Would it be possible to merge it and dig in more if the issue reappears?
Patch Description

When running a cloned task remotely and using GS as storage, calling `task.logger.set_default_upload_destination` usually fails with an error like:

This is because the permissions are being tested on a single file instead of a GS bucket. Removing the `subdir` from the bucket URL should fix it.

Testing Instructions

Call `task.logger.set_default_upload_destination`.
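The idea behind the patch can be sketched as follows (the helper name is hypothetical, not the actual ClearML code): before calling `testIamPermissions`, reduce the configured URL to the bare bucket, since that API only accepts buckets:

```python
def bucket_url_without_subdir(bucket_url: str) -> str:
    """Strip any subdir from a gs:// URL, keeping only the bucket.

    testIamPermissions is a bucket-level API, so the permission
    check must run against 'gs://my-bucket', never against
    'gs://my-bucket/subdir'.
    """
    scheme = "gs://"
    path = bucket_url[len(scheme):] if bucket_url.startswith(scheme) else bucket_url
    bucket_name = path.split("/", 1)[0]
    return scheme + bucket_name

print(bucket_url_without_subdir("gs://my-bucket/some/subdir"))  # gs://my-bucket
print(bucket_url_without_subdir("gs://my-bucket"))              # gs://my-bucket
```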