GoogleCloudPlatform / cloudml-samples

Cloud ML Engine repo. Please visit the new Vertex AI samples repo at https://github.com/GoogleCloudPlatform/vertex-ai-samples
https://cloud.google.com/ai-platform/docs/
Apache License 2.0

Unexpected change of bucket path due to lstrip in PyTorch container example #458

Closed: lxuechen closed this issue 2 years ago

lxuechen commented 4 years ago

I ran into this issue when I went over the tutorial given here.

Given how the model checkpoint copying is written here, one would expect all checkpoints to be written to the path job_dir/model_name. This is not the case, however, because of the use of str.lstrip on this line. Note that lstrip treats its argument as a set of characters rather than as a fixed prefix: it removes characters from the left end of the string until it reaches one that is not in the set. So if the part of the path that follows the bucket prefix begins with characters that also appear in bucket_id (or anywhere else in the argument), those characters get stripped as well. Consider a simple example:

>>> 'abcaaaad'.lstrip('abc')
'd'
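
The same effect shows up with a bucket-style path. Below is a minimal sketch, assuming a hypothetical bucket named my-bucket and a job directory of the form gs://bucket/path; these values are made up for illustration and are not taken from the sample.

```python
# Hypothetical values for illustration; not taken from the sample code.
bucket_id = 'my-bucket'
job_dir = 'gs://my-bucket/models/run1'

# lstrip treats 'gs://my-bucket/' as a set of characters, not as a prefix,
# so it keeps stripping while the next character is in that set.
stripped = job_dir.lstrip('gs://{}/'.format(bucket_id))

# The intended result is 'models/run1', but the leading 'm' is lost
# because 'm' also appears in the bucket name.
print(stripped)  # odels/run1
```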

Perhaps a simple fix is to replace lstrip on this line with slicing, e.g.

bucket_path = job_dir[len('{}/'.format(bucket_id)):]
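
A slightly more defensive variant of the same idea is sketched below. It assumes job_dir has the form gs://bucket/path; the helper name strip_prefix and the example values are hypothetical and do not come from the sample. The slice is only taken when the string really starts with the expected prefix.

```python
def strip_prefix(text, prefix):
    """Return text without prefix if it starts with prefix, else unchanged."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


# Hypothetical values for illustration; not taken from the sample code.
bucket_id = 'my-bucket'
job_dir = 'gs://my-bucket/models/run1'

# Remove the exact prefix 'gs://my-bucket/' rather than a set of characters.
bucket_path = strip_prefix(job_dir, 'gs://{}/'.format(bucket_id))
print(bucket_path)  # models/run1
```

On Python 3.9 and later, str.removeprefix does the same thing without a helper.
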

andrewferlitsch commented 4 years ago

@dizcology PTAL