Open jluethi opened 8 months ago
My take:
Server-side, we must choose one option (any of the two will do), since we rely on zarr_url to uniquely identify elements in the image list. The way it can work is that any zarr_url
is normalized before using it in fractal-server (e.g. before adding it into an image list, before using it to check if it already exists, ...). I would start with os.path.normpath
(which produces paths without trailing slash), although we'll need to update this to support s3 URLs in the future.
Task-side, we should rather be robust. For filesystem paths, this can be achieved e.g. by using the os.path.normpath
function or by using pathlib.Path
objects. For s3 URLs, we will later find the best way to normalize them.
Side comment (I mentioned it in the past, but without a link): something close to pathlib but for s3 URLs is https://github.com/aws-samples/s3pathlib-project.
The strategy sounds good to me. Is there some library we can already use now that would work for the normalization both for paths and s3 objects? e.g. something in FSSpec or s3fs?
Is there some library we can already use now that would work for the normalization both for paths and s3 objects? e.g. something in FSSpec or s3fs?
I've not found one during a very quick look. I'll need to review it later.
Server will only provide zarr_url without trailing slashes. Still some care when handling this in different tasks
One example where a task should be made more robust: InitArgsMIP.origin_url
will or won't be normalized depending on how it is prepared within the copy task, and then it's used in a non-safe way:
data_czyx = da.from_zarr(init_args.origin_url + "/0")
within the MIP task.
We should handle this more carefully in our tasks. Let's use this issue to tag tasks that need care taken for this.
@tcompa Should zarr_urls have trailing slashes? Are we robust against it? What should the correct behavior be?