fractal-analytics-platform / fractal-tasks-core

Main tasks for the Fractal analytics platform
https://fractal-analytics-platform.github.io/fractal-tasks-core/
BSD 3-Clause "New" or "Revised" License
11 stars 5 forks source link

Review whether `zarr_url`s have trailing slash or not #677

Open jluethi opened 3 months ago

jluethi commented 3 months ago

@tcompa Should zarr_urls have trailing slashes? Are we robust against it? What should the correct behavior be?

tcompa commented 3 months ago

My take:

  1. Server-side, we must choose one option (any of the two will do), since we rely on zarr_url to uniquely identify elements in the image list. The way it can work is that any zarr_url is normalized before using it in fractal-server (e.g. before adding it into an image list, before using it to check if it already exists, ...). I would start with os.path.normpath (which produces paths without trailing slash), although we'll need to update this to support s3 URLs in the future.

  2. Task-side, we should rather be robust. For filesystem paths, this can be achieved e.g. by using the os.path.normpath function or by using pathlib.Path objects. For s3 URLs, we will later find the best way to normalize them.

tcompa commented 3 months ago

Side comment (I mentioned it in the past, but without a link): something close to pathlib but for s3 URLs is https://github.com/aws-samples/s3pathlib-project.

jluethi commented 3 months ago

The strategy sounds good to me. Is there some library we can already use now that would work for the normalization both for paths and s3 objects? e.g. something in FSSpec or s3fs?

tcompa commented 3 months ago

Is there some library we can already use now that would work for the normalization both for paths and s3 objects? e.g. something in FSSpec or s3fs?

I've not found one during a very quick look. I'll need to review it later.

tcompa commented 2 months ago

Ref https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/652

jluethi commented 2 months ago

Server will only provide zarr_url without trailing slashes. Still some care when handling this in different tasks

tcompa commented 2 months ago

One example where a task should be made more robust: InitArgsMIP.origin_url will or won't be normalized depending on how it is prepared within the copy task, and then it's used in a non-safe way:

    data_czyx = da.from_zarr(init_args.origin_url + "/0")

within the MIP task.

jluethi commented 2 months ago

We should handle this more carefully in our tasks. Let's use this issue to tag tasks that need care taken for this.