Closed sooheon closed 1 year ago
Yes, /home/ubuntu/bucket/ is hardcoded, and that is how you should prefix path names within load_data.csv.
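As an illustration of that prefixing rule, here is a minimal Python sketch (not part of Distributed-CellProfiler). It assumes the path columns follow the usual CellProfiler LoadData naming of PathName_<channel>, and the helper name prefix_pathnames and the sample row are hypothetical.

```python
import csv
import io
import posixpath

# Mount point that the answer above describes as hardcoded.
BUCKET_MOUNT = "/home/ubuntu/bucket/"


def prefix_pathnames(csv_text, mount=BUCKET_MOUNT):
    """Return csv_text with every PathName_* value prefixed by the mount point.

    Hypothetical helper: assumes path columns are named PathName_<channel>.
    Values that already start with the mount point are left unchanged.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for col in reader.fieldnames:
            if col.startswith("PathName_") and not row[col].startswith(mount):
                # Strip any leading slash so the join keeps the mount prefix.
                row[col] = posixpath.join(mount, row[col].lstrip("/"))
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Made-up example row; the bucket-relative path is illustrative only.
    sample = "FileName_OrigDNA,PathName_OrigDNA\nimg1.tif,projects/demo/images\n"
    print(prefix_pathnames(sample))
```

Running the helper a second time on its own output should change nothing, since already-prefixed values are skipped.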
The Dockerfile is the first script to execute in the Docker container. It creates the /home/ubuntu/ folder and then executes run_worker.sh from that point.
run_worker.sh makes /home/ubuntu/bucket/ and uses S3FS to mount your S3 bucket at that location. (If you set DOWNLOAD_FILES='True' in your config, the S3FS mount is bypassed, but files are downloaded locally to the /home/ubuntu/bucket path so that the paths are the same as if the bucket were S3FS-mounted.)
Thanks for the links to the code!
Think it would help to clarify this in the docs.
Also, I saw some load_data.csv files in the cellpainting-gallery that have an s3:// prefix. How does this work?
Right now, Distributed-CellProfiler does not handle paths with an s3:// prefix in load_data.csv.
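Until that is supported, one possible workaround is to rewrite s3:// values to the mounted path before submitting a job. The sketch below is only an illustration, not something Distributed-CellProfiler provides: the function name s3_uri_to_mounted_path is hypothetical, and it assumes the bucket named in the URI is the same bucket that gets mounted at /home/ubuntu/bucket/.

```python
import posixpath
from urllib.parse import urlparse


def s3_uri_to_mounted_path(uri, mount="/home/ubuntu/bucket/"):
    """Map s3://<bucket>/<key> to <mount>/<key>.

    Hypothetical helper. Assumes the bucket in the URI is the one mounted
    at `mount`; the bucket name itself is dropped. Non-s3 values are
    returned unchanged.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        return uri
    # parsed.path is the object key with a leading slash; strip it so that
    # posixpath.join does not discard the mount prefix.
    return posixpath.join(mount, parsed.path.lstrip("/"))


if __name__ == "__main__":
    # Made-up bucket and key, for illustration only.
    print(s3_uri_to_mounted_path("s3://my-bucket/projects/demo/images"))
```

Paths that are already local pass through untouched, so the function can be applied to a whole PathName column without checking each value first.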
CellProfiler will directly load files from S3 with s3:// prefixes if you build it from the master branch, but that function is not available in any current CellProfiler release. (You can expect this in CP v5, if not before.)
I believe the s3:// paths currently in some load_data.csv files are there because the actual load_data.csv files were temporarily overwritten with a version optimized for a particular deep learning workflow. They will be restored eventually, but unfortunately I don't have a specific time estimate for that restoration.
In logs, that seems to be the command given to the cellprofiler instance within the docker container, even though I've never configured bucket as the mount point. Should I make sure to always mount my bucket at /home/ubuntu/bucket/ and always prefix PathNames with /home/ubuntu/bucket/ in load_data.csv?