DistributedScience / Distributed-CellProfiler

Run encapsulated docker containers with CellProfiler in the Amazon Web Services infrastructure.
https://distributedscience.github.io/Distributed-CellProfiler/

Is /home/ubuntu/bucket mount point hardcoded? #152

Closed: sooheon closed this issue 1 year ago

sooheon commented 1 year ago

In the logs, /home/ubuntu/bucket appears in the command given to the CellProfiler instance inside the Docker container, even though I never configured bucket as the mount point.

Should I always mount my bucket at /home/ubuntu/bucket/ and always prefix PathNames in load_data.csv with /home/ubuntu/bucket/?

ErinWeisbart commented 1 year ago

Yes, /home/ubuntu/bucket/ is hardcoded and that is how you should prefix path names within load_data.csv.

The Dockerfile is the first script to execute in the Docker container. It creates the /home/ubuntu/ folder and then executes run_worker.sh from there. run_worker.sh creates /home/ubuntu/bucket/ and uses S3FS to mount your S3 bucket at that location. (If you set DOWNLOAD_FILES='True' in your config, the S3FS mount is bypassed, but files are downloaded locally to the /home/ubuntu/bucket path, so the paths are the same as if the bucket were mounted with S3FS.)
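To make the path convention concrete, here is a minimal Python sketch of how a key inside the bucket maps to the path CellProfiler sees in the container. The helper names (to_container_path, is_container_path) and the example keys are hypothetical and not part of Distributed-CellProfiler; only the /home/ubuntu/bucket mount point comes from the answer above.

```python
import posixpath

# Mount point hardcoded by Distributed-CellProfiler, per the answer above.
MOUNT_POINT = "/home/ubuntu/bucket"


def to_container_path(s3_key: str) -> str:
    """Map a key (a path inside the bucket) to the path seen in the container.

    Hypothetical helper for illustration; not part of Distributed-CellProfiler.
    """
    # Strip leading slashes so posixpath.join does not discard the mount point.
    return posixpath.join(MOUNT_POINT, s3_key.lstrip("/"))


def is_container_path(path: str) -> bool:
    """Return True if a PathName value already sits under the mount point."""
    return path == MOUNT_POINT or path.startswith(MOUNT_POINT + "/")
```

For example, to_container_path("projects/batch1/images") gives the value you would put in a PathName column of load_data.csv, and is_container_path can be used to check existing values before submitting a job.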

sooheon commented 1 year ago

Thanks for the links to the code!

I think it would help to clarify this in the docs.

sooheon commented 1 year ago

Also, I saw some load_data.csv files in the cellpainting-gallery that use an s3:// prefix. How does this work?

ErinWeisbart commented 1 year ago

Right now, Distributed-CellProfiler does not handle paths with an s3:// prefix in load_data.csv. CellProfiler will load files directly from S3 using s3:// prefixes if you build it from the master branch, but that feature is not available in any current CellProfiler version. (You can expect it at least in CP v5, if not before.)

I believe the s3:// paths currently in some load_data.csv files are there because the actual load_data.csv files were temporarily overwritten with a version optimized for a particular deep learning workflow. They will be restored eventually, but unfortunately I don't have a specific time estimate for that restoration.
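If you do end up with a load_data.csv that contains s3:// paths, one possible workaround is to rewrite them to the mounted location before running. The sketch below is only an illustration and is not something Distributed-CellProfiler does for you: the function names are hypothetical, it assumes path columns follow the PathName_ naming used by CellProfiler's LoadData, and it assumes the bucket named in the URL is the same bucket mounted at /home/ubuntu/bucket.

```python
import csv
import io

# Mount point hardcoded by Distributed-CellProfiler, per the discussion above.
MOUNT_POINT = "/home/ubuntu/bucket"
S3_PREFIX = "s3://"


def s3_url_to_container_path(value: str, bucket: str) -> str:
    """Turn s3://<bucket>/<key> into /home/ubuntu/bucket/<key>.

    Values without an s3:// prefix are returned unchanged.
    Hypothetical helper; assumes `bucket` is the bucket that gets mounted.
    """
    if not value.startswith(S3_PREFIX):
        return value
    url_bucket, _, key = value[len(S3_PREFIX):].partition("/")
    if url_bucket != bucket:
        # A different bucket would not be visible under the mount point.
        raise ValueError(f"{value!r} is not in bucket {bucket!r}")
    return MOUNT_POINT + "/" + key


def rewrite_load_data(csv_text: str, bucket: str) -> str:
    """Rewrite every PathName_* column of a load_data.csv given as text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for column in row:
            # Assumption: path columns are named PathName_<channel>.
            if column.startswith("PathName_"):
                row[column] = s3_url_to_container_path(row[column], bucket)
        writer.writerow(row)
    return out.getvalue()
```

The bucket check is deliberate: the bucket name is dropped from the rewritten path, so silently converting a URL that points at another bucket would produce a path that does not exist in the container.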

ErinWeisbart commented 1 year ago

Added to docs in https://github.com/DistributedScience/Distributed-CellProfiler/commit/ebe3b7f9c4519c9bcd699e69b44731fb65d2d1da