dchaley / deepcell-imaging

Tools & guidance to scale DeepCell imaging on Google Cloud Batch

Find way to shrink container #192

Closed dchaley closed 3 months ago

dchaley commented 3 months ago

The current container is 8.5 GB, which takes a looooong time to download.

It most likely does not need to be this big; most of it is (apparently) GPU drivers. We only need the GPU drivers that we'll actually use.

Can we remove the other drivers? ---> there are probably lots of best practices to look up
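Before removing anything, it may help to rank the image's layers by size to confirm where the 8.5 GB goes. The sketch below parses tab-separated `size<TAB>command` lines, the shape produced by something like `docker history --no-trunc --format '{{.Size}}\t{{.CreatedBy}}' IMAGE`. The sample rows and the `parse_size` / `largest_layers` helpers are made up for illustration; they are not measurements of this image.

```python
import re

# Decimal multipliers, matching the kB/MB/GB suffixes Docker prints.
UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(text):
    """Convert a human-readable size such as '1.2GB' into bytes."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([kKMGT]?B)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return round(float(number) * UNITS[unit.upper()])


def largest_layers(history_text, top=3):
    """Return the `top` largest (bytes, command) pairs from tab-separated lines."""
    layers = []
    for line in history_text.splitlines():
        if not line.strip():
            continue
        size, _, command = line.partition("\t")
        layers.append((parse_size(size), command.strip()))
    return sorted(layers, reverse=True)[:top]


# Hypothetical sample rows, for illustration only.
SAMPLE = (
    "5.1GB\tRUN install-cuda-libs\n"
    "1.9GB\tRUN pip install deps\n"
    "0B\tENV FOO=bar\n"
    "12.3kB\tCOPY app /app\n"
)

if __name__ == "__main__":
    for size, command in largest_layers(SAMPLE):
        print(f"{size / 1e9:6.2f} GB  {command}")
```

Layers that install drivers or libraries we never load are the first candidates to drop or replace with a slimmer base.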

dchaley commented 3 months ago

Candidate base image for TF 2.8, but does it have Python? (And if we install Python + TF on top of it, does it reinstall TF?) https://hub.docker.com/layers/tensorflow/tensorflow/2.8.4-gpu/images/sha256-4351b59baf4887bcf47eb78b34267786f40460a81fef03c9b9f58e7d58f1c7b7?context=explore
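One way to answer both questions is to run a small check with the candidate image's own interpreter: if no interpreter is found, the first question is answered; otherwise the script reports whether TensorFlow is already importable. This is a minimal sketch; `describe_package` is a hypothetical helper, and the distribution name may differ from the import name, in which case the version falls back to "unknown".

```python
import importlib.util
import sys
from importlib import metadata


def describe_package(name):
    """Return 'missing', the installed version, or a fallback if the version is unknown."""
    if importlib.util.find_spec(name) is None:
        return "missing"
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        # Importable, but not registered under this distribution name.
        return "installed (version unknown)"


if __name__ == "__main__":
    print("python:", sys.version.split()[0])
    print("tensorflow:", describe_package("tensorflow"))
```

Running it once on the bare base image and once after our own install step would show whether that step replaced the preinstalled TensorFlow.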

lynnlangit commented 3 months ago

Ideas:

dchaley commented 3 months ago

Attempt: image streaming: FAIL 😿

Yesterday's results [logs]:

From when Batch starts initializing until our container starts:

2024-04-30 14:56:20.150 PDT Cloud Batch server: "batch.googleapis.com:443"
2024-04-30 15:02:23.233 PDT Executing runnable container

Almost exactly 6 minutes to load the container (& the GPU driver install).
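For comparing later attempts against this baseline, the gap between the two log lines can be computed rather than eyeballed. A small sketch, assuming both timestamps are in the same time zone (both are PDT here) and the zone suffix is dropped before parsing; `elapsed_seconds` is a name chosen for illustration.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(start, end):
    """Seconds between two log timestamps that share a time zone."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    ended = datetime.strptime(end, LOG_TIME_FORMAT)
    return (ended - started).total_seconds()


if __name__ == "__main__":
    # Timestamps from the two log lines above, with the " PDT" suffix dropped.
    gap = elapsed_seconds("2024-04-30 14:56:20.150", "2024-04-30 15:02:23.233")
    print(f"{gap:.3f} s")  # 363.083 s, i.e. 6 min 3 s
```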

Attempt 1: failed due to insufficient disk (??) [logs]

failed to pull and unpack image "us-central1-docker.pkg.dev/deepcell-on-batch/deepcell-benchmarking-us-central1/benchmarking:gce": failed to extract layer sha256:5bfd133cf20faa9b20611c91cc59c104a1295d6ae3ef0cfd6aa3ea573df21780: write /var/lib/containerd/io.containerd.snapshotter.v1.gcfs/snapshotter/snapshots/36/fs/opt/conda/lib/python3.7/site-packages/scipy/optimize/_highs/_highs_wrapper.cpython-37m-x86_64-linux-gnu.so: no space left on device: unknown

Note failed to extract; no space left (on unknown device).
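A cheap way to catch this class of failure earlier is to compare free disk space against the expected unpacked image size before pulling. The sketch below is only an illustration: `has_room`, the 10% headroom, and using the 8.5 GB figure as the size estimate are all assumptions, and the path to check would be whichever filesystem the container runtime unpacks onto.

```python
import shutil


def has_room(path, required_bytes, headroom=0.10):
    """True if the filesystem holding `path` has `required_bytes` free plus a margin."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes * (1 + headroom)


if __name__ == "__main__":
    # Assumed estimate: the 8.5 GB image size mentioned at the top of this issue.
    estimated_image_bytes = 8.5 * 10**9
    print("enough space:", has_room(".", estimated_image_bytes))
```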

Attempt 2: failed due to ... not sure? Couldn't find the file. [batch job]

2024-05-01 13:29:42.012 PDT
python: can't open file 'benchmarking/deepcell-e2e/benchmark.py': [Errno 2] No such file or directory

lynnlangit commented 3 months ago

[Screenshot 2024-05-01 at 8 29 26 PM]

Is it the first one listed?

lynnlangit commented 3 months ago

Might want to test streaming with a smaller container that doesn't need GPU drivers installed, to see if streaming works at all.

dchaley commented 3 months ago

Good idea. I created #204 to follow up on streaming (while we pursue actually shrinking the container here).

lynnlangit commented 3 months ago

After today's work - would you consider this complete? Size is now < 2.5 GB right?

dchaley commented 3 months ago

Complete enough for now probably, although I think we can remove a few caches. (A possible followup)
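To judge whether the cache cleanup is worth a follow-up, the caches can be sized first. A rough sketch, meant to be run inside the container; the directory names in `CACHE_DIR_NAMES` are guesses at typical caches, not ones confirmed in this issue, and both helpers are hypothetical.

```python
import os

# Assumed candidates; adjust to whatever caches the image actually contains.
CACHE_DIR_NAMES = {"__pycache__", ".cache"}


def tree_size(path):
    """Total size in bytes of regular files under `path`, skipping symlinks."""
    total = 0
    for dirpath, _, filenames in os.walk(path):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def cache_sizes(root):
    """Map each cache directory under `root` to its total size in bytes."""
    sizes = {}
    for dirpath, dirnames, _ in os.walk(root):
        for name in list(dirnames):
            if name in CACHE_DIR_NAMES:
                cache_path = os.path.join(dirpath, name)
                sizes[cache_path] = tree_size(cache_path)
                dirnames.remove(name)  # do not descend into a counted cache
    return sizes


if __name__ == "__main__":
    found = cache_sizes(".")
    for path, size in sorted(found.items(), key=lambda item: -item[1])[:10]:
        print(f"{size / 1e6:8.1f} MB  {path}")
```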

On Fri, May 3, 2024, 6:57 PM Lynn Langit @.***> wrote:

After today's work - would you consider this complete? Size is now < 2.5 GB right?


dchaley commented 3 months ago

Reopening; the code isn't actually committed yet

dchaley commented 3 months ago

Follow-up optimization opportunity: #209