dchaley / deepcell-imaging

Tools & guidance to scale DeepCell imaging on Google Cloud Batch

Run benchmark on Google Batch #183

Closed: dchaley closed this 3 months ago

dchaley commented 4 months ago

Run a benchmark on Google Batch. Need to adjust the code to fetch the machine type.
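A minimal sketch of fetching the machine type from inside the VM. It assumes the job runs on Compute Engine, where the metadata server's `instance/machine-type` endpoint returns a path like `projects/PROJECT_NUMBER/machineTypes/n1-standard-8` and requires the `Metadata-Flavor: Google` header. The function names are illustrative, not taken from the repository.

```python
import urllib.request

# Compute Engine metadata server endpoint for this VM's machine type.
# Only reachable from inside a GCE VM (including Batch-managed VMs).
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/instance/machine-type"
)


def parse_machine_type(resource_path: str) -> str:
    """Return the short name from e.g. 'projects/123/machineTypes/n1-standard-8'."""
    return resource_path.strip().rsplit("/", 1)[-1]


def fetch_machine_type(timeout: float = 2.0) -> str:
    """Query the metadata server; the Metadata-Flavor header is mandatory."""
    req = urllib.request.Request(METADATA_URL, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_machine_type(resp.read().decode("utf-8"))


def machine_type_or_unknown() -> str:
    """Fall back to 'unknown' when not on GCE (URLError and timeouts are OSErrors)."""
    try:
        return fetch_machine_type()
    except OSError:
        return "unknown"
```

Outside GCE (e.g. a local notebook) the metadata hostname won't resolve, so `machine_type_or_unknown` records `"unknown"` instead of failing the benchmark.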

☐ benchmark with CPU
❌ benchmark with GPU (not started after >24 hrs)

dchaley commented 4 months ago

Jobs with GPU have not even started, ~24+ hours later 😨

We've seen two errors:

1:

Quota checking process decided to delay scheduling for the job the-second-batch-t-d13a30be-dc77-49cf0 due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 500, usage: 480, wanted: 30.], next schedule time 2024-04-17 14:32:34.04100913 -0700 PDT m=+411937.323907321.

We don't know where the existing 480 GB of usage comes from 🤷, but the quotas page does show the 500 GB limit.
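The delay itself is plain arithmetic: 480 GB already in use plus the 30 GB requested is 510 GB, over the 500 GB limit, leaving only 20 GB of headroom. A small sketch that pulls those numbers out of the scheduler message and applies the same check; the regex and the `quota_check` helper are hypothetical, written against the message format quoted above.

```python
import re

# Matches the bracketed quota fields in the Batch scheduling message.
QUOTA_RE = re.compile(
    r"Quota: (?P<name>\w+), limit: (?P<limit>\d+), "
    r"usage: (?P<usage>\d+), wanted: (?P<wanted>\d+)"
)


def quota_check(message: str) -> dict:
    """Return the quota name, remaining headroom, and whether the request fits."""
    m = QUOTA_RE.search(message)
    if m is None:
        raise ValueError("no quota fields found in message")
    limit, usage, wanted = (int(m[k]) for k in ("limit", "usage", "wanted"))
    return {
        "quota": m["name"],
        "headroom": limit - usage,
        "fits": usage + wanted <= limit,
    }
```

To find what is using the existing 480 GB, `gcloud compute regions describe us-west1` lists each regional quota's limit and usage, and `gcloud compute disks list` shows the disks themselves.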

2:

VM in Managed Instance Group meets error: Batch Error: code - CODE_GCE_ZONE_RESOURCE_POOL_EXHAUSTED, description - error count is 5, latest message example: Instance 'the-second-batch-t-d13a30be-dc77-49cf0-group0-0-blhv' creation failed: The zone 'projects/deepcell-401920/zones/us-west1-b' does not have enough resources available to fulfill the request. Try a different zone, or try again later.

Sad!

lynnlangit commented 4 months ago

Tips: use n2-standard-x instances; also try another region, e.g. us-central1(?)
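Both tips map onto fields of a Batch job config: the machine type goes under `allocationPolicy.instances[].policy.machineType` and the region under `allocationPolicy.location.allowedLocations`. A minimal sketch, assuming the JSON job format accepted by `gcloud batch jobs submit JOB_NAME --location=REGION --config=job.json`; the helper name and image URI are placeholders. As far as we know, T4 GPUs can only be attached to N1 machine types, so N2 applies to the CPU runs only.

```python
import json
from typing import Optional


def build_batch_job(
    image_uri: str,
    machine_type: str,
    region: str,
    gpu_type: Optional[str] = None,
    gpu_count: int = 0,
    provisioning_model: str = "STANDARD",
) -> dict:
    """Assemble a single-task Google Batch job config (hypothetical helper)."""
    policy = {"machineType": machine_type, "provisioningModel": provisioning_model}
    instance = {"policy": policy}
    if gpu_type and gpu_count > 0:
        policy["accelerators"] = [{"type": gpu_type, "count": gpu_count}]
        # Ask Batch to install NVIDIA drivers on the VM.
        instance["installGpuDrivers"] = True
    return {
        "taskGroups": [
            {
                "taskCount": 1,
                "taskSpec": {"runnables": [{"container": {"imageUri": image_uri}}]},
            }
        ],
        "allocationPolicy": {
            # A region lets Batch use any zone in it; "zones/..." entries also work.
            "location": {"allowedLocations": [f"regions/{region}"]},
            "instances": [instance],
        },
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }


# Placeholder image URI; print the config so it can be saved as job.json.
job = build_batch_job(
    "REGION-docker.pkg.dev/PROJECT/REPO/IMAGE:TAG",
    "n1-standard-8",
    "us-central1",
    gpu_type="nvidia-tesla-t4",
    gpu_count=1,
)
print(json.dumps(job, indent=2))
```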

dchaley commented 3 months ago

Running STANDARD provisioning, n1-standard-8 machine, w/ T4 GPU:

STATUS_CHANGED 2024-04-30T21:16:14.809370937Z Job state is set from SCHEDULED to RUNNING for job projects/148281182590/locations/us-central1/jobs/j20240430-gpu-standard-1.
STATUS_CHANGED 2024-04-30T21:08:44.860491667Z Job state is set from QUEUED to SCHEDULED for job projects/148281182590/locations/us-central1/jobs/j20240430-gpu-standard-1.
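Taking the two STATUS_CHANGED timestamps in time order, the job waited in SCHEDULED for about 7.5 minutes before it started RUNNING, which appears to be most of the ~8 min creation-to-completion time reported in the thread. A small sketch of that subtraction; `parse_batch_ts` is a hypothetical helper, and it truncates the nanosecond fraction because `datetime` only stores microseconds.

```python
from datetime import datetime


def parse_batch_ts(ts: str) -> datetime:
    """Parse a UTC timestamp like '2024-04-30T21:08:44.860491667Z' (naive, UTC implied)."""
    base, _, frac = ts.rstrip("Z").partition(".")
    # Keep at most 6 fractional digits, zero-padding shorter fractions.
    return datetime.strptime(f"{base}.{frac[:6]:0<6}", "%Y-%m-%dT%H:%M:%S.%f")


scheduled = parse_batch_ts("2024-04-30T21:08:44.860491667Z")  # QUEUED -> SCHEDULED
running = parse_batch_ts("2024-04-30T21:16:14.809370937Z")    # SCHEDULED -> RUNNING
wait = running - scheduled
print(wait)  # 0:07:29.948879
```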

lynnlangit commented 3 months ago

Did the run complete OK? How long did it take? Using one image of what size?

dchaley commented 3 months ago

One 512x512 image; it completed in 38 s of runtime. Creation → completion took ~8 min. It also wrote to BigQuery successfully 🎉 cc @WeihaoGe1009
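For reference, a benchmark row like this one can be streamed into BigQuery with the `google-cloud-bigquery` client's `insert_rows_json`, which returns a list of per-row errors (empty on success). A minimal sketch; the field names, the table ID, and both helpers are hypothetical and not the repository's actual schema.

```python
from datetime import datetime, timezone


def benchmark_row(machine_type: str, gpu: str, width: int, height: int, runtime_s: float) -> dict:
    """Build one result row (hypothetical schema)."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "machine_type": machine_type,
        "gpu": gpu,
        "image_pixels": width * height,
        "runtime_s": runtime_s,
    }


def write_rows(table_id: str, rows: list) -> None:
    """Stream rows to BigQuery; table_id is 'project.dataset.table'."""
    from google.cloud import bigquery  # requires the google-cloud-bigquery package

    errors = bigquery.Client().insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")
```

Usage: `write_rows("PROJECT.DATASET.TABLE", [benchmark_row("n1-standard-8", "nvidia-tesla-t4", 512, 512, 38.0)])`.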

This means we can now work on running an arbitrary input file in the container and handling its output. (Right now, the Python script hardcodes its input & params; in the notebook we'd just change them inline.)
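One way to stop hardcoding the input and params is to read them from command-line flags, so the same container image can process any file. A minimal sketch with `argparse`; the flag names (including `--image-mpp`) are hypothetical. With Batch, a container runnable's `commands` list is appended to the image's ENTRYPOINT, so flags like these could be passed per job without rebuilding the image.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse run parameters instead of hardcoding them in the script."""
    parser = argparse.ArgumentParser(description="Run segmentation on one input image.")
    parser.add_argument("--input", required=True, help="Path or URI of the input image")
    parser.add_argument("--output", required=True, help="Where to write the prediction")
    parser.add_argument(
        "--image-mpp", type=float, default=None,
        help="Microns per pixel (hypothetical example parameter)",
    )
    return parser.parse_args(argv)
```

Usage: `parse_args(["--input", "in.npz", "--output", "out.npz"])` yields `args.input == "in.npz"` and `args.image_mpp is None`.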

dchaley commented 3 months ago

Closing this issue, as we've completed benchmarks on CPU + GPU!