dchaley / deepcell-imaging

Tools & guidance to scale DeepCell imaging on Google Cloud Batch

Create job that processes multiple inputs #292

Closed dchaley closed 1 month ago

dchaley commented 1 month ago

Currently, we process one image at a time in our task specification. In particular, the input is a single file URL.

Instead, we need to run on a collection of images. In Batch, each task has the same arguments, and the runtime receives $BATCH_TASK_INDEX in the environment. We need to use this task index to determine which file to actually run on.

This means we need to provide the task definition with a list of files. It's probably as easy as passing the URI of a file that lists the inputs, then using the task index to pick out our file.

So, each task would do this:
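A minimal sketch of that per-task lookup, assuming the list is a JSON array of input URIs that has already been read into a string. The function name `select_input` and the `gs://example-bucket/...` paths are hypothetical, not the repository's actual entrypoint code; only `BATCH_TASK_INDEX` comes from Batch itself.

```python
import json
import os


def select_input(task_list_json, env=os.environ):
    """Return the input URI assigned to this task.

    Assumes (hypothetically) that the task list is a JSON array of URIs.
    BATCH_TASK_INDEX is set by Batch for each task; it defaults to 0 here
    so the function also works in a local, single-task run.
    """
    inputs = json.loads(task_list_json)
    index = int(env.get("BATCH_TASK_INDEX", "0"))
    if not 0 <= index < len(inputs):
        raise IndexError(
            f"task index {index} is out of range for {len(inputs)} inputs"
        )
    return inputs[index]


if __name__ == "__main__":
    # Hypothetical task list; in practice it would be fetched from storage.
    example = json.dumps([
        "gs://example-bucket/image-0.tiff",
        "gs://example-bucket/image-1.tiff",
    ])
    print(select_input(example))
```

Every task receives the same list URI as its argument, so the only per-task difference is the index read from the environment.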

dchaley commented 1 month ago

The entrypoints support a task-list mode now.

Next: given a list of input files, generate the task-list JSON files, upload them to storage, and submit the jobs.
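The generation step might look roughly like the sketch below. It assumes one task list per job, a JSON array of URIs as the file format, and a hypothetical `task-list-NNNN.json` naming scheme; none of these are confirmed by the thread. Uploading to storage and submitting the jobs are deliberately left out.

```python
import json


def build_task_lists(input_uris, max_tasks_per_job=100):
    """Split input URIs into per-job task lists.

    Returns a dict mapping a (hypothetical) file name to the JSON text of
    that job's task list. The caller is responsible for uploading each
    file and submitting a job with a matching task count.
    """
    if max_tasks_per_job < 1:
        raise ValueError("max_tasks_per_job must be at least 1")

    task_lists = {}
    starts = range(0, len(input_uris), max_tasks_per_job)
    for job_number, start in enumerate(starts):
        chunk = list(input_uris[start:start + max_tasks_per_job])
        task_lists[f"task-list-{job_number:04d}.json"] = json.dumps(chunk)
    return task_lists


if __name__ == "__main__":
    # Hypothetical inputs, split so each job gets at most two tasks.
    uris = [f"gs://example-bucket/image-{i}.tiff" for i in range(5)]
    for name, body in build_task_lists(uris, max_tasks_per_job=2).items():
        print(name, body)
```

The length of each list is also the task count for the corresponding job, so the two stay in sync if both are derived from the same chunk.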