Closed bentaculum closed 3 years ago
The `DaisyRequestBlocks` node is not sufficient to turn your gunpowder pipeline into a daisy task. You must use daisy as you would without gunpowder, and then you can use the gunpowder node to replace the usual daisy `while True: get_block()` loop.
`Scan` is usually followed by more processing, since its result is assumed to be small enough to still fit in memory, even if your upstream processing must be handled in smaller chunks. You can't do that with `DaisyRequestBlocks`. This node should go at the very end of your pipeline and won't return anything. Any outputs you want to keep should be written to permanent storage, e.g. Zarr, MongoDB, etc.
Not sure where to point you to for a minimal working example, so here is a somewhat minimal example of what I do:
I'm pretty sure the `daisy.call` function handles setting the `DAISY_CONTEXT`, but someone else might want to jump in with more of the daisy specifics.
File: `daisy_predict.py`

```python
import daisy


def predict_blockwise(input_roi, block_read_roi, block_write_roi, num_workers):

    # process block-wise
    succeeded = daisy.run_blockwise(
        input_roi,
        block_read_roi,
        block_write_roi,
        process_function=predict_worker,
        check_function=check_block,
        num_workers=num_workers,
        read_write_conflict=False,
        fit='valid')

    if not succeeded:
        raise RuntimeError("Prediction failed for at least one block")


def predict_worker():

    # bsub command for starting a job on the cluster
    command = [
        'bsub',
        '-c', '1',
        '-g', '1',
    ]

    # python script containing your predict script
    command += ['python', 'predict.py']

    daisy.call(command, log_out='predict.out', log_err='predict.err')


def check_block(block):

    # return True if this block was already processed, e.g. by looking it
    # up in a database or checking the output dataset
    return False


if __name__ == "__main__":

    predict_blockwise(
        daisy.Roi((0, 0, 0), (10, 10, 10)),
        daisy.Roi((0, 0, 0), (5, 5, 5)),
        daisy.Roi((0, 0, 0), (3, 3, 3)),
        5)
```
File: `predict.py`

```python
import gunpowder as gp


def block_done_callback(block, start, duration):

    # e.g., record the finished block in a database
    print(f"Block {block} done in {duration}s")


def predict_in_block(
        block_read_roi, block_write_roi, zarr_directory, out_dataset,
        num_workers=1):

    raw = gp.ArrayKey("RAW")
    output = gp.ArrayKey("OUTPUT")

    # the reference request for a single block: raw covers the (larger)
    # read ROI, the output covers the write ROI
    chunk_request = gp.BatchRequest()
    chunk_request[raw] = gp.ArraySpec(roi=block_read_roi)
    chunk_request[output] = gp.ArraySpec(roi=block_write_roi)

    pipeline = (
        ...  # your source and prediction nodes go here
        + gp.ZarrWrite(
            dataset_names={output: out_dataset},
            output_filename=zarr_directory,
        )
        + gp.DaisyRequestBlocks(
            chunk_request,
            roi_map={raw: "read_roi", output: "write_roi"},
            num_workers=num_workers,
            block_done_callback=block_done_callback,
        )
    )

    print("Starting prediction...")
    with gp.build(pipeline):
        # an empty request: DaisyRequestBlocks fetches the actual block
        # ROIs from the daisy scheduler
        pipeline.request_batch(gp.BatchRequest())
    print("Prediction finished")


if __name__ == "__main__":

    predict_in_block(
        gp.Roi((0, 0, 0), (5, 5, 5)),
        gp.Roi((0, 0, 0), (3, 3, 3)),
        "output.zarr",
        "volumes/output")
```
Thanks for this great mini-tutorial :)
For making large-scale predictions, I want to swap a `Scan` node for the `DaisyRequestBlocks` node, which as far as I understand should avoid `Scan`'s behaviour of filling up memory with the composite `Batch` that gets assembled from `Scan`'s requests.

Here's the error I get:

`DaisyRequestBlocks` does not pass a `context` to the daisy client. Do I have to specify the `DAISY_CONTEXT` manually?

I'm using `gunpowder=1.1.5` and tried combining it with both `daisy-0.2.1` available on PyPI and the daisy branch `0.3-dev`; I get the same error.