funkelab / gunpowder

A library to facilitate machine learning on multi-dimensional images.
https://funkelab.github.io/gunpowder/

DaisyRequestBlocks fails to create a daisy context #105

Closed bentaculum closed 3 years ago

bentaculum commented 4 years ago

For making large-scale predictions, I want to swap a Scan node for the DaisyRequestBlocks node, which, as far as I understand, should avoid Scan's behaviour of filling up memory with the composite Batch that it assembles from its requests.

Here's the error I get:

ERROR:daisy.context:DAISY_CONTEXT environment variable not found!
Process Process-2:
Traceback (most recent call last):
  File "/home/bengallusser/miniconda3/envs/lsd/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/bengallusser/miniconda3/envs/lsd/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/bengallusser/code/src/gunpowder/gunpowder/nodes/daisy_request_blocks.py", line 98, in __get_chunks
    daisy_client = daisy.Client()
  File "/home/bengallusser/miniconda3/envs/lsd/lib/python3.8/site-packages/daisy/client.py", line 66, in __init__
    self.context = Context.from_env()
  File "/home/bengallusser/miniconda3/envs/lsd/lib/python3.8/site-packages/daisy/context.py", line 30, in from_env
    tokens = os.environ['DAISY_CONTEXT'].split(':')
  File "/home/bengallusser/miniconda3/envs/lsd/lib/python3.8/os.py", line 675, in __getitem__
    raise KeyError(key) from None
KeyError: 'DAISY_CONTEXT'

DaisyRequestBlocks does not pass a context to the daisy client. Do I have to specify the DAISY_CONTEXT manually?

I'm using gunpowder 1.1.5 and have tried combining it with both daisy 0.2.1 (available on PyPI) and the daisy 0.3-dev branch; I get the same error with both.

pattonw commented 4 years ago

The DaisyRequestBlocks node is not sufficient to turn your gunpowder pipeline into a daisy task. You must use daisy as you would without gunpowder, and then you can use the gunpowder node to replace the usual daisy while True: get_block() loop.
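
For reference, here is roughly what that worker loop looks like in plain daisy. Treat this as a sketch: the exact client method names vary between daisy versions, and process_block is a placeholder for whatever you do per block.

import daisy

# each worker connects to the scheduler; the connection details come from
# the DAISY_CONTEXT environment variable that daisy sets for its workers
client = daisy.Client()

while True:
    block = client.acquire_block()
    if block is None:
        # no more blocks to process
        break
    process_block(block)  # placeholder for your per-block work
    client.release_block(block, 0)  # 0 signals success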

Scan is usually followed by more processing, since its result is assumed to be small enough to still fit in memory even if the upstream processing had to be handled in smaller chunks. You can't do that with DaisyRequestBlocks: this node should go at the very end of your pipeline and won't return anything. Any outputs you want to keep should be written to permanent storage, e.g. Zarr, MongoDB, etc.

I'm not sure where to point you for a minimal working example, so here is a somewhat minimal version of what I do. I'm pretty sure the daisy.call function handles setting the DAISY_CONTEXT, but someone else might want to jump in with more of the daisy specifics.

File: daisy_predict.py

import daisy

def predict_blockwise(input_roi, block_read_roi, block_write_roi, num_workers):

    # process block-wise
    succeeded = daisy.run_blockwise(
        input_roi,
        block_read_roi,
        block_write_roi,
        process_function=lambda: predict_worker(),
        check_function=lambda b: check_block(b),
        num_workers=num_workers,
        read_write_conflict=False,
        fit='valid')

def check_block(block):
    # placeholder: return True if this block has already been processed,
    # e.g. by checking a completion marker in a database
    return False

def predict_worker():

    # log files for the worker (placeholder names)
    log_out = 'predict_worker.out'
    log_err = 'predict_worker.err'

    # bsub command for starting a job on the cluster (adapt to your scheduler)
    command = [
        'bsub',
        '-c', '1',
        '-g', '1',
    ]

    # python script containing your predict pipeline
    command += ['python', 'predict.py']

    daisy.call(command, log_out=log_out, log_err=log_err)

if __name__ == "__main__":

    predict_blockwise(
        daisy.Roi((0, 0, 0), (10, 10, 10)),
        daisy.Roi((0, 0, 0), (5, 5, 5)),
        daisy.Roi((0, 0, 0), (3, 3, 3)),
        5)
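
Note that in the toy ROIs above the read ROI (5, 5, 5) is larger than the write ROI (3, 3, 3): the margin is the spatial context each worker reads around the block it writes, which is also why chunk_request in predict.py below pairs raw with the read ROI and output with the write ROI.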

File: predict.py

import gunpowder as gp

def block_done_callback(block, start, duration):
    # placeholder: record that this block finished, e.g. in a database that
    # check_block in daisy_predict.py can query
    pass

def predict_in_block(
        block_read_roi,
        block_write_roi,
        zarr_directory,
        out_dataset,
        num_workers=1):

    raw = gp.ArrayKey("RAW")
    output = gp.ArrayKey("OUTPUT")

    # reference request for one block: raw covers the (larger) read ROI,
    # output covers the write ROI
    chunk_request = gp.BatchRequest()
    chunk_request[raw] = gp.ArraySpec(roi=block_read_roi)
    chunk_request[output] = gp.ArraySpec(roi=block_write_roi)

    pipeline = (
        ...  # your source and prediction nodes go here
        + gp.ZarrWrite(
            dataset_names={output: out_dataset},
            output_filename=zarr_directory,
        )
        + gp.DaisyRequestBlocks(
            chunk_request,
            roi_map={raw: "read_roi", output: "write_roi"},
            num_workers=num_workers,
            block_done_callback=block_done_callback,
        )
    )

    print("Starting prediction...")
    with gp.build(pipeline):
        pipeline.request_batch(gp.BatchRequest())
    print("Prediction finished")

if __name__ == "__main__":
    predict_in_block(
        gp.Roi((0, 0, 0), (5, 5, 5)),
        gp.Roi((0, 0, 0), (3, 3, 3)),
        "output.zarr",
        "volumes/output")
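
One more note on the original error: DAISY_CONTEXT is how the daisy scheduler tells each worker where to connect, and daisy.Client() reads it from the environment (that's the Context.from_env() call in your traceback). As long as predict.py is started through run_blockwise/daisy.call rather than directly, it should already be set. If you want to fail early with a clearer message, you could add a check like this at the top of predict.py (my addition, not something daisy requires):

import os

# guard against running this script outside of a daisy worker
assert "DAISY_CONTEXT" in os.environ, \
    "DAISY_CONTEXT not set: start predict.py via daisy, not directly"
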
bentaculum commented 4 years ago

Thanks for this great mini-tutorial :)