funkelab / gunpowder

A library to facilitate machine learning on multi-dimensional images.
https://funkelab.github.io/gunpowder/

DaisyRequestBlocks fails to create a daisy context #105

Closed bentaculum closed 3 years ago

bentaculum commented 4 years ago

For making large-scale predictions, I want to swap a Scan node for the DaisyRequestBlocks node, which, as far as I understand, should avoid Scan's behaviour of filling up memory with the composite Batch that it assembles from its requests.

Here's the error I get:

ERROR:daisy.context:DAISY_CONTEXT environment variable not found!
Process Process-2:
Traceback (most recent call last):
  File "/home/bengallusser/miniconda3/envs/lsd/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/bengallusser/miniconda3/envs/lsd/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/bengallusser/code/src/gunpowder/gunpowder/nodes/daisy_request_blocks.py", line 98, in __get_chunks
    daisy_client = daisy.Client()
  File "/home/bengallusser/miniconda3/envs/lsd/lib/python3.8/site-packages/daisy/client.py", line 66, in __init__
    self.context = Context.from_env()
  File "/home/bengallusser/miniconda3/envs/lsd/lib/python3.8/site-packages/daisy/context.py", line 30, in from_env
    tokens = os.environ['DAISY_CONTEXT'].split(':')
  File "/home/bengallusser/miniconda3/envs/lsd/lib/python3.8/os.py", line 675, in __getitem__
    raise KeyError(key) from None
KeyError: 'DAISY_CONTEXT'

DaisyRequestBlocks does not pass a context to the daisy client. Do I have to specify the DAISY_CONTEXT manually?

I'm using gunpowder 1.1.5 and have tried combining it with both daisy 0.2.1 (available on PyPI) and the daisy 0.3-dev branch; I get the same error with both.

pattonw commented 4 years ago

The DaisyRequestBlocks node is not sufficient to turn your gunpowder pipeline into a daisy task. You must use daisy as you would without gunpowder, and then you can use the gunpowder node to replace the usual daisy while True: get_block() loop.
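
For reference, here is roughly what that worker loop looks like in plain daisy. Treat this as a sketch: the exact client method names vary between daisy versions, and process_block is a placeholder for whatever you do per block.

import daisy

# each worker connects to the scheduler; the connection details come from
# the DAISY_CONTEXT environment variable that daisy sets for its workers
client = daisy.Client()

while True:
    block = client.acquire_block()
    if block is None:
        # no more blocks to process
        break
    process_block(block)  # placeholder for your per-block work
    client.release_block(block, 0)  # 0 signals success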

Scan is usually followed by more processing, since its result is assumed to be small enough to still fit in memory even if the upstream processing had to be handled in smaller chunks. You can't do that with DaisyRequestBlocks: this node should go at the very end of your pipeline and won't return anything. Any outputs you want to keep should be written to permanent storage, e.g. Zarr, MongoDB, etc.

I'm not sure where to point you for a minimal working example, so here is a somewhat minimal version of what I do. I'm pretty sure the daisy.call function handles setting the DAISY_CONTEXT, but someone else might want to jump in with more of the daisy specifics.

File: daisy_predict.py

import daisy

def predict_blockwise(input_roi, block_read_roi, block_write_roi, num_workers):

    # process block-wise
    succeeded = daisy.run_blockwise(
        input_roi,
        block_read_roi,
        block_write_roi,
        process_function=lambda: predict_worker(),
        check_function=lambda b: check_block(b),
        num_workers=num_workers,
        read_write_conflict=False,
        fit='valid')

def check_block(block):
    # placeholder: return True if this block has already been processed,
    # e.g. by checking a completion marker in a database
    return False

def predict_worker():

    # log files for the worker (placeholder names)
    log_out = 'predict_worker.out'
    log_err = 'predict_worker.err'

    # bsub command for starting a job on the cluster (adapt to your scheduler)
    command = [
        'bsub',
        '-c', '1',
        '-g', '1',
    ]

    # python script containing your predict pipeline
    command += ['python', 'predict.py']

    daisy.call(command, log_out=log_out, log_err=log_err)

if __name__ == "__main__":

    predict_blockwise(
        daisy.Roi((0, 0, 0), (10, 10, 10)),
        daisy.Roi((0, 0, 0), (5, 5, 5)),
        daisy.Roi((0, 0, 0), (3, 3, 3)),
        5)
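
Note that in the toy ROIs above the read ROI (5, 5, 5) is larger than the write ROI (3, 3, 3): the margin is the spatial context each worker reads around the block it writes, which is also why chunk_request in predict.py below pairs raw with the read ROI and output with the write ROI.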

File: predict.py

import gunpowder as gp

def block_done_callback(block, start, duration):
    # placeholder: record that this block finished, e.g. in a database that
    # check_block in daisy_predict.py can query
    pass

def predict_in_block(
        block_read_roi,
        block_write_roi,
        zarr_directory,
        out_dataset,
        num_workers=1):

    raw = gp.ArrayKey("RAW")
    output = gp.ArrayKey("OUTPUT")

    # reference request for one block: raw covers the (larger) read ROI,
    # output covers the write ROI
    chunk_request = gp.BatchRequest()
    chunk_request[raw] = gp.ArraySpec(roi=block_read_roi)
    chunk_request[output] = gp.ArraySpec(roi=block_write_roi)

    pipeline = (
        ...  # your source and prediction nodes go here
        + gp.ZarrWrite(
            dataset_names={output: out_dataset},
            output_filename=zarr_directory,
        )
        + gp.DaisyRequestBlocks(
            chunk_request,
            roi_map={raw: "read_roi", output: "write_roi"},
            num_workers=num_workers,
            block_done_callback=block_done_callback,
        )
    )

    print("Starting prediction...")
    with gp.build(pipeline):
        pipeline.request_batch(gp.BatchRequest())
    print("Prediction finished")

if __name__ == "__main__":
    predict_in_block(
        gp.Roi((0, 0, 0), (5, 5, 5)),
        gp.Roi((0, 0, 0), (3, 3, 3)),
        "output.zarr",
        "volumes/output")
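
One more note on the original error: DAISY_CONTEXT is how the daisy scheduler tells each worker where to connect, and daisy.Client() reads it from the environment (that's the Context.from_env() call in your traceback). As long as predict.py is started through run_blockwise/daisy.call rather than directly, it should already be set. If you want to fail early with a clearer message, you could add a check like this at the top of predict.py (my addition, not something daisy requires):

import os

# guard against running this script outside of a daisy worker
assert "DAISY_CONTEXT" in os.environ, \
    "DAISY_CONTEXT not set: start predict.py via daisy, not directly"
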
bentaculum commented 4 years ago

Thanks for this great mini-tutorial :)