CellProfiler / python-bioformats

Read and write life sciences file formats

Import 2D+t time series #113

Open dschetel opened 5 years ago

dschetel commented 5 years ago

Hey, I am trying to import an .ics/.ids file with two spatial dimensions and one temporal dimension (2D+t). Right now I do this by looping over every time point and calling img = bioformats.load_image(data, t=x) for each one.

This is incredibly slow. Is there another way to import 2D time-series data?

Thank you!

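import javabridge
import bioformats
import numpy as np

def importICS(data):
    # Start the Java virtual machine.
    # Note: javabridge can start the JVM only once per process, so after
    # kill_vm() this function cannot be called again in the same process.
    javabridge.start_vm(class_path=bioformats.JARS)
    try:
        # Import metadata to find the number of time points
        omexmlstr = bioformats.get_omexml_metadata(path=data)
        o = bioformats.OMEXML(omexmlstr)
        pixels = o.image().Pixels

        # Import data: load_image returns a (SizeY, SizeX) array per time point
        frames = []
        for x in range(pixels.SizeT):
            print(x)
            frames.append(bioformats.load_image(data, t=x))

        # Stack once at the end (np.dstack inside the loop copies the whole
        # array on every iteration); result shape is (SizeY, SizeX, SizeT)
        result_array = np.stack(frames, axis=-1)
    finally:
        # Terminate the Java virtual machine, even if reading failed
        javabridge.kill_vm()
    return result_array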
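One possible contributor to the slowness: as the traceback later in this thread shows, load_image (load_using_bioformats) runs with ImageReader(path=path) as rdr: on every call, so the loop opens and parses the file again for each time point. A minimal sketch, assuming the JVM is already running and that a single ImageReader can be reused for all time points, keeps one reader open and stacks the frames at the end. The names stack_time_points and load_time_series are hypothetical, and the actual time saved for ICS files has not been measured here.

```python
import numpy as np


def stack_time_points(rdr, size_t):
    """Read time points 0..size_t-1 from an already-open reader and stack them.

    rdr is any object with a read(t=...) method that returns a 2D array,
    such as a bioformats.ImageReader. Frames are collected in a list and
    stacked once, giving shape (size_t, height, width).
    """
    frames = [rdr.read(t=t) for t in range(size_t)]
    return np.stack(frames, axis=0)


def load_time_series(path):
    """Open the file once and read every time point (untested sketch).

    Assumes the JVM is already running, e.g. after
    javabridge.start_vm(class_path=bioformats.JARS).
    """
    # Imported here so stack_time_points can be used without Java installed
    import bioformats

    omexml = bioformats.OMEXML(bioformats.get_omexml_metadata(path=path))
    size_t = omexml.image().Pixels.SizeT
    # One ImageReader for all time points instead of one per load_image call
    with bioformats.ImageReader(path=path) as rdr:
        return stack_time_points(rdr, size_t)
```

After javabridge.start_vm(class_path=bioformats.JARS), calling load_time_series(data) should return an array of shape (SizeT, SizeY, SizeX) for a single-channel file.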
AetherUnbound commented 5 years ago

Hello @dschetel! Unfortunately, I don't believe the current implementation can read all time points in a single call. One way you could potentially speed up reading the file is to split the processing into two parts:

  1. Convert the .ics/.ids file to OME-TIFF
  2. Perform your processing on the OME-TIFF

This front-loads your processing with the conversion step, but it lets you read the N-dimensional array from the OME-TIFF directly instead of iterating over each time slice.
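The two-step approach could look roughly like the sketch below, assuming the Bio-Formats command-line tools (bftools, which provide bfconvert) are on the PATH and the tifffile package is installed; neither is part of python-bioformats. bfconvert chooses the output format from the file extension, so writing to a .ome.tif path should produce an OME-TIFF, which tifffile.imread can then load as one N-dimensional array. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def bfconvert_command(src, dst):
    """Build the bfconvert command line (Bio-Formats command-line tools).

    bfconvert chooses the output format from the destination's extension,
    so a path ending in .ome.tif should produce an OME-TIFF.
    """
    return ["bfconvert", str(src), str(dst)]


def convert_to_ome_tiff(src, dst=None):
    """Convert src (e.g. an .ics file) to OME-TIFF and return the output path."""
    src = Path(src)
    if dst is None:
        # e.g. data.ics -> data.ome.tif in the same directory
        dst = src.parent / (src.stem + ".ome.tif")
    # check=True raises CalledProcessError if bfconvert exits with an error
    subprocess.run(bfconvert_command(src, dst), check=True)
    return Path(dst)


def read_ome_tiff(path):
    """Load the whole image series as one N-dimensional array (needs tifffile)."""
    import tifffile

    return tifffile.imread(str(path))
```

Usage would be something like arr = read_ome_tiff(convert_to_ome_tiff("data.ics")); the conversion only has to run once per file, after which every later read skips the per-time-point loop.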

dschetel commented 3 years ago

Hey, is there any way to parallelize the import process? I tried using joblib like this:

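    import multiprocessing

    import bioformats
    import javabridge
    import numpy as np
    from bioformats import load_image
    from joblib import Parallel, delayed
    from tqdm import tqdm

    def import_ics(data):
        # Start the Java virtual machine
        javabridge.start_vm(class_path=bioformats.JARS, run_headless=True)

        # Import metadata
        omexmlstr = bioformats.get_omexml_metadata(path=data)
        o = bioformats.OMEXML(omexmlstr)
        pixels = o.image().Pixels

        noCPU = multiprocessing.cpu_count() - 1

        # Import data: one joblib task per time point
        sub_arrays = Parallel(n_jobs=noCPU)(delayed(load_image)(data, t=i) for i in tqdm(range(0, pixels.SizeT)))

        parallel_results = np.stack(sub_arrays, axis=0)
        return parallel_results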

but I get the error message below. Is there a way to get around it?

Thank you!

Traceback (most recent call last):
  File "/home/dgs/miniconda3/envs/biochem/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 431, in _process_worker
    r = call_item()
  File "/home/dgs/miniconda3/envs/biochem/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/home/dgs/miniconda3/envs/biochem/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "/home/dgs/miniconda3/envs/biochem/lib/python3.7/site-packages/joblib/parallel.py", line 253, in __call__
    for func, args, kwargs in self.items]
  File "/home/dgs/miniconda3/envs/biochem/lib/python3.7/site-packages/joblib/parallel.py", line 253, in <listcomp>
    for func, args, kwargs in self.items]
  File "/home/dgs/miniconda3/envs/biochem/lib/python3.7/site-packages/bioformats/formatreader.py", line 1004, in load_using_bioformats
    with ImageReader(path=path) as rdr:
  File "/home/dgs/miniconda3/envs/biochem/lib/python3.7/site-packages/bioformats/formatreader.py", line 626, in __init__
    self.path)
  File "/home/dgs/miniconda3/envs/biochem/lib/python3.7/site-packages/javabridge/jutil.py", line 1717, in make_instance
    klass = get_env().find_class(class_name)
AttributeError: 'NoneType' object has no attribute 'find_class'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/dgs/miniconda3/envs/biochem/bin/cellProcessing", line 8, in <module>
    sys.exit(main())
  File "/home/dgs/miniconda3/envs/biochem/lib/python3.7/site-packages/cellProcessing/cli.py", line 98, in main
    data = import_file(datafile)
  File "/home/dgs/miniconda3/envs/biochem/lib/python3.7/site-packages/cellProcessing/importexport.py", line 58, in import_file
    image = import_ics(filename)
  File "/home/dgs/miniconda3/envs/biochem/lib/python3.7/site-packages/cellProcessing/importexport.py", line 40, in import_ics
    sub_arrays = Parallel(n_jobs=noCPU)(delayed(load_image)(data, t=i) for i in tqdm(range(0,pixels.SizeT)))
  File "/home/dgs/miniconda3/envs/biochem/lib/python3.7/site-packages/joblib/parallel.py", line 1042, in __call__
    self.retrieve()
  File "/home/dgs/miniconda3/envs/biochem/lib/python3.7/site-packages/joblib/parallel.py", line 921, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/home/dgs/miniconda3/envs/biochem/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "/home/dgs/miniconda3/envs/biochem/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/home/dgs/miniconda3/envs/biochem/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
AttributeError: 'NoneType' object has no attribute 'find_class'
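The last line of both tracebacks hints at the cause: get_env() returns None inside the joblib worker, meaning that worker process has no JVM attached. The JVM started with javabridge.start_vm() lives in the parent process, while joblib's default loky backend runs load_image in separate worker processes that never started one. A possible workaround, sketched here under the assumption that starting one JVM per worker process is acceptable (each JVM costs startup time and memory), is to start the JVM in a process-pool initializer and hand each worker a contiguous block of time points. split_time_points, _init_worker, _read_chunk and load_parallel are hypothetical names; the sketch is untested against real ICS files and does not call javabridge.kill_vm() in the workers.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def split_time_points(size_t, n_chunks):
    """Split range(size_t) into at most n_chunks contiguous ranges of near-equal length."""
    n_chunks = max(1, min(n_chunks, size_t))
    step, extra = divmod(size_t, n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional time point
        stop = start + step + (1 if i < extra else 0)
        chunks.append(range(start, stop))
        start = stop
    return chunks


def _init_worker():
    # Runs once in each worker process: a JVM started in the parent
    # process is not available here, so each worker starts its own.
    import bioformats
    import javabridge

    javabridge.start_vm(class_path=bioformats.JARS, run_headless=True)


def _read_chunk(path, t_range):
    # Open the file once per chunk and read its time points in order
    import bioformats

    with bioformats.ImageReader(path=path) as rdr:
        return [rdr.read(t=t) for t in t_range]


def load_parallel(path, size_t, n_workers):
    """Read all time points with n_workers processes, each running its own JVM."""
    chunks = split_time_points(size_t, n_workers)
    # "spawn" gives each worker a fresh interpreter instead of a fork of a
    # parent process that may already be running a JVM
    ctx = multiprocessing.get_context("spawn")
    with ProcessPoolExecutor(max_workers=len(chunks), mp_context=ctx,
                             initializer=_init_worker) as pool:
        # map() returns results in the same order as chunks
        results = pool.map(_read_chunk, [path] * len(chunks), chunks)
        frames = [frame for chunk in results for frame in chunk]
    return np.stack(frames, axis=0)
```

Because of the spawn start method, load_parallel has to be called from inside an if __name__ == "__main__": block. Whether this beats a single reader depends on how long each JVM takes to start compared with the read time, which has not been measured here.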