CellProfiler / python-bioformats

Read and write life sciences file formats

Reading metadata #23

Open christofferaberg opened 9 years ago

christofferaberg commented 9 years ago

First I should commend the creation of this package. I had just about given up (at least temporarily) on using bioformats due to the lack of python support, when I realised this package had been released. Thanks for that!

I do, however, have some issues with actually using the package. I have tried files from two instruments, and in all cases I am unable to read metadata from individual planes. Note that this metadata is readable by the BioFormats library itself, because I can see it using the BioFormats Import plugin in ImageJ.

To take an easy example, consider one of the bioformats sample files, say:

http://www.openmicroscopy.org/Schemas/Samples/2015-01/bioformats-artificial/multi-channel-time-series.ome.tif.zip

This file contains the following metadata:

<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2013-06" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openmicroscopy.org/Schemas/OME/2013-06 http://www.openmicroscopy.org/Schemas/OME/2013-06/ome.xsd">
<Image ID="Image:0" Name="multi-channel-time-series">
<AcquisitionDate>2015-02-11T12:01:03</AcquisitionDate>
<Pixels BigEndian="true" DimensionOrder="XYCZT" ID="Pixels:0" Interleaved="false" SignificantBits="8" SizeC="3" SizeT="7" SizeX="439" SizeY="167" SizeZ="1" Type="int8">
<Channel ID="Channel:0:0" SamplesPerPixel="1">
<LightPath/>
<Channel ID="Channel:0:1" SamplesPerPixel="1">
<LightPath/>
<Channel ID="Channel:0:2" SamplesPerPixel="1">
<LightPath/>
<TiffData FirstC="0" FirstT="0" FirstZ="0" IFD="0" PlaneCount="1">
<UUID FileName="multi-channel-time-series.ome.tif">urn:uuid:df28ea90-2117-416d-b839-5a02a648bf1f</UUID>
<TiffData FirstC="1" FirstT="0" FirstZ="0" IFD="1" PlaneCount="1">
<UUID FileName="multi-channel-time-series.ome.tif">urn:uuid:df28ea90-2117-416d-b839-5a02a648bf1f</UUID>
[snip]

Based on the documentation, I expected that using the code below, I would be able to traverse this tree of information:

import javabridge
import bioformats
javabridge.start_vm(class_path=bioformats.JARS)
omexmlstr=bioformats.get_omexml_metadata('multi-channel-time-series.ome.tif')
o=bioformats.OMEXML(omexmlstr)
pixels=o.image().Pixels
pixels.get_channel_count()
>>> 1
pixels.get_plane_count()
>>> 0
pixels.SizeT
>>> 21

As you can see, the code does not give the correct number of channels, nor the correct number of planes or SizeT.

Now, this example is perhaps somewhat artificial (the XML string that is extracted is correct, so I could in principle traverse its tree manually). However, for all other cases I have tried, I have been unable to read any of the information of the individual planes, and get_plane_count() always returns 0. Am I, somehow, using the package incorrectly?
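The manual traversal mentioned above can be sketched with the standard library alone. This is only an illustration under stated assumptions: `SAMPLE` is a hypothetical, trimmed-down stand-in for the string returned by `bioformats.get_omexml_metadata()`, `summarize_pixels` is a made-up helper name, and the namespace is the 2013-06 schema shown in the excerpt (other files may declare a different schema version).

```python
import xml.etree.ElementTree as ET

# Namespace taken from the excerpt above; other files may use another version.
OME_NS = {"ome": "http://www.openmicroscopy.org/Schemas/OME/2013-06"}

# Hypothetical, shortened OME-XML (2 channels, 1 timepoint) used in place of
# the real string returned by bioformats.get_omexml_metadata().
SAMPLE = """<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2013-06">
  <Image ID="Image:0" Name="example">
    <Pixels DimensionOrder="XYCZT" ID="Pixels:0" SizeC="2" SizeT="1"
            SizeX="439" SizeY="167" SizeZ="1" Type="int8">
      <Channel ID="Channel:0:0" SamplesPerPixel="1"/>
      <Channel ID="Channel:0:1" SamplesPerPixel="1"/>
      <TiffData FirstC="0" FirstT="0" FirstZ="0" IFD="0" PlaneCount="1"/>
      <TiffData FirstC="1" FirstT="0" FirstZ="0" IFD="1" PlaneCount="1"/>
    </Pixels>
  </Image>
</OME>"""


def summarize_pixels(xml_text):
    """Read sizes, channel count and TiffData entries of the first image."""
    root = ET.fromstring(xml_text)
    pixels = root.find("ome:Image/ome:Pixels", OME_NS)
    # Each TiffData element maps one (channel, timepoint) pair to a TIFF IFD.
    tiff_data = [
        (int(td.get("FirstC", 0)), int(td.get("FirstT", 0)), int(td.get("IFD", 0)))
        for td in pixels.findall("ome:TiffData", OME_NS)
    ]
    return {
        "SizeC": int(pixels.get("SizeC")),
        "SizeT": int(pixels.get("SizeT")),
        "channels": len(pixels.findall("ome:Channel", OME_NS)),
        "tiff_data": tiff_data,
    }


summary = summarize_pixels(SAMPLE)
print(summary["SizeC"], summary["channels"], summary["tiff_data"])
```

This reads the attributes straight from the XML, so it does not depend on which reader was chosen, as long as the XML string itself is correct.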

LeeKamentsky commented 9 years ago

I think the problem is that bioformats is using the wrong reader. It should be using loci.formats.in.OMETiffReader and instead is using loci.formats.in.TiffDelegateReader. This is because bioformats, by default, prevents the ImageReader class from examining the file contents if it can find a reader by looking at the file's extension. The OMETiffReader insists on reading the file before accepting that it is the file's reader, but the TiffDelegateReader does not, so the TiffDelegateReader is chosen as the reader.

We do this to limit the time it takes to choose a reader and to limit the impact of (potentially) 100+ readers each reading the file from disk. This can have a heavy impact for our parent project, CellProfiler, if run on a cluster (we've brought down an Isilon file server before).
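To make the selection behavior concrete, here is a toy model in plain Python. It is not Bio-Formats code: `ToyReader`, `pick_reader`, the suffix lists and the `<OME` marker check are all hypothetical, and the reader names are used only as labels. It just shows why a reader that needs to inspect the contents loses to one that accepts on extension alone when opening the file is disallowed.

```python
class ToyReader(object):
    """Hypothetical stand-in for a format reader; not the real API."""

    def __init__(self, name, suffixes, needs_content, marker=None):
        self.name = name
        self.suffixes = suffixes            # tuple of accepted file suffixes
        self.needs_content = needs_content  # must it inspect the file bytes?
        self.marker = marker                # bytes it looks for in the content

    def is_this_type(self, filename, content, allow_open):
        if not filename.lower().endswith(self.suffixes):
            return False
        if not self.needs_content:
            return True   # accepts on the extension alone
        if not allow_open:
            return False  # cannot confirm without reading, so it declines
        return self.marker in content


def pick_reader(readers, filename, content, allow_open):
    """Return the name of the first reader that accepts the file."""
    for reader in readers:
        if reader.is_this_type(filename, content, allow_open):
            return reader.name
    return None


readers = [
    ToyReader("OMETiffReader", (".ome.tif", ".ome.tiff"), True, b"<OME"),
    ToyReader("TiffDelegateReader", (".tif", ".tiff"), False),
]
name = "multi-channel-time-series.ome.tif"
content = b"...<OME xmlns=...>..."  # placeholder bytes, not a real TIFF

print(pick_reader(readers, name, content, allow_open=False))
print(pick_reader(readers, name, content, allow_open=True))
```

With `allow_open=False` the toy model falls through to the extension-only reader; with `allow_open=True` the content-checking reader gets to claim the file first.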

You can use the loci.formats.ImageReader class to pick the image reader and override this behavior:

import javabridge
import bioformats.formatreader
javabridge.start_vm(class_path=bioformats.JARS)
from pylab import *
rcParams['figure.figsize'] = (63, 27)
path = r"c:\Temp\output\bfissue23\multi-channel-time-series.ome.tif"
image_reader = bioformats.formatreader.make_image_reader_class()()
# This sets up the readers to read the contents of the file if necessary
image_reader.allowOpenToCheckType(True)
image_reader.setId(path)
n_c = image_reader.getSizeC()
n_t = image_reader.getSizeT()
# Make a fake ImageReader and install the one above inside it
wrapper = bioformats.formatreader.ImageReader(path=path, perform_init=False)
wrapper.rdr = image_reader
print("This file has %d channels and %d timepoints" % (n_c, n_t))
# One subplot per plane: rows are channels, columns are timepoints
for c in range(n_c):
    for t in range(n_t):
        subplot(n_c, n_t, c*n_t + t + 1).imshow(wrapper.read(c=c, t=t), cmap="gray")
savefig(r"c:\Temp\output\bfissue23\figure.png")

We've improved the Javabridge considerably since writing python-bioformats. You might consider using the Java classes directly to access the metadata:

rdr = javabridge.JClassWrapper('loci.formats.in.OMETiffReader')()
rdr.setOriginalMetadataPopulated(True)
clsOMEXMLService = javabridge.JClassWrapper('loci.formats.services.OMEXMLService')
serviceFactory = javabridge.JClassWrapper('loci.common.services.ServiceFactory')()
service = serviceFactory.getInstance(clsOMEXMLService.klass)
metadata = service.createOMEXMLMetadata()
rdr.setMetadataStore(metadata)
rdr.setId(path)
root = metadata.getRoot()
first_image = root.getImage(0)
pixels = first_image.getPixels()
# The plane data isn't in the planes, it's in the tiff data
for idx in range(pixels.sizeOfTiffDataList()):
    tiffData = pixels.getTiffData(idx)
    c = tiffData.getFirstC().getValue().intValue()
    t = tiffData.getFirstT().getValue().intValue()
    print("TiffData: c=%d, t=%d" % (c, t))

christofferaberg commented 9 years ago

Thanks for your prompt reply!

However, I am not sure of how that solves the problem. Perhaps I need to re-emphasize that what I am interested in is reading metadata from microscopy image files using the bioformats library, using python. The problem with SizeT was an example, as was the fact that I used a sample file from BioFormats.

So, to put my question in a wider perspective, is python-bioformats the right tool to use for reading metadata from microscopy image files in a bioformats-type-of-way?

With regard to the specific solutions you suggest (thanks again!), the first one (using python-bioformats) works in the sense that the numbers of channels and time frames are correct. However, I am unsure of how to get at all the other metadata, in particular the metadata associated with the individual planes.

The second one (using javabridge) fails at line 949 in jutil.pyc with the error message: AttributeError: 'NoneType' object has no attribute 'find_class'

LeeKamentsky commented 9 years ago

"Is python-bioformats the right tool to use for reading metadata from microscopy image files in a bioformats-type-of-way?" In some ways, python-bioformats is a work in progress, and in others it's tailored to the CellProfiler use case, without many hooks to access the configuration options that would give you richer metadata at a computational expense. I'd welcome a commit that added that flexibility. The code you use might also depend on the file type and its internal structure. Do you have an example file you could send to me? What microscope are you using?

Perhaps, though, you could get the farthest by accessing the Bio-Formats jar (loci_tools or bioformats_package) directly through the javabridge. I think the script is failing because I neglected to add code to start the VM (see the corrected version below). You should be able to use the javabridge to do anything you might do in Java using the Java Bio-Formats classes directly.

import javabridge
import bioformats
path = r"c:\Temp\output\bfissue23\multi-channel-time-series.ome.tif"
javabridge.start_vm(class_path=bioformats.JARS)
rdr = javabridge.JClassWrapper('loci.formats.in.OMETiffReader')()
rdr.setOriginalMetadataPopulated(True)
clsOMEXMLService = javabridge.JClassWrapper('loci.formats.services.OMEXMLService')
serviceFactory = javabridge.JClassWrapper('loci.common.services.ServiceFactory')()
service = serviceFactory.getInstance(clsOMEXMLService.klass)
metadata = service.createOMEXMLMetadata()
rdr.setMetadataStore(metadata)
rdr.setId(path)
root = metadata.getRoot()
first_image = root.getImage(0)
pixels = first_image.getPixels()
# The plane data isn't in the planes, it's in the tiff data
for idx in range(pixels.sizeOfTiffDataList()):
    tiffData = pixels.getTiffData(idx)
    c = tiffData.getFirstC().getValue().intValue()
    t = tiffData.getFirstT().getValue().intValue()
    print("TiffData: c=%d, t=%d" % (c, t))

Heerpa commented 7 years ago

I had the same issue with ome tiffs, and the solution works awesomely for me. However, I do not only want to read metadata but also the actual image data. Here, obviously, I get the same issue when using the 'standard' reader via bioformats, and image dimensions do not match. Therefore I would like to go the javabridge-only-route [should this post be transferred to javabridge, then?]. Unfortunately I am not quite successful here, though. I have been trying to use a concoction of the above solution and the example in https://github.com/CellProfiler/python-bioformats/blob/master/bioformats/formatreader.py

My code then looks like this:

    path = 'path/to/image/file.ome.tif'
    sizeSTZCYX = np.array([1,2,3,4,5,6])

    rdr = javabridge.JClassWrapper('loci.formats.in.OMETiffReader')()
    rdr.setOriginalMetadataPopulated(True)
    clsOMEXMLService = javabridge.JClassWrapper(
                    'loci.formats.services.OMEXMLService')
    serviceFactory = javabridge.JClassWrapper(
                    'loci.common.services.ServiceFactory')()
    service = serviceFactory.getInstance(clsOMEXMLService.klass)
    metadata = service.createOMEXMLMetadata()
    rdr.setMetadataStore(metadata)
    rdr.setId(path)

    ChannelSeparator = javabridge.JClassWrapper(
        'loci.formats.ChannelSeparator')()
    cs = ChannelSeparator(rdr)

    img = np.zeros(sizeSTZCYX, dtype=np.float64)
    try:
        for sidx in range(sizeSTZCYX[0]):
            rdr.setSeries(sidx)
            for tidx in range(sizeSTZCYX[1]):
                for zidx in range(sizeSTZCYX[2]):
                    for cidx in range(sizeSTZCYX[3]):
                        img[sidx, tidx, zidx, cidx, :, :] = cs.openBytes(
                                cs.getIndex(zidx, cidx, tidx))

This throws an Error: TypeError: 'JWrapper' object is not callable

Which makes total sense, but I have too little insight into the inner workings of the javabridge to get any further. Any help would be highly appreciated.

LeeKamentsky commented 7 years ago

Hi Heerpa, Are you getting the exception in service = serviceFactory.getInstance(clsOMEXMLService.klass)? Offhand, you might have better luck with this:

clsOMEXMLService = javabridge.JClassWrapper('java.lang.Class') \
   .forName('loci.formats.services.OMEXMLService')
serviceFactory.getInstance(clsOMEXMLService)

I didn't check it, but it should work.

Heerpa commented 7 years ago

Hi Lee, oh sorry, no I get the exception in cs = ChannelSeparator(rdr) . Apparently I can't call the javabridge wrapper with the reader.

The previous part about clsOMEXMLService works fine. Replacing it with your suggestion throws javabridge.jutil.JavaException: loci/formats/services/OMEXMLService in line 918 of javabridge/jutil.py

LeeKamentsky commented 7 years ago

Oh, piece of cake:

cs = javabridge.JClassWrapper('loci.formats.ChannelSeparator')(rdr)

Does that work for you?

Heerpa commented 7 years ago

That works perfectly. Now I have an issue with broadcasting the byte array into a 2D uint16 array, but that should be googlable. Thanks very much for the immediate rescue!
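For the byte-array question, a dependency-free sketch of the conversion might look like the following. It assumes the plane arrives as a flat sequence of raw bytes holding unsigned 16-bit pixels in row-major order; `bytes_to_plane` and its parameters are hypothetical names, and the byte order has to be supplied by the caller.

```python
import struct


def bytes_to_plane(raw, width, height, little_endian):
    """Turn raw bytes into a height x width list of unsigned 16-bit values."""
    # "<" = little-endian, ">" = big-endian, "H" = unsigned 16-bit integer
    fmt = ("<" if little_endian else ">") + "%dH" % (width * height)
    if len(raw) != struct.calcsize(fmt):
        raise ValueError("expected %d bytes, got %d"
                         % (struct.calcsize(fmt), len(raw)))
    flat = struct.unpack(fmt, raw)
    # Split the flat pixel sequence into rows of `width` pixels each
    return [list(flat[row * width:(row + 1) * width]) for row in range(height)]


# Made-up example data: a 3 x 2 plane with big-endian pixel values 1..6
raw = bytes(bytearray([0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6]))
plane = bytes_to_plane(raw, width=3, height=2, little_endian=False)
print(plane)
```

With NumPy, `np.frombuffer(raw, dtype=">u2").reshape(height, width)` does the same job for big-endian data (`"<u2"` for little-endian). The byte order and pixel type should be queried from the reader rather than assumed.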