Euro-BioImaging / BatchConvert

A Nextflow-based tool that wraps bfconvert and bioformats2raw to convert image data collections to OME-TIFF and OME-Zarr, respectively, in a parallelised manner.
MIT License

ValueError: conflicting sizes for dimension 'C' after converting one channel into ome.tiff image #24

Open · DorianKauffmann opened this issue 8 months ago

DorianKauffmann commented 8 months ago

Hello, thanks for your work on BatchConvert, which is very useful and practical!

For my work at France-BioImaging, I need to convert several multichannel images into single-channel OME-TIFF images, one per channel of each original image. So I wrote the following Python code based on your documentation (here it converts only the first channel of an example image):

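```python
from pathlib import Path

from aicsimageio import AICSImage

# Placeholder paths: a folder of .dv files and a folder for the converted output
input_path = Path("path/to/input_folder")
output_path = Path("path/to/output_folder")

img_omx = AICSImage(input_path / "image_name.dv")
print(img_omx.dims)
```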

<Dimensions [T: 1, C: 3, Z: 21, Y: 512, X: 512]>

```python
import subprocess

# Extract only the first channel (index 0) and write it as OME-TIFF
command = ['batchconvert',
           'ometiff', '-pf', 'conda',
           '-chn', str(0),
           str(input_path),
           str(output_path),
           ]
try:
    subprocess.run(command, check=True)
except Exception as e:
    print(f"Error converting {input_path}: {e}")
```
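Because the goal is one single-channel OME-TIFF per original channel, the same command can be repeated for every channel index. The sketch below assumes that `-chn` takes a single channel index, as in the command above, and writes each channel into its own output folder so that the runs presumably do not overwrite each other's files. The `build_commands` and `convert_all_channels` helpers and the `channel_<index>` folder naming are hypothetical, not part of BatchConvert.

```python
import subprocess
from pathlib import Path


def build_commands(input_path, output_path, n_channels):
    """Build one batchconvert call per channel index (hypothetical helper)."""
    commands = []
    for c in range(n_channels):
        # Hypothetical layout: output_path/channel_0, output_path/channel_1, ...
        channel_out = Path(output_path) / f"channel_{c}"
        commands.append([
            "batchconvert", "ometiff", "-pf", "conda",
            "-chn", str(c),
            str(input_path),
            str(channel_out),
        ])
    return commands


def convert_all_channels(input_path, output_path, n_channels):
    """Run the calls one after another and report failures without stopping."""
    for command in build_commands(input_path, output_path, n_channels):
        # The last argument is the per-channel output folder
        Path(command[-1]).mkdir(parents=True, exist_ok=True)
        try:
            subprocess.run(command, check=True)
        except (subprocess.CalledProcessError, FileNotFoundError) as e:
            # command[5] is the channel index passed to -chn
            print(f"Error converting channel {command[5]} of {input_path}: {e}")
```

For the example image, `convert_all_channels(input_path, output_path, img_omx.dims.C)` would launch three conversions (channel indices 0, 1 and 2). Running them sequentially keeps the sketch simple; each call still lets BatchConvert parallelise over the files in the input folder.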

However, after converting my 3-channel images this way, opening the output raises a ValueError about the channel sizes (the same happens with some `imread` functions):

```python
img = AICSImage(ometiff_converted_demo)  # path to the converted OME-TIFF
print(img.dims)
```

ValueError: conflicting sizes for dimension 'C': length 1 on the data but length 3 on coordinate 'C'

The full error message is below:

```text
ValueError                                Traceback (most recent call last)
/home/mypath Cell 6 line 6
      4 img = AICSImage(ometiff_converted_demo)
      5 # print(img.metadata)
----> 6 print(img.dims)

File ~/mambaforge/envs/environment/lib/python3.10/site-packages/aicsimageio/aics_image.py:574, in AICSImage.dims(self)
    566 """
    567 Returns
    568 -------
    569 dims: dimensions.Dimensions
    570     Object with the paired dimension names and their sizes.
    571 """
    572 if self._dims is None:
    573     self._dims = dimensions.Dimensions(
--> 574         dims=self.xarray_dask_data.dims, shape=self.shape
    575     )
    577 return self._dims

File ~/mambaforge/envs/environment/lib/python3.10/site-packages/aicsimageio/aics_image.py:438, in AICSImage.xarray_dask_data(self)
    423 """
    424 Returns
    425 -------
   (...)
    431     If the image contains mosaic tiles, data is returned already stitched together.
    432 """
    433 if self._xarray_dask_data is None:
    434     if (
    435         # Does the user want to get stitched mosaic
    436         self._reconstruct_mosaic
    437         # Does the data have a tile dim
--> 438         and dimensions.DimensionNames.MosaicTile in self.reader.dims.order
    439     ):
    440         try:
    441             self._xarray_dask_data = (
    442                 self._transform_data_array_to_aics_image_standard(
    443                     self.reader.mosaic_xarray_dask_data
    444                 )
    445             )

File ~/mambaforge/envs/environment/lib/python3.10/site-packages/aicsimageio/readers/reader.py:532, in Reader.dims(self)
    525 """
    526 Returns
    527 -------
    528 dims: Dimensions
    529     Object with the paired dimension names and their sizes.
    530 """
    531 if self._dims is None:
--> 532     self._dims = Dimensions(dims=self.xarray_dask_data.dims, shape=self.shape)
    534 return self._dims

File ~/mambaforge/envs/environment/lib/python3.10/site-packages/aicsimageio/readers/reader.py:359, in Reader.xarray_dask_data(self)
    352 """
    353 Returns
    354 -------
    355 xarray_dask_data: xr.DataArray
    356     The delayed image and metadata as an annotated data array.
    357 """
    358 if self._xarray_dask_data is None:
--> 359     self._xarray_dask_data = self._read_delayed()
    361 return self._xarray_dask_data

File ~/mambaforge/envs/environment/lib/python3.10/site-packages/aicsimageio/readers/ome_tiff_reader.py:334, in OmeTiffReader._read_delayed(self)
    331 # Create the delayed dask array
    332 image_data = self._create_dask_array(tiff, strictly_read_dims)
--> 334 return self._general_data_array_constructor(
    335     image_data,
    336     dims,
    337     coords,
    338     tiff_tags,
    339 )

File ~/mambaforge/envs/environment/lib/python3.10/site-packages/aicsimageio/readers/ome_tiff_reader.py:286, in OmeTiffReader._general_data_array_constructor(self, image_data, dims, coords, tiff_tags)
    283 # Reset dims after transform
    284 dims = [d for d in out_order]
--> 286 return xr.DataArray(
    287     image_data,
    288     dims=dims,
    289     coords=coords,
    290     attrs={
    291         constants.METADATA_UNPROCESSED: tiff_tags,
    292         constants.METADATA_PROCESSED: self._ome,
    293     },
    294 )

File ~/mambaforge/envs/environment/lib/python3.10/site-packages/xarray/core/dataarray.py:418, in DataArray.__init__(self, data, coords, dims, name, attrs, indexes, fastpath)
    416 data = _check_data_shape(data, coords, dims)
    417 data = as_compatible_data(data)
--> 418 coords, dims = _infer_coords_and_dims(data.shape, coords, dims)
    419 variable = Variable(dims, data, attrs, fastpath=True)
    420 indexes, coords = _create_indexes_from_coords(coords)

File ~/mambaforge/envs/environment/lib/python3.10/site-packages/xarray/core/dataarray.py:163, in _infer_coords_and_dims(shape, coords, dims)
    161 for d, s in zip(v.dims, v.shape):
    162     if s != sizes[d]:
--> 163         raise ValueError(
    164             f"conflicting sizes for dimension {d!r}: "
    165             f"length {sizes[d]} on the data but length {s} on "
    166             f"coordinate {k!r}"
    167         )
    169 if k in sizes and v.shape != (sizes[k],):
    170     raise ValueError(
    171         f"coordinate {k!r} is a DataArray dimension, but "
    172         f"it has shape {v.shape!r} rather than expected shape {sizes[k]!r} "
    173         "matching the dimension size"
    174     )

ValueError: conflicting sizes for dimension 'C': length 1 on the data but length 3 on coordinate 'C'
```
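The last frame is xarray refusing to build a `DataArray` whose `C` coordinate does not have the same length as the `C` axis of the pixel data. The length 3 on the coordinate matches the three `Channel` entries that remain in the converted file's OME-XML (shown further down), so a plausible reading, assumed here rather than checked against the aicsimageio source, is that `OmeTiffReader` builds the `C` coordinate from those entries while the pixel data holds a single channel. The hypothetical `check_c_coordinate` helper is a simplified stand-in for that check, not xarray's actual code:

```python
def check_c_coordinate(data_shape, dims, channel_labels):
    """Simplified stand-in for xarray's coordinate/dimension size check.

    data_shape: shape of the pixel array, e.g. (1, 1, 21, 512, 512)
    dims: dimension names in the same order, e.g. "TCZYX"
    channel_labels: one label per OME <Channel> element (assumed source of the C coordinate)
    """
    sizes = dict(zip(dims, data_shape))
    n_labels = len(channel_labels)
    if n_labels != sizes["C"]:
        raise ValueError(
            f"conflicting sizes for dimension 'C': length {sizes['C']} "
            f"on the data but length {n_labels} on coordinate 'C'"
        )
    return sizes


# Converted file: one channel of pixels, but three <Channel> entries in the metadata
try:
    check_c_coordinate((1, 1, 21, 512, 512), "TCZYX",
                       ["Channel:0:0", "Channel:0:1", "Channel:0:2"])
except ValueError as e:
    print(e)
# conflicting sizes for dimension 'C': length 1 on the data but length 3 on coordinate 'C'
```

With a single label per channel the check passes, which is consistent with the Bio-Formats-based reader opening the same file as `C: 1`.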

Surprisingly, when I use the BioformatsReader, I can open the new single-channel image (although it takes much longer to open):

```python
from aicsimageio.readers.bioformats_reader import BioformatsReader

img = AICSImage(ometiff_converted_demo, reader=BioformatsReader)
print(img.dims)
```

<Dimensions [T: 1, C: 1, Z: 21, Y: 512, X: 512]>

I then checked the OME-XML metadata and saw that SizeC is indeed 1, but the Pixels element still contains the Channel entries for all 3 channels instead of only the first one:

```xml
<?xml version="1.0" encoding="UTF-8"?> ...
<OME ...
    <Instrument ID="Instrument:0"> ...
    </Instrument>
    <Image 
        ...
        <Pixels BigEndian="false" DimensionOrder="XYCZT" ID="Pixels:0" Interleaved="false" PhysicalSizeX="0.07999999821186066" PhysicalSizeXUnit="µm" PhysicalSizeY="0.07999999821186066" PhysicalSizeYUnit="µm" PhysicalSizeZ="1.0" PhysicalSizeZUnit="µm" SignificantBits="16" SizeC="1" SizeT="1" SizeX="512" SizeY="512" SizeZ="21" Type="uint16">
            <Channel EmissionWavelength="683.0" EmissionWavelengthUnit="nm" ExcitationWavelength="642.0" ExcitationWavelengthUnit="nm" ID="Channel:0:0" NDFilter="0.10000000149011612" SamplesPerPixel="1">
                <LightPath/>
            </Channel>
            <Channel EmissionWavelength="528.0" EmissionWavelengthUnit="nm" ID="Channel:0:1" NDFilter="0.25" SamplesPerPixel="1">
                <LightPath/>
            </Channel>
            <Channel EmissionWavelength="435.0" EmissionWavelengthUnit="nm" ExcitationWavelength="405.0" ExcitationWavelengthUnit="nm" ID="Channel:0:2" NDFilter="0.10000000149011612" SamplesPerPixel="1">
                <LightPath/>
            </Channel>
            ...
        </Pixels>
    </Image>
...
```

It therefore seems that there was an error at this level during the conversion.
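Until the conversion itself is fixed, one possible stop-gap is to edit the OME-XML so that `Pixels` lists only the extracted channel. The standard-library sketch below assumes a single `Image`/`Pixels` element, as in the metadata above, and that the position of the `Channel` to keep equals the index passed to `-chn`. `channel_summary` and `prune_channels` are hypothetical helpers, not part of BatchConvert or Bio-Formats, and only `Channel` elements are changed: any `Plane` or `TiffData` entries that refer to other channels are left untouched.

```python
import xml.etree.ElementTree as ET


def channel_summary(ome_xml):
    """Return (SizeC, number of <Channel> elements) for the first <Pixels> element."""
    root = ET.fromstring(ome_xml)
    pixels = root.find(".//{*}Pixels")
    return int(pixels.get("SizeC")), len(pixels.findall("{*}Channel"))


def prune_channels(ome_xml, keep_index):
    """Keep only the <Channel> at position keep_index (hypothetical workaround)."""
    root = ET.fromstring(ome_xml)
    # Serialize the OME namespace as the default one instead of an "ns0:" prefix
    if root.tag.startswith("{"):
        ET.register_namespace("", root.tag[1:].split("}")[0])
    pixels = root.find(".//{*}Pixels")
    for i, channel in enumerate(pixels.findall("{*}Channel")):
        if i != keep_index:
            pixels.remove(channel)
    # ET.tostring(encoding="unicode") omits the XML declaration, so add it back
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(root, encoding="unicode")
```

On the metadata above, `channel_summary` would report `(1, 3)`, and after `prune_channels(xml, 0)` it reports `(1, 1)`. On a real file, the OME-XML string can be read with tifffile (`tifffile.TiffFile(path).ome_metadata`); writing the pruned XML back, for example with `tifffile.tiffcomment`, modifies the file in place, so try it on a copy first.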

The full XML file (as .txt) is here: Info_HeLa_Mitotrack-647_BrField_Hoechst_sample-1_001_visit_1.txt

As I can't upload .dv and .ome.tiff images to GitHub, don't hesitate to contact me if you want the input and output images.

Thank you for everything, Dorian

bugraoezdemir commented 8 months ago

Hi @DorianKauffmann ,

Thank you for your positive feedback on BatchConvert and the detailed description of the issue. I can replicate the behaviour using several example images in different formats. The channel information of the original dataset is retained in the metadata of the extracted series.

It appears that the root cause lies within the Bio-Formats tools that are wrapped by BatchConvert for conversion. I will investigate further to clarify whether this behaviour of Bio-Formats is intentional or a potential bug. In the meantime, I will also try to figure out a temporary fix in BatchConvert.

Cheers, Bugra

DorianKauffmann commented 8 months ago

Hello Bugra, that's great, thank you very much!

Best, Dorian

(Replying by email to bugraoezdemir's message of Wed, 10 Jan 2024 at 21:23.)