BodenmillerGroup / ImcSegmentationPipeline

A pixel classification based multiplexed image segmentation pipeline
https://bodenmillergroup.github.io/ImcSegmentationPipeline/
MIT License

Corrupt MCD #94

Closed Laner closed 1 year ago

Laner commented 2 years ago

Hi. When the script tries to convert MCD files to ome.tiff, it states that the MCD file is corrupted and that the start of the XML document was not found. Is there a way around this, or a way to process the file anyway, or am I forced to process it again in Hyperion to create new files?

jwindhager commented 2 years ago

Hi @Laner, thanks for reaching out. Unfortunately, the Fluidigm software sometimes produces corrupted MCD files. At the moment, there is no easy way of recovering acquisition data from MCD files broken in the way you are describing using the IMC Segmentation Pipeline.

You can, however, try to extract the images from the TXT files instead, either using steinbock, or manually using readimc and a bit of Python code (e.g. in a new code cell in imc_preprocessing.ipynb):

from readimc import TXTFile
from tifffile import imwrite

with TXTFile("/path/to/file.txt") as f:
    img = f.read_acquisition()

imwrite("my_acquisition.tiff", data=img)

Note, however, that the metadata format of the extracted TIFF files will be different from the format used by the IMC Segmentation Pipeline. This should not interfere with anything downstream, though (the pixel data/data type will be the same). But, for use with the IMC Segmentation Pipeline, you will have to manually create the channel order CSV.
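For the manual channel order CSV mentioned above, a minimal sketch could look like the following. The layout (one channel name per row, no header, in image channel order) and the helper name `write_channel_order_csv` are assumptions, not the pipeline's documented format, so compare against a CSV the pipeline produced for a working acquisition before relying on it. The channel names are made up for illustration. readimc should expose the real ones on the opened TXT file (likely as `f.channel_names`), but verify that against the readimc documentation.

```python
import csv
import tempfile
from pathlib import Path


def write_channel_order_csv(channel_names, csv_path):
    """Write one channel name per row, in image channel order.

    The single-column, header-less layout is an assumption.
    """
    csv_path = Path(csv_path)
    # Create the target directory if it does not exist yet
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    with csv_path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        for name in channel_names:
            writer.writerow([name])
    return csv_path


# Hypothetical channel names, written to a temporary directory for illustration
example_dir = Path(tempfile.mkdtemp())
example_csv = write_channel_order_csv(
    ["Ir191", "Ir193", "Yb176"], example_dir / "my_acquisition.csv"
)
print(example_csv.read_text())
```

The order of the rows has to match the channel order of the array written by `imwrite`, since the TIFF itself no longer carries the pipeline's metadata.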

Laner commented 2 years ago

Thanks for your quick reply and clear answer, @jwindhager. For the last few days I have been trying to turn the code you supplied into a loop, so that I can batch process all .txt files in a directory I specify and write the output to another directory while preserving the name of each file. But I am coming up short.

I understand if this falls outside the scope of issue tickets, but would you be able to help me with that?

jwindhager commented 2 years ago

Sure! Didn't test it, but should be something like:

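from pathlib import Path

from readimc import TXTFile
from tifffile import imwrite

txt_dir = "/path/to/txts"
img_dir = "/path/to/imgs"

# Make sure the output directory exists before writing to it
Path(img_dir).mkdir(parents=True, exist_ok=True)

# Convert every TXT file, keeping the original file name (without extension)
for txt_file in Path(txt_dir).glob("*.txt"):
    with TXTFile(txt_file) as f:
        img = f.read_acquisition()
    img_file = Path(img_dir) / f"{txt_file.stem}.tiff"
    imwrite(img_file, data=img)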

Alternatively, use steinbock :-)

steinbock preprocess imc images

Laner commented 2 years ago

Thanks a lot, @jwindhager. As steinbock won't run in Docker on my Mac, I will try your script tomorrow.

jwindhager commented 2 years ago

Closing for now, please reopen if needed.

Laner commented 1 year ago

Quick update (even though the issue is closed): months later, we managed to run the pipeline without errors. It turned out that the MCD was not corrupted, but that the data contained hidden files from macOS. I followed this post https://apple.stackexchange.com/questions/239578/compress-without-ds-store-and-macosx and now it runs perfectly.
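To check for this situation up front, a small sketch along these lines may help. It assumes the acquisitions were compressed into a zip archive on macOS, which the linked post suggests but the thread does not state outright. The function name `find_macos_hidden_entries` is made up for illustration. It only lists suspicious archive members (`__MACOSX` folders and names starting with a dot) and does not modify the archive.

```python
import zipfile
from pathlib import PurePosixPath


def find_macos_hidden_entries(zip_path):
    """Return archive members that look like macOS metadata or dot-files."""
    hidden = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            # Zip member names always use forward slashes
            parts = PurePosixPath(name).parts
            if any(part == "__MACOSX" or part.startswith(".") for part in parts):
                hidden.append(name)
    return hidden
```

Calling `find_macos_hidden_entries("/path/to/acquisitions.zip")` (a placeholder path) returns an empty list for a clean archive. Any entry it returns, such as `.DS_Store` or `._`-prefixed resource forks, is a candidate for the kind of file that can be mistaken for a broken MCD.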

nilseling commented 1 year ago

@jwindhager how are hidden files handled in readimc and then in steinbock? Would this still be an issue?

jwindhager commented 1 year ago

readimc does not list files, but operates with direct paths to .txt/.mcd files only --> no issue

steinbock excludes hidden files as well as (visible/hidden) files starting with a dot --> no issue

The IMC Segmentation Pipeline does not exclude files starting with a dot --> I'll submit a PR
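As an illustration of what excluding files starting with a dot can look like when listing inputs, here is a minimal sketch. It is not the code of the PR mentioned above, and `list_visible_files` is a hypothetical helper name. Note that `pathlib`'s `glob` includes dot-files (unlike `glob.glob` by default), so the filter has to be explicit.

```python
from pathlib import Path


def list_visible_files(directory, pattern):
    """List files matching pattern, skipping names that start with a dot."""
    return sorted(
        path
        for path in Path(directory).glob(pattern)
        if path.is_file() and not path.name.startswith(".")
    )
```

For example, `list_visible_files("/path/to/mcds", "*.mcd")` (a placeholder path) would skip entries such as `._sample.mcd`, which macOS creates next to the real `sample.mcd`.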