v3 - Githubissues

jwindhager commented 2 years ago

As discussed, this pull request drops the dependency on imctools.

Furthermore, this pull request includes some changes requested by @nilseling.

Organizational changes:

New variable naming conventions/code formatting (black), but keeping the old directory structure/file naming conventions
The pipeline now ships with its integrated Python package, imcsegpipe, which largely covers functionality originally implemented in imctools.converters, but has been implemented from scratch using readimc for reading MCD/TXT files
The new environment.yml takes care of installing Python, Jupyter Lab / Jupytext, and imcsegpipe
Further Python package dependencies are listed in setup.cfg (i.e., installed from PyPI)
Jupytext is enabled for the Jupyter Notebook

Functional changes:

Support for using "backup" schema XML files (instead of the ones embedded in MCD files) has been dropped
The ome folder now contains a .csv file for each image, indicating channel names ("metals") and labels
TXT files not matched to MCD acquisitions are no longer extracted (not sure they ever were, @votti?)
Slide and panorama images are always written as PNG, irrespective of their original format
Lacking imctools.data, no JSON serialization of imctools objects is created

Notes:

Like before, OME-TIFF files are considered "secondary raw data" and created before any hot pixel filtering etc
Similarly, OME-TIFF and HistoCAT files are saved as "native" float32; analysis stacks are saved as uint16
Hot pixel filtering is applied to analysis stacks outside of CellProfiler, as part of create_analysis_stacks

WIP:

Are you OK with these changes, @votti? Anything missing/to be changed?
Could you take care of the following TODOs on this branch/PR, @nilseling?
- [ ] Adapt the CellProfiler pipelines accordingly (e.g., remove the hot pixel filtering step)
- [ ] Carefully review the changes and extensively test the updated pipeline on various datasets
- [ ] Update the markdown instructions in the Jupyter Notebook / the files in documentation
- [ ] Prepare a v3 release, updating the README and the changelog

nilseling commented 2 years ago

This is great, thanks! I'll review it next week.

votti commented 2 years ago

Hey Team resp Jonas,

First look looks great - thanks a lot for your efforts!

On fist sights two points:

1) According to which rationale are image pre-processing steps put into the conversion step vs the CP pre-processing? Eg the hot pixel filtering was put back into the conversion step, but the 2x image scaling not?

2) If we have now the imcseg helper package: Could it make sense to factor out some of the more complicated cells of the notebook into higher level functions in the imcseg package? Could beneficial for testing and reuse.

Cheers!

On Sat, 12 Feb 2022, 21:46 Nils Eling, @.***> wrote:

This is great, thanks! I'll review it next week.

— Reply to this email directly, view it on GitHub https://github.com/BodenmillerGroup/ImcSegmentationPipeline/pull/85#issuecomment-1037466770, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA4R2KA7ADI3HDBI2FLP6N3U23BJ5ANCNFSM5OHO6O6Q . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>

nilseling commented 2 years ago

Hey @votti,

thanks for checking this out! 1) This comes from trying to address #72. I noticed that CellProfiler v4.1.3 and v4.0.7 doesn't seem to write valid multi-channel .tiff files (see 4496) and CellProfiler v4.2.1 doesn't write multi-channel .tiff files at all (see 4495). By moving the hot pixel filtering out of CellProfiler, we would exclude mutli-channel .tiff generation via CP. I would still leave the scaling steps in the pipelines for readability reasons. 2) I find it ok since it's still readable. But maybe @jwindhager has comments?

Cheers, Nils

nilseling commented 2 years ago

@jwindhager I noticed that the slide overview is written out as a _pano.png file instead of the _slide.png file when generating the ometiffs folder. But the same already happened in the previous preprocessing script.

nilseling commented 2 years ago

And the export of then _pano.png files appears quite slow (~1min per file)

jwindhager commented 2 years ago

Hi @votti, thanks for the quick feedback!

1) According to which rationale are image pre-processing steps put into the conversion step vs the CP pre-processing? Eg the hot pixel filtering was put back into the conversion step, but the 2x image scaling not?

Moving the hot pixel filtering out of CellProfiler was requested by @nilseling, I assume for the reasons above.

2) If we have now the imcseg helper package: Could it make sense to factor out some of the more complicated cells of the notebook into higher level functions in the imcseg package? Could beneficial for testing and reuse.

I really don't have a strong opinion/preference here, but my time resources are limited. If you or anyone else thinks it would be good to have higher-level functions, I'd be happy to take pull requests on this branch.

jwindhager commented 2 years ago

@jwindhager I noticed that the slide overview is written out as a _pano.png file instead of the _slide.png file when generating the ometiffs folder. But the same already happened in the previous preprocessing script.

IIRC, that's because the slide overview actually IS a panorama :-) I believe this may have changed with Fluidigm software releases.

And the export of then _pano.png files appears quite slow (~1min per file)

Indeed! I think this may be due to the high default PNG compression rate: https://github.com/imageio/imageio/issues/387

nilseling commented 2 years ago

Hey @votti, a design question: is the measurement of the pixel probabilities in the last pipeline important? I haven't seen anyone using this information. If not I would suggest to directly write the filtered images into analysis/cpout/images, the masks directly into analysis/cpout/masks and import both these folders for the final measurement pipeline. I would make it slightly cleaner but you would loose the probability measurement.

nilseling commented 2 years ago

Hey @votti, a design question: is the measurement of the pixel probabilities in the last pipeline important? I haven't seen anyone using this information. If not I would suggest to directly write the filtered images into analysis/cpout/images, the masks directly into analysis/cpout/masks and import both these folders for the final measurement pipeline. I would make it slightly cleaner but you would loose the probability measurement.

Nevermind, I figured out a way to move the probabilities from pipeline 3 to pipeline 2. But do you remember where the discrepancy between the 2 sets of segmentation masks (the one written in pipeline 2 and the one written in pipeline 3) is coming from?

votti commented 2 years ago

Hi @nilseling

@probabilities export: I think in the base pipeline this can be savely removed.

@discrepancies in masks: If I remember write the reason is that technically the masks written out in the first pipeline can have some missing object ids. This caused issues downstream (eg HistoCAT).

Thus during the loading of the masks in the measuring pipeline, the setting that objects are re-numbered is used. This guarantees a continuous numbering of objects (so if there are n objects, all object ids 1:n are used).

Generally, I always considered it to be "best practice" to store the exact object masks used together with measurements - just because the measurements are only meaningful with the corresponding masks. Matching masks exported from one pipeline with measurements from another pipeline can be error prone in case that users activate renumbering etc when loading the masks. (P.s.: I always thought the best would be to actually store the pixel coordinates of each object as an entry of the measurement table - unfortunately, at the time I have not seen any of the standardized output formats supporting this.)

Cheers!

nilseling commented 2 years ago

Hi @votti

thanks for the reply! I'll keep the probabilities for now. For the segmentation masks: good point with the continuous cell labels! I do however noticed that when relabelling the cells in the ConvertImageToObjects call, CellProfiler introduces mislabeled pixels. Out of ~2500 objects around 15 objects are newly labelled objects of size 1-2 pixels. I was able to reproduce this issue with CP v4.2.1 and CP v4.0.7. So I would like to avoid relabelling as part of ConvertImageToObjects using CellProfiler. I believe these are pixels which are not connected to the main object body anymore.

The FilterObjects module reindexes objects by default. So missing objects can only come from the ResizeObjects step (I was able to reproduce this behaviour by shrinking cells before resizing). In any case I wonder why the probabilities are not first downscaled before segmentation? This would avoid losing objects during resizing and facilitate nuclei <-> cell mapping.

I have also checked what CP does with missing object IDs and it fails when measuring objects. I do understand and agree with the approach to export measurements and masks in the same pipeline but the reindexing in the ConvertImageToObjects module is buggy. I have filed an issue with CellProfiler.

What do you think?

nilseling commented 2 years ago

Out of ~2500 objects around 15 objects are newly labelled objects of size 1-2 pixels.

I just saw now that this can be fixed by increasing the Connectivity in the ConvertImageToObjects module. Not sure which default setting is sensible though.

jwindhager commented 2 years ago

This PR has been merged into development in favor of #87

BodenmillerGroup / ImcSegmentationPipeline

v3 #85