AllenCell / cvapipe

Cell Variance Analysis Pipeline
Other
2 stars 0 forks source link

Address segmentation files becoming merged #7

Open evamaxfield opened 4 years ago

evamaxfield commented 4 years ago

Description

A clear description of the bug

The segmentation files produced by the new script / alg shove all segmentations and contours into a single file. Currently the pipeline query returns each one a single path (and single column in the dataset). This will likely result in a bunch of downstream changes to how we process the data. (But not too many)

Would be good to stay updated on when this change will be occurring and how the dataset query will be affected.

toloudis commented 4 years ago

I'm not sure why this would be a problem. It's perfectly fine for MembraneSegmentationFileId and MembraneContourFileId to point to the same file, as long as there are columns that say which CHANNEL in each file contains the important data.

toloudis commented 4 years ago

So there may be a task in order to add the channel index columns...

evamaxfield commented 4 years ago

I think I like this suggestion the most. Basically have the equivalent of ChannelIndexCellSeg like we already have for ChannelIndexBrightfield or w.e.

Aerendel commented 4 years ago

@toloudis @JacksonMaxfield Looking at http://aics.corp.alleninstitute.org/labkey/query/AICS/begin.view?#sbh-qdp-%26processing%26Cell I see NucleusSegmentationChannelIndex and MembraneSegmentationChannelIndex. These should be populated when we generate the cell records (will have to make sure they are accurate). What about the contour channel indices? Are those channels not used by anybody down stream?

evamaxfield commented 4 years ago

Uhhhhhhhh I don't think I have ever seen those columns / those columns are currently not returned by the query: These are all the fields returned by the query that Dan and I use. (Minus GoodCellIndicies, Plus CellId and CellIndex)

evamaxfield commented 4 years ago

So at least from all the pipelines I have touched, none of them to my knowledge have ever used the columns NucleusSegmentationChannelIndex or MembraneSegmentationChannelIndex

Aerendel commented 4 years ago

So those columns exist but are probably not used by the query. Let's address this Monday in grooming and define what is needed. I'll add it to the agenda.

evamaxfield commented 4 years ago

That said, those fields are in line with @toloudis's recommendation so that's great to see, we will just have to change our stuff over to using those. And I would assume adding a field for NucleusContourChannelIndex and MembraneContourChannelIndex would be good.

toloudis commented 4 years ago

Yes, the new columns need to be added to the lkaccess query results. The ones that indicate which channels contain which segmentation data. Including contour. And for membrane, nucleus, and structure. The downstream code needs to be able to open the files and get at the actual channel data for all of those piece of image data.

toloudis commented 4 years ago

The cellbrowser-tools repo combines all the segmentations channels with the raw data channels into one downloadable multichannel image. We need to be able to access all of them: Raw data: membrane, nucleus, structure, brightfield Segmentation: membrane, nucleus, structure Contour segmentation: membrane, nucleus, structure For all of the above, we need a file id, read path, and channel index number.
I believe most are already present except that the segmentations were always assuming channel 0 due to always having been single channel separate tiffs. So the channel index now needs to be indicated for segmentation and contour segmentation.

Aerendel commented 4 years ago

Work captured in https://aicsjira.corp.alleninstitute.org/browse/DPWA-101 and https://aicsjira.corp.alleninstitute.org/browse/DPWA-102

evamaxfield commented 4 years ago

Updating: I see that DPWA-101 is closed and DPWA-102 is in review. When they are both closed / code is merged can you ping this thread. @tyler-foster @Aerendel @kmitcham

tyler-foster commented 4 years ago

Updating: I see that DPWA-101 is closed and DPWA-102 is in review. When they are both closed / code is merged can you ping this thread. @tyler-foster @Aerendel @kmitcham

Wasn't on top of updating tickets. Both 101 and 102 are complete @JacksonMaxfield

evamaxfield commented 4 years ago

So if I am understanding the data correctly:

NucleusContourReadPath and MembraneContourReadPath columns are gone and NucleusSegmentationReadPath and MembraneSegmentationReadPath now contain both the segmentations and the contours. To resolve there are now four new columns:

Does that all look correct?

tyler-foster commented 4 years ago

All of the nucleus/membrane segmentation/contour fileid/filename/readpath columns should still be present, they'll just all have the same values going forward with the current combined segmentation file format. And yes, those 4 channel index columns are present.

It might be worth noting that the channel index columns are not figured out programatically, so they're constants (in the cell row generation script) that map this way:

evamaxfield commented 4 years ago

I see thanks for the info. Will leave this issue open until like everyone has resolved their own changes with the data. But specifically tagging @toloudis and @jxchen01 to note these changes.