Add IndirectArray(seg) - Githubissues

timholy commented 2 years ago

This makes it easy to visualize the results of segmentation. After this, we can replace the recommendation

imshow(map(i->segment_mean(segments,i), labels_map(segments)))

in the documentation with

imshow(IndirectArray(segments, segment_mean(segments)))

which should be easier to remember.

timholy commented 2 years ago

it's quite intuitive to think of segmented images as pair of values and indices

But what do you choose for the values? The label index, an arbitrary color, or the mean color of all pixels assigned the same label? For visualization in particular, the last two are relevant (an arbitrary color emphasizes the identities of the segments, even when two segments have similar mean color). One goal here was to make it easy for the user to choose: IndirectArray(seg) uses the label index (assigning each an arbitrary color), and IndirectArray(seg, segment_means(seg)) assigns each the mean color.

Can these two properties be calculated only from segment_labels and image_indexmap?

Yes, we could do that, and it's not a crazy thought. But it's also worth understanding that they are generated by many algorithms during segmentation, so this is essentially a way of preserving your work. If you are, say, segmenting a 1TB electron microscopy connectomics data set, it might be a bit awkward to make another trip through a 1TB file just to calculate these (given that you calculated them once already).

johnnychen94 commented 2 years ago

My main concern is that SegmentedImage is not a useful abstraction. Except for the image_indexmap, all other fields can be calculated by walking through the image or indexmap for a second round. Yes, there will be performance concerns, but that isn't a valid argument on why SegmentedImage should be constructed like this. What advantages does it bring to us by making such an abstraction rather than a simple Tuple or NamedTuple? Also, if we see the bool-valued array as a two-value array, then it's also an index map and then we don't have the issue when considering active contour https://github.com/JuliaImages/ImageSegmentation.jl/issues/67#issuecomment-910400531.

I think we should separate the need for visualization from how we should represent a segmented image. A segmented image can also be represented as an array of pixels with coordinates and labels:

struct LabeledPixel{T<:Colorant,N}
    coord::Dims{N}
    value::T
    label::Int
end

# or
struct Pixel{T<:Colorant, N} <: AbstractVector{T}
    coord::Dims{N}
    value::T
end
struct LabeledPixel{T<:Pixel}
    value::T
    label::Int
end

IMHO, segment_labels, segment_means and segment_pixel_count should be considered as optional extra information.

Take LabeledPixel into account, we don't need to use IndirectArray to display it, so if we want to provide a convenient visualization tool for segmented results, we should provide a function instead of type constructor IndirectArray. My design on SLIC is to see segmentation as an analysis process (analyze) that breaks things down into parts and see visualization as a synthesis process (synthesize) that composes parts together.

timholy commented 2 years ago

Ah, now I think I understand your point better, and I agree it's a good one. I guess the best argument in favor of the current system is that it bundles together a number of different results. An example of a package that does something similar is Optim (which, I'll agree, is not the most beautifully-designed package...), for which results are returned in a giant structure for which many of the fields could be computed from others (e.g., minimum can be computed from f(minimizer)) and others could be returned as independent arguments.

The best argument against it is that, indeed, we don't really do anything with a SegmentedImage, and indeed we provide lots of accessors whose names are hard to remember as an attempt to avoid having people access the fields directly. This just feels like bloat.

Would you be happier if algorithms just returned the individual fields (those that they compute along the way)? For example, seeded_region_growing computes region_means and region_pix_count along the way (other than for the 0 label). It could return those in addition to image_indexmap.

With regards to how one labels an image, an array of integer labels is pretty common. Is your concern that it's disconnected from the array values? This is basically an array-of-structs vs struct-of-arrays issue. If you might use the image on its own again, there is no reason to push both into the same array, and the cost in memory is potentially significant. I guess I don't see an advantage in your specific proposals for representation/visualization. Some of what you're proposing seems to be aiming at the question of dense or sparse representations, but we can make that decision in the array type: segment_me!(dest, src, args...) would allow the user to pass in either an Array{Int} (for a dense representation of segment labels) or a Dict-backed array as in https://github.com/Jutho/SparseArrayKit.jl (for sparse representation). That's exactly how I designed my recent flood_fill! PR.

johnnychen94 commented 2 years ago

Would you be happier if algorithms just returned the individual fields (those that they compute along the way)? For example, seeded_region_growing computes region_means and region_pix_count along the way (other than for the 0 label). It could return those in addition to image_indexmap.

Yes, I think this is how it should be; it should be sufficient to return a Tuple or NamedTuple if we want a bundled object. And if we do this then we don't need these two IndirectArray(::SegmentedImage) methods, we can have a convenient function to build IndirectArray (or maybe other array types) instead.

I guess I don't see an advantage in your specific proposals for representation/visualization. Some of what you're proposing seems to be aiming at the question of dense or sparse representations, but we can make that decision in the array type: segment_me!(dest, src, args...) would allow the user to pass in either an Array{Int} (for a dense representation of segment labels) or a Dict-backed array as in https://github.com/Jutho/SparseArrayKit.jl (for sparse representation). That's exactly how I designed my recent flood_fill! PR.

I'm still thinking about what's the "best" way to represent a sparse image (or an colorful object/surface in an euclidean space), the one I proposed in the comment is just one possibility that goes against the current SegmentedImage abstraction, I agree that it takes extra memories to store the data. Maybe a Dict-based array would be enough for the purpose.

timholy commented 2 years ago

For a multidmensional array, I am not aware of anything better than a Dict(index=>value). SparseMatrixCSC is a disaster for inserting new values; the only thing it's good for is matrix multiplication.

One good way to figure this out would be to think about what the documentation would look like given different options. I frequently find that if I don't like how the docs come out, it's a sign that there's something wrong with the API. My IndirectArray proposal is part of trying to ameliorate some of my unease, but a bigger change may be called for. With #74 we'll have to make a breaking release anyway.

johnnychen94 commented 2 years ago

For a multidmensional array, I am not aware of anything better than a Dict(index=>value). SparseMatrixCSC is a disaster for inserting new values; the only thing it's good for is matrix multiplication.

This is off-topic and not in the scope of this package at all: if we relax from typical generic sparse representation and checking some image-specific formats, there might be more ideas. For instance, any graphic objects (balls, spheres, lines), meshes, quadtree, R-tree, or even constructing a SVG from image

One good way to figure this out would be to think about what the documentation would look like given different options. I frequently find that if I don't like how the docs come out, it's a sign that there's something wrong with the API.

How about just extending distinguishable_color? Maybe this is enough to save the typings?

distinguishable_color(labels::Vector) = Dict(zip(labels, distinguishable_colors(length(labels))))

I could imagine usage like the following and it's not hard to intuit or understand.

index_map, labels = some_algorithm(img)
IndirectArray(index_map, distinguishable_color(labels))

index_map, labels, mean_values = another_algorithm(img)
IndirectArray(index_map, mean_values)

timholy commented 2 years ago

That works for me too. If you want I can take a stab at this. I propose we get a new non-breaking release out the door, finishing #72, and then perhaps we can tackle the API changes. I'm willing to take the lead here unless you'd rather do it.

johnnychen94 commented 2 years ago

I'll become super busy this semester so you can take the lead here. I plan to spend more of my free time on the JuliaImages code organization, documentation, test and benchmark systems...

Edit: FWIW, I'm not sure if you're interested, I have some plans on designing a computational graphic system to replace our old ImageDraw, and the first step is https://github.com/johnnychen94/CoordinateSystems.jl/pull/1, when that one is finished, I plan to attach color information and add AD support for it, and figure there might be something interesting when applied to image reconstruction or registration tasks... If this goes smoothly then it will be part of my Ph.D. thesis so this will definitely take a large amount of my time.

johnnychen94 commented 2 years ago

I know you're reaching for some new better status, I hope I don't block your development workflows by commenting here and there.

I propose we get a new non-breaking release out the door, finishing #72, and then perhaps we can tackle the API changes.

This sounds good to me.

timholy commented 2 years ago

No, I really appreciate and benefit from your perspective. It's great to discuss API with someone who has such a good sense of design!

JuliaImages / ImageSegmentation.jl

Add IndirectArray(seg) #76