constantinpape / elf

Utils and convenience functions for large-scale bio-image analysis.
MIT License
47 stars 18 forks source link

Question about node_labels and energy in multi-cut algorithms #93

Closed liuyx599 closed 4 months ago

liuyx599 commented 5 months ago

Hi, I've recently been studying the theory related to multicut, and at the same time I've been studying your code, and I'm very thankful that you've open-sourced such a systematic tool.

I have written an example according to the code, as shown below, I would like to know what is the meaning of this node_labels that is finally returned, I check its value, it seems to be a series of serial numbers starting from 0, if I want to know which edges are cut off and which edges are retained, how can I do it? Furthermore, I am curious about the concept of energy mentioned in the code. Could you please explain its meaning and how it relates to the multicut process? process?

Thank you once again for your valuable contribution and for taking the time to address my inquiries

import nifty
import elf.segmentation.multicut as mc
from elf.segmentation.multicut import _to_objective

demo_txt = "./small_problem_sampleA.txt"
problem = np.genfromtxt(demo_txt)    

uv_ids = problem[:, :2].astype("uint64")   # 417212 2
costs = problem[:, -1].astype("float32")     

n_nodes = int(uv_ids.max()) + 1   # 58583  
graph = nifty.graph.undirectedGraph(n_nodes)  
graph.insertEdges(uv_ids)          # #Nodes 58583 #Edges 417212

objective = _to_objective(graph, costs)

node_labels = mc.multicut_kernighan_lin(graph, costs)  #  58583,3    
energy = objective.evalNodeLabels(node_labels)   # -76916.05659139407  
constantinpape commented 5 months ago

I have written an example according to the code, as shown below, I would like to know what is the meaning of this node_labels that is finally returned, I check its value, it seems to be a series of serial numbers starting from 0,

The node_labels give you the partition of nodes that is found by the multicut. For example, if you have 10 nodes labeled from 0 to 9:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

then a possible partition would be a grouping into 3 different partition elements, and could be expressed as follows:

[0, 1, 0, 0, 2, 0, 2, 1, 2, 2]

This tells you for each node which partition element it is assigned to. The indexing is w.r.t. the first node list.

if I want to know which edges are cut off and which edges are retained, how can I do it?

You can just compare for each edge if the two node labels of the connected nodes are the same (edge is not cut) or different (edge is cut). The following expression should work for this (but I haven't tested it, so might have some typo)

cut_edge_indicator = node_labels[uv_ids[:, 0]] != node_labels[uv_ids[:, 1]]

Furthermore, I am curious about the concept of energy mentioned in the code. Could you please explain its meaning and how it relates to the multicut process? process?

Th energy refers to the value of the multicut objective, so the sum of costs of the cut edges.

liuyx599 commented 5 months ago

Thank you very much for your answer, fantastic! Also, I further studied the other functions in elf and if I have a scenario problem now, it is(u,v,prob) prob represents the probability that these two nodes belong to the same cell and it lies between 0 and 1. If I now have a series of edges as well as probabilities, should I apply the transform_probabilities_to_costs(in elf.segmentaion.multicut) function to convert these probabilities to the cost of the edges. because in the code I implemented above, I used one of the txt documents from your example, which seems to have edges in it with weights that are already cost instead of probabilities.

constantinpape commented 5 months ago

Yes, if you have probabilities in range 0,1 you need to use transform_probabilities_to_costs in order to bring them into the correct value range for edge costs.

liuyx599 commented 5 months ago

hello, I learned today about your application of multicut in image segmentation multicut in image segmentation. In the example, the predicted probabilities(variable predictions ) of the boundaries are converted to the weights of the edges (as if they were costs)

        eps =  0.00001
        p1 = numpy.clip(predictions, eps, 1.0 - eps)
        weights = numpy.log((1.0-p1)/p1)

This transformation seems to be consistent with the function transform_probabilities_to_costs

In the example image segmentation, the larger the predicted probability of superpixel, the more it represents that it is a boundary probability, so the smaller the predicted probability, the more it represents that it falls within the cellular region. Therefore, this means that the larger this probability value is, the more likely this edge will be maintained after multicut.

Back to my above scenario, suppose I also have a series of edges and probabilities, the probabilities also falls between 0 and 1, but the higher the probability, the more the two nodes should merge. Or analogously, the higher the probability the more it means that these two nodes belong to a certain cellular region, and this probability seems to be the opposite of the boundary probability in the example, if I want to directly use the multicut method in the example, shouldn't I do a transformation of my probabilities first, e.g., f(p)=1-p, and after that use the function transform_probabilities_to_costs to get the cost, and then finally, I can directly invoke the related multicut API

constantinpape commented 5 months ago

Yes, you need to do 1 - probabilities before passing them to transform_probabilities_to_costs if your probabilities encode merge likelihoods rather than boundary probabilities.