Labelings / Labeling

BSD 2-Clause "Simplified" License

Considerations on `Labeling` equality #6

Closed gselzer closed 2 years ago

gselzer commented 2 years ago

Hey @tomburke-rse!

I was writing some code that tested the conversion of a Labeling into an ImgLabeling and then back into a Labeling. I would think that if you do this "circular" conversion, you'd end up with a Labeling that was equivalent to the one you started with. Unfortunately, you don't - the BsonContainer has a couple of properties that differ from the original:

At the end of the day, I'm not sure what the correct behavior should be here. As far as Labeling equality goes, what should hold for two Labelings to be considered "equal"? I figured that for two Labelings to be equal, both outputs of Labeling.get_result() should be equal. In the case of a Labeling -> ImgLabeling -> Labeling conversion, the BsonContainer output will thus not be equal. Considering the two differing properties above, it seems natural that they should differ, but then they should probably not be part of the equality check...

What is your opinion on this? Should the output of that "circular" conversion be "equal" to the original? If so, can we make changes such that those properties are not taken into account when assessing "equality"?

tomburke-rse commented 2 years ago

Hey @gselzer,

Circular conversion should work and always be equal; that should be the goal. If that's not happening at some point, that's a bug. At least as long as you don't convert to ImgLabeling, that is. At that point, you would lose the two data points you mention. You might also lose any additional metadata that is not supported by ImgLabeling.

In Java, we could remove those values from the equals method and do something similar in Python. BUT! If you do the following: 2+ ImgLabeling (which is possible and intended) -> Labeling -> ImgLabeling -> Labeling, the two Labelings will be equal in representation, but not information, i.e. the values in label_sets will likely differ, but show the same thing. This is especially the case if you only add patches (parts of an image) to the Labeling. I tried to alleviate this through the clean_up method, but I haven't tested it thoroughly on equality.

Long story short: I think we should aim for equality if possible, even if only partial (like some fields being excluded).

gselzer commented 2 years ago

BUT! If you do the following: 2+ ImgLabeling (which is possible and intended) -> Labeling -> ImgLabeling -> Labeling, the two Labelings will be equal in representation, but not information, i.e. the values in label_sets will likely differ, but show the same thing.

I guess I'm not quite sure what you mean here. Can you elaborate? I'd think that the Labelings should be the same, due to our desire to make circular conversion equal. So should the values in label_sets show the same thing, ideally? Is there something that prevents this from happening?

Long story short: I think we should aim for equality if possible, even if only partial (like some fields being excluded).

Great, so what is the definition of "equality" that we aim for? Equality of information (i.e. the image and the label_sets), or information + metadata? We should probably write a function to handle this.

tomburke-rse commented 2 years ago

Let me double check it first. Maybe I'm already overthinking this, since it's been a while since I worked on that code, and this may not even be an issue anymore. But let me try a super minimal example to show what I meant by representation and information. Assume 2 labelings with the following 2x2 structure:

1 0
0 0

2 0
0 0

While the values (the information) are different, they show the same segmentation (representation).
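The distinction can be sketched in a few lines of Python. This is only an illustration of the idea, not what clean_up does: the helper `canonical` is a hypothetical name, it works on plain nested lists rather than on a Labeling, and it looks only at the index image, not at label_sets.

```python
def canonical(index_img):
    """Relabel index values by order of first appearance (row-major).

    Two index images that describe the same segmentation with different
    index values map to the same canonical image.
    """
    mapping = {}
    return [[mapping.setdefault(v, len(mapping)) for v in row] for row in index_img]


# The two 2x2 structures from the example above.
a = [[1, 0], [0, 0]]
b = [[2, 0], [0, 0]]

same_information = a == b                            # raw values differ
same_representation = canonical(a) == canonical(b)   # segmentation matches
```

Here `same_information` is False while `same_representation` is True, which is the case described above: a plain value comparison reports the two labelings as unequal even though they show the same segmentation.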

I wrote the clean_up to make sure that each run on the same data produces the same result, but I never tested it on circular transformation. I'm not 100% sure here.

About equality: I would go for information, since metadata is always supplied by the user and should be resupplied by the user. We can throw a warning if the metadata does not match though. I can look into this next week, probably on Monday.
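A minimal sketch of an information-only equality check along these lines. `LabelingData` and `information_equal` are hypothetical stand-ins made up for illustration; they are not the library's Labeling or BsonContainer, whose actual fields may differ.

```python
from dataclasses import dataclass, field


@dataclass
class LabelingData:
    # Hypothetical stand-in for illustration only.
    img: list          # index image as nested lists
    label_sets: dict   # index value -> labels at that index
    metadata: dict = field(default_factory=dict)


def information_equal(a: LabelingData, b: LabelingData) -> bool:
    """Compare the image and the label sets; metadata is deliberately ignored."""
    sets_a = {k: set(v) for k, v in a.label_sets.items()}
    sets_b = {k: set(v) for k, v in b.label_sets.items()}
    return a.img == b.img and sets_a == sets_b


loaded = LabelingData([[1, 0], [0, 0]], {0: [], 1: [7, 8]}, {"source": "file"})
rebuilt = LabelingData([[1, 0], [0, 0]], {0: [], 1: [8, 7]})

strict = loaded == rebuilt                      # dataclass equality includes metadata
relaxed = information_equal(loaded, rebuilt)    # metadata and label order ignored
```

With this split, `strict` is False and `relaxed` is True. Comparing label sets as sets is an assumption of the sketch; if the order of labels within a set carries meaning, compare the lists directly instead.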

gselzer commented 2 years ago

But let me try a super minimal example to show what I meant with representation and information.

Huh, I understand what you are saying but I'm still not sure how it pertains to this discussion. If you have two labelings with different information but the same representation, they will be unequal?

I would go for information, since metadata is always supplied by the user and should be resupplied by the user. We can throw a warning if the metadata does not match though.

Cool, this was also what I was thinking, however I don't know that the warning is necessary. For example, I don't see how it would help me to know that one labeling came from a file and the other did not...

I can look into this next week, probably on Monday.

Great, please let me know if I can help.

tomburke-rse commented 2 years ago

Huh, I understand what you are saying but I'm still not sure how it pertains to this discussion. If you have two labelings with different information but the same representation, they will be unequal?

Yes, they will be unequal and this could be the result of the circular transformation, but that's just a guess which I need to confirm.

Cool, this was also what I was thinking, however I don't know that the warning is necessary. For example, I don't see how it would help me to know that one labeling came from a file and the other did not...

I thought it might be relevant for the user to know that one is losing the metadata attached after loading it into ImgLabeling and saving again, just a thought though.

gselzer commented 2 years ago

Yes, they will be unequal and this could be the result of the circular transformation, but that's just a guess which I need to confirm.

Hmm, yeah, I'd be interested in this. Let me know what you find.

I thought it might be relevant for the user to know that one is losing the metadata attached after loading it into ImgLabeling and saving again, just a thought though.

Oh, well I think that logging a warning is a good idea when you do the Labeling -> ImgLabeling conversion; if you are losing metadata then you should throw a warning. I just don't think it makes sense to log the warning when you are checking equality; I thought we were talking about putting the warning there.
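A rough sketch of warning at conversion time rather than in the equality check. The function name and the dict layout are hypothetical, chosen only to keep the example self-contained; the real conversion code will look different.

```python
import warnings


def drop_metadata_for_conversion(labeling: dict) -> dict:
    """Hypothetical conversion step: keep image and label sets, drop metadata.

    Warns only when there is metadata that would actually be lost.
    """
    if labeling.get("metadata"):
        warnings.warn(
            "metadata is not carried over by this conversion and will be lost",
            UserWarning,
            stacklevel=2,
        )
    return {"img": labeling["img"], "label_sets": labeling["label_sets"]}


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    converted = drop_metadata_for_conversion(
        {"img": [[1, 0], [0, 0]], "label_sets": {0: [], 1: [7]}, "metadata": {"source": "file"}}
    )
```

The warning fires once, at the point where the information is dropped, and the equality check stays silent.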

tomburke-rse commented 2 years ago

Yes, it makes even more sense to put it there, good idea. I'll add it to Java and Python on Monday together with the transformation checking and come back to you.

gselzer commented 2 years ago

@tomburke-rse has this been resolved?