Closed by kochsebastian 1 year ago
The evaluation has three parts: the relationship triplet, the object, and the predicate alone. For the triplet, we consider both nodes as well as the predicate.
Yes, this is clear to me.
However, this snippet is from the relationship triplet evaluation. When the GT predicate is not None, you evaluate the triplet (subject, object, predicate), which is correct in my opinion.
But when there is no GT predicate, you only compare the scores against the threshold. I believe this simplifies the evaluation quite a lot, because for this edge in the graph you only check whether other combined scores are below a threshold.
To me, this does not evaluate the triplet (subject, object, None) but rather (any subject, any object, None).
Is it clearer now what I mean? Do you agree, or am I missing something?
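To make the difference between (subject, object, None) and (any subject, any object, None) concrete, here is a minimal sketch of what a stricter check could look like. It is not the repository's code and not a metric proposed by the authors. The function name `strict_none_hit`, its arguments, and the choice to threshold the raw predicate scores are all assumptions made for illustration.

```python
def strict_none_hit(sub_probs, obj_probs, pred_probs, gt_sub, gt_obj, threshold):
    """Hypothetical check for a GT edge without a predicate.

    Counts as a hit only if BOTH node labels are predicted correctly
    (top-1) AND every predicate score stays below the threshold.
    """
    # Top-1 class index for each node (plain argmax over a list).
    pred_sub = max(range(len(sub_probs)), key=sub_probs.__getitem__)
    pred_obj = max(range(len(obj_probs)), key=obj_probs.__getitem__)
    nodes_correct = (pred_sub == gt_sub) and (pred_obj == gt_obj)

    # "No predicate" is predicted when no predicate score reaches the threshold.
    no_predicate = all(p < threshold for p in pred_probs)
    return nodes_correct and no_predicate


# Illustrative numbers only: two node classes, two predicate classes.
sub_probs = [0.9, 0.1]
obj_probs = [0.2, 0.8]
pred_probs = [0.05, 0.03]

hit_correct_nodes = strict_none_hit(sub_probs, obj_probs, pred_probs, 0, 1, 0.5)
hit_wrong_nodes = strict_none_hit(sub_probs, obj_probs, pred_probs, 1, 0, 0.5)
```

With these numbers, the same "no predicate" prediction is a hit when the GT nodes are (0, 1) and a miss when they are (1, 0). A threshold-only check cannot distinguish those two cases.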
Yes, you are right. That indeed simplified the evaluation a lot. We followed the same metric as the previous paper (3DSSG). Personally, I think that is not the best metric to measure performance.
Okay, thank you for the clarification.
I have a question regarding your evaluation code for the relationship metric, more specifically how you handle GT `None` edges/relationships. I am not sure if this snippet in your code is entirely correct: https://github.com/ShunChengWu/3DSSG/blob/master/utils/util_eva.py#L159-L174
So if the GT edge/predicate is `None`, which is equivalent to `gt_r` being empty, you have a separate evaluation where you only check if your top predictions are below your threshold. However, this means you are not evaluating whether the object nodes are correct. I guess you just evaluate that you are predicting no predicate. However, this is not really in the spirit of the relationship metric, right? Maybe this produces better results than the method can actually deliver? I think to evaluate the triplet correctly, you should still evaluate whether the object nodes are predicted correctly.
Am I missing anything in your evaluation that justifies the procedure, or is this indeed slightly incorrect?
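The two branches described above can be sketched roughly as follows. This is a simplified illustration of the behaviour as described in this issue, not a copy of `util_eva.py`. All function names, the product-based combined score, and the example numbers are assumptions.

```python
from itertools import product


def combined_scores(sub_probs, obj_probs, pred_probs):
    """Score every (subject, object, predicate) class combination as a product."""
    index_ranges = (range(len(sub_probs)), range(len(obj_probs)), range(len(pred_probs)))
    return {
        (s, o, p): sub_probs[s] * obj_probs[o] * pred_probs[p]
        for s, o, p in product(*index_ranges)
    }


def rank_with_gt_predicate(scores, gt_triplet):
    """1-based rank of the GT triplet among all combinations (descending score)."""
    target = scores[gt_triplet]
    return sum(score > target for score in scores.values()) + 1


def rank_without_gt_predicate(scores, threshold):
    """1-based position of the first sorted score that falls below the threshold.

    No GT node label is passed in, which is the point raised in this issue.
    """
    ordered = sorted(scores.values(), reverse=True)
    for position, score in enumerate(ordered, start=1):
        if score < threshold:
            return position
    return len(ordered) + 1  # no score is below the threshold


# Illustrative numbers only: two node classes, two predicate classes.
scores = combined_scores([0.9, 0.1], [0.2, 0.8], [0.05, 0.03])

rank_likely_triplet = rank_with_gt_predicate(scores, (0, 1, 0))
rank_unlikely_triplet = rank_with_gt_predicate(scores, (1, 0, 1))
rank_none = rank_without_gt_predicate(scores, threshold=0.5)
```

In this sketch the rank for an edge with a GT predicate depends strongly on the node labels (1 for the most likely triplet, 8 for the least likely one), while `rank_none` is 1 whatever the GT nodes are, because they never enter the computation.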