ExplainableML / czsl

PyTorch CZSL framework containing GQA, the open-world setting, and the CGE and CompCos methods.
GNU General Public License v3.0

Role of closed_mask in evaluator vs dataset pairs #35

Closed. care77 closed this issue 1 year ago.

care77 commented 1 year ago

I noticed that CGE and CompCos predict over all pairs (train pairs + val pairs + test pairs) at test time, but some pairs are masked out for evaluation (cf. closed_mask at lines 247-252 and 300-306 of models/common.py). Does this have any special meaning?

mancinimassimiliano commented 1 year ago

Hi @care77,

closed_mask is needed to perform closed-world evaluation (i.e., excluding pairs not present in the test set). Specifically, a pair's value in the mask is 1 if the pair should be considered and 0 if it should not.
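To make the 1/0 convention concrete, here is a minimal sketch in plain Python rather than the repository's tensor code. The pair names, the scores, and the helper `apply_closed_mask` are hypothetical and chosen only for illustration; the sketch assumes that masked-out pairs are excluded by setting their scores to negative infinity before taking the best-scoring pair.

```python
import math

# Hypothetical toy data: (attribute, object) pairs, not taken from any real dataset.
all_pairs = [("wet", "dog"), ("dry", "dog"), ("wet", "cat"), ("dry", "cat")]
# Pairs considered valid for the phase being evaluated (assumed for illustration).
phase_pairs = {("wet", "dog"), ("dry", "dog"), ("wet", "cat")}

# 1 = keep the pair during closed-world evaluation, 0 = ignore it.
closed_mask = [1 if pair in phase_pairs else 0 for pair in all_pairs]


def apply_closed_mask(scores, mask):
    """Replace the score of every masked-out pair with -inf so it can never win."""
    return [s if keep else -math.inf for s, keep in zip(scores, mask)]


# The model scores every pair; here the highest raw score sits on a masked-out pair.
scores = [0.1, 0.3, 0.2, 0.9]
masked = apply_closed_mask(scores, closed_mask)
prediction = all_pairs[max(range(len(masked)), key=masked.__getitem__)]
print(closed_mask, prediction)
```

Without the mask, the toy model would pick ("dry", "cat"), the pair with the highest raw score; with the mask, the prediction falls back to the best pair that is allowed in this phase.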

I hope this answers your question but, in case it does not, please let me know.

P.S. For faster open-world (OW) evaluation, consider using the KG-SP and Co-CGE repos.

care77 commented 1 year ago

@mancinimassimiliano Thank you for your answers.
I'm not sure about the definition of "compositions not present in the dataset". For example, on MIT-States there are 1262 seen pairs in the train set, 300 seen and 300 unseen pairs in the val set, and 400 seen and 400 unseen pairs in the test set. For evaluation on the val set, CGE returns scores over all pairs (1262 + 300 + 400), but the 400 unseen test-set pairs are masked out in the evaluator by closed_mask. In this case, are the masked-out pairs the compositions not present in the dataset? If so, why not just return the scores of the seen pairs (1262) and the unseen val-set pairs (300)? Looking forward to your answer.
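The counts in this question can be checked with a small sketch. It takes the numbers quoted above at face value and assumes that the seen val/test pairs are a subset of the 1262 train pairs and that the unseen val and unseen test pairs do not overlap. Integer ids stand in for real (attribute, object) pairs, and the helper `closed_mask_for` is hypothetical.

```python
# Integer ids stand in for (attribute, object) pairs; counts are those quoted above.
train_pairs = set(range(0, 1262))        # 1262 seen pairs
val_unseen = set(range(1262, 1562))      # 300 unseen validation pairs
test_unseen = set(range(1562, 1962))     # 400 unseen test pairs
all_pairs = sorted(train_pairs | val_unseen | test_unseen)


def closed_mask_for(phase_unseen):
    """Keep seen pairs plus the unseen pairs of the phase being evaluated."""
    keep = train_pairs | phase_unseen
    return [1 if pair in keep else 0 for pair in all_pairs]


val_mask = closed_mask_for(val_unseen)
test_mask = closed_mask_for(test_unseen)

# Total pairs scored, pairs kept for validation, pairs masked out for validation.
print(len(all_pairs), sum(val_mask), len(all_pairs) - sum(val_mask))
# Pairs kept for testing.
print(sum(test_mask))
```

Under these assumptions the model scores 1962 pairs, validation keeps 1562 of them and masks out the 400 unseen test pairs, and testing keeps 1662 and masks out the 300 unseen val pairs.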

mancinimassimiliano commented 1 year ago

Thanks @care77 for the clarification!

OK, I see your point, and the answer is that... there is no particular reason. :)

The main factor is that the codebase builds on top of the AoP one, and there the closed mask is also used to filter pairs (see here).

In my opinion, filtering in the evaluator means the model does not need to care about the evaluation procedure. We do not have to define one inference function for each phase (e.g. one for validation and one for testing), and this is the main advantage code-wise. Other than that, I have no strong arguments for favoring one choice over the other.
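A rough sketch of this design choice, in plain Python and with hypothetical names (`MaskingEvaluator`, `model_scores`, and the pair labels are not from the codebase): the model exposes a single phase-agnostic scoring function, and each evaluator instance carries the mask for its own phase.

```python
import math


class MaskingEvaluator:
    """Hypothetical evaluator that owns the phase-specific mask."""

    def __init__(self, all_pairs, phase_pairs):
        self.all_pairs = all_pairs
        # True = pair is valid in this phase, False = masked out.
        self.mask = [pair in phase_pairs for pair in all_pairs]

    def predict(self, scores):
        # Masked-out pairs get -inf so they can never be the top prediction.
        masked = [s if keep else -math.inf for s, keep in zip(scores, self.mask)]
        best = max(range(len(masked)), key=masked.__getitem__)
        return self.all_pairs[best]


def model_scores(image):
    """Stand-in for a single inference function: one score per pair, for any phase."""
    return [0.2, 0.5, 0.9, 0.7]


all_pairs = ["seen_a", "seen_b", "val_unseen", "test_unseen"]
val_eval = MaskingEvaluator(all_pairs, {"seen_a", "seen_b", "val_unseen"})
test_eval = MaskingEvaluator(all_pairs, {"seen_a", "seen_b", "test_unseen"})

scores = model_scores(image=None)  # identical model call for both phases
val_prediction = val_eval.predict(scores)
test_prediction = test_eval.predict(scores)
print(val_prediction, test_prediction)
```

The same score vector yields different predictions depending only on which evaluator consumes it, so switching between validation and test requires no change on the model side.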

Hope this helps!

care77 commented 1 year ago

I get it. Thank you @mancinimassimiliano for your helpful answers.

mancinimassimiliano commented 1 year ago

You are welcome!