yoavkatz opened this issue 9 months ago
Reproducible via prepare.card.cola.py.
For the case in question, where predictions = references = ['acceptable', 'acceptable', 'acceptable'], by the book we only have TP (or only TN); the three other confusion-matrix components are 0, so the end result is 0.0.
And an elaborated proof "by hand":
The HF metric calls scikit-learn:

```python
from sklearn.metrics import matthews_corrcoef

def _compute(self, predictions, references, sample_weight=None):
    return {
        "matthews_correlation": float(
            matthews_corrcoef(references, predictions, sample_weight=sample_weight)
        ),
    }
```
[matthews_corrcoef documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.matthews_corrcoef.html)
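The arithmetic can be checked directly against the MCC formula (a pure-Python sketch of the definition, not scikit-learn's actual implementation):

```python
# MCC formula applied by hand to predictions == references == [1, 1, 1].
tp, tn, fp, fn = 3, 0, 0, 0  # only true positives; every other count is 0

numerator = tp * tn - fp * fn                                   # 0
denominator_sq = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)  # 0

# The formula is 0/0 here; scikit-learn resolves this case to 0.0.
mcc = 0.0 if denominator_sq == 0 else numerator / denominator_sq ** 0.5
print(mcc)  # 0.0
```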
I think the issue is that in v = [0,0,0] or v = [1,1,1] there is only a single class. This special case is not treated in the implementation.
This seems to be a known issue that has a PR, but it was not fixed.
scikit's implementation faithfully follows the definition (as there is only TN or only TP, and the other three components are 0, hence the result, by the definition of the Matthews coefficient, is 0). The question is whether, in our case, when we 'fake' a full hit or a full miss to test a metric, we should tweak the fake inputs.
Right. The metric is ill-defined in this case (0/0). They suggest in the above issue to add a special flag for this, but they did not solve it yet.
Can you repeat the above code with ref and pred each enumerating over (0,0), (0,1), (1,0), and (1,1) independently? I want to see all the corner cases.
| pred. | ref. | expected result |
| --- | --- | --- |
| (0,0) | (1,1) | 0 |
| (1,1) | (1,1) | 1 |
| (0,0) | (0,0) | 1 |
| (1,1) | (0,0) | 0 |
Gladly. I think that in all of your cases there is only a single confusion-matrix term that equals 2, and the three others are 0, so the numerator is 0 in all of your cases:
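A quick pure-Python check of those four cases (the `confusion` helper is illustrative, not part of Unitxt or scikit-learn):

```python
# Count confusion-matrix entries for binary prediction/reference vectors.
def confusion(pred, ref):
    tp = sum(p == 1 and r == 1 for p, r in zip(pred, ref))
    tn = sum(p == 0 and r == 0 for p, r in zip(pred, ref))
    fp = sum(p == 1 and r == 0 for p, r in zip(pred, ref))
    fn = sum(p == 0 and r == 1 for p, r in zip(pred, ref))
    return tp, tn, fp, fn

cases = [((0, 0), (1, 1)), ((1, 1), (1, 1)), ((0, 0), (0, 0)), ((1, 1), (0, 0))]
for pred, ref in cases:
    tp, tn, fp, fn = confusion(pred, ref)
    # Exactly one entry is 2 and the rest are 0, so TP*TN - FP*FN == 0.
    print(pred, ref, (tp, tn, fp, fn), "numerator =", tp * tn - fp * fn)
```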
Ok. So we should add a check: if all the predictions are the same value (p) and all the references are the same value (r), we return 0 if p != r and 1 if p == r.
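A minimal sketch of that check (the function name is illustrative, not Unitxt's actual API):

```python
def matthews_with_constant_guard(predictions, references):
    """Return MCC, special-casing constant prediction/reference vectors.

    Illustrative sketch only, not Unitxt's actual implementation.
    """
    if len(set(predictions)) == 1 and len(set(references)) == 1:
        # Single class on both sides: 1.0 for a full hit, 0.0 for a full miss.
        return 1.0 if predictions[0] == references[0] else 0.0
    # General case: defer to scikit-learn.
    from sklearn.metrics import matthews_corrcoef
    return float(matthews_corrcoef(references, predictions))

print(matthews_with_constant_guard([1, 1, 1], [1, 1, 1]))  # 1.0
```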
Can you also check that all these are between 0 and 1?
| pred. | ref. |
| --- | --- |
| (1,0) | (1,1) |
| (0,1) | (1,1) |
| (1,0) | (0,0) |
| (0,1) | (0,0) |
Note that a total miss for Matthews is -1, not 0:
I think that since Matthews returns 0, by definition, for any case in which the numerator of the formula is 0 (namely: (either TP or TN is 0) and (either FP or FN is 0)), no matter how good the predictions are, I suggest adding a warning message in such a case rather than overriding Matthews.
Yes, you are right: as this is a correlation, [1,0] and [0,1] are indeed anti-correlated (-1).
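Worked by hand for that total-miss case (again a sketch of the MCC definition, not library code):

```python
# pred = [1, 0], ref = [0, 1]: one false positive and one false negative.
tp, tn, fp, fn = 0, 0, 1, 1

# Numerator is -1, denominator is sqrt(1*1*1*1) = 1, so MCC is -1.0.
mcc = (tp * tn - fp * fn) / ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
print(mcc)  # -1.0
```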
You can see what they did in f1 (and what they plan to do for Matthews) here:
https://github.com/scikit-learn/scikit-learn/pull/25531/files
zero_division : {"warn", 0.0, 1.0, np.nan}, default="warn"
    Sets the value to return when there is a zero division, i.e. when all predictions and labels are negative. If set to "warn", this acts as 0, but warnings are also raised.
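The zero_division semantics described above can be sketched in pure Python (an illustrative helper, not scikit-learn's code):

```python
import math
import warnings

def safe_divide(numerator, denominator, zero_division="warn"):
    # Mimics the documented zero_division behavior: on 0/0, return the
    # configured value (0.0, 1.0, or nan); "warn" acts as 0 but also warns.
    if denominator == 0:
        if zero_division == "warn":
            warnings.warn("zero division in metric computation")
            return 0.0
        return zero_division
    return numerator / denominator

print(safe_divide(0, 0, zero_division=math.nan))  # nan
```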
However, we have no use for warnings. No one sees them, as the results are stored and viewed in a report. So we could return np.nan, but it would be odd for a perfect prediction to return a correlation of nan.
Why is this the accepted behavior (strict=False was set a long time ago)?
The result of running the main metric used in the card (matthews_correlation) over simulated predictions that are equal to the references returns a different score than expected. One would expect a perfect score of 1.0 in this case, but the returned metric score was 0.0. This is flagged only as a warning because strict=False was set in the call to test_card(). The predictions passed to the metric were: ['acceptable', 'acceptable', 'acceptable']