OpenBioLink / SAFRAN

Scalable and fast non-redundant rule application for link prediction
MIT License

eval.py does not always run #18

Closed Jean-KOUAGOU closed 1 year ago

Jean-KOUAGOU commented 1 year ago

Running

```
python SAFRAN/python/eval.py SAFRANBinaries/results_wn18rr/predictions.txt datasets/wn18rr/test.txt
```

fails with:

```
Traceback (most recent call last):
  File "SAFRAN/python/eval.py", line 65, in <module>
    res = evaluate(sys.argv[1], sys.argv[2])
  File "SAFRAN/python/eval.py", line 60, in evaluate
    result = evaluate_policy(path_predictions, n, "average")
  File "SAFRAN/python/eval.py", line 40, in evaluate_policy
    ranking = rankdata([-x for x in conf], method=policy)
  File "/.conda/envs/enhancerl/lib/python3.10/site-packages/scipy/stats/_stats_py.py", line 9501, in rankdata
    contains_nan, nan_policy = _contains_nan(arr, nan_policy)
  File "/.conda/envs/enhancerl/lib/python3.10/site-packages/scipy/_lib/_util.py", line 637, in _contains_nan
    if np.issubdtype(type(el), np.number) and np.isnan(el):
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
```

Would there be a problem if

```python
confidences_head = [float(x) for x in pred_heads[1::2]]
confidences_tail = [float(x) for x in pred_tails[1::2]]
```

were used in the function `read_predictions(path)`?
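For illustration, here is how that float-based parsing behaves on a toy list (the data below is hypothetical; the real `pred_heads` is read from the predictions file):

```python
# Hypothetical flat list alternating candidate entity and confidence,
# mirroring the pred_heads / pred_tails layout used in read_predictions.
pred_heads = ["e1", "0.95", "e2", "0.31", "e3", "1.0"]

# Every second element starting at index 1 is a confidence score.
confidences_head = [float(x) for x in pred_heads[1::2]]
print(confidences_head)  # [0.95, 0.31, 1.0]
```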

nomisto commented 1 year ago

Hmm, using float should not be a problem. Could you try whether adding

```python
confidences_head = np.array(confidences_head, dtype=np.float64)
confidences_tail = np.array(confidences_tail, dtype=np.float64)
```

in `read_predictions` solves the problem? (Inserted right after the `confidences_head = [float(x) for x in pred_heads[1::2]]` and `confidences_tail = [float(x) for x in pred_tails[1::2]]` lines.)

Jean-KOUAGOU commented 1 year ago

Even without the two lines

```python
confidences_head = np.array(confidences_head, dtype=np.float64)
confidences_tail = np.array(confidences_tail, dtype=np.float64)
```

the problem is solved.

nomisto commented 1 year ago

Ok, weird. But as I understand it, the issue is fixed? Closing this for now, then.

Jean-KOUAGOU commented 1 year ago

But it should be updated in the code, right?

nomisto commented 1 year ago

I am sorry, I don't quite get what you mean. You wrote that it does work without the explicit conversion (`np.array(..., dtype=np.float64)`), so I assumed all is fine. Did I misunderstand that?

Jean-KOUAGOU commented 1 year ago

In the current implementation, you used the following lines of code:

```python
confidences_head = [int(x.replace("0.", "").replace("1.", "1").ljust(100, "0")) if (not x.startswith("1.") and not x.startswith("1")) else int("1".ljust(101, "0")) for x in pred_heads[1::2]]
confidences_tail = [int(x.replace("0.", "").replace("1.", "1").ljust(100, "0")) if (not x.startswith("1.") and not x.startswith("1")) else int("1".ljust(101, "0")) for x in pred_tails[1::2]]
```

which led to the error.

I replaced those lines with shorter ones:

```python
confidences_head = [float(x) for x in pred_heads[1::2]]
confidences_tail = [float(x) for x in pred_tails[1::2]]
```

and it fixed the issue. My question is whether you would update your code and use my suggestion instead.

nomisto commented 1 year ago

Ah ok, that information was missing. Well, the reason for that weird code of mine, which converts the float 0.95 to the int 95, is a precision problem that mostly occurs when using Noisy-OR (floats are limited in precision, integers in Python are not).
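To illustrate the precision point: two confidence strings that differ only beyond float64's roughly 16 significant digits parse to the same float, while the padded-integer encoding keeps them distinct (a minimal sketch with made-up values, not actual SAFRAN output):

```python
a = "0.1000000000000000000000001"  # differs from 0.1 only in the 25th decimal
b = "0.1"

# float64 carries ~16 significant digits, so both parse identically
print(float(a) == float(b))  # True

# The integer encoding keeps every digit, so the two stay distinct
ia = int(a.replace("0.", "").ljust(100, "0"))
ib = int(b.replace("0.", "").ljust(100, "0"))
print(ia == ib)  # False
```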

But yeah, since this is quite the edge case, I would also recommend replacing it with your code. Do you want to make a PR, or should I change it?

Jean-KOUAGOU commented 1 year ago

The code actually seems to convert 0.95 to a very large number, say greater than 10^50. These numbers might be considered infinite and make scipy crash.
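A quick check is consistent with this. One plausible mechanism: an integer of that size overflows NumPy's fixed-width dtypes, ends up in an object array, and `np.isnan` then raises the `TypeError` seen in the traceback (a minimal reproduction sketch):

```python
import numpy as np

# "0.95" under the old encoding becomes 95 followed by 98 zeros
big = int("0.95".replace("0.", "").ljust(100, "0"))
print(len(str(big)))  # 100 digits, far beyond int64/uint64 range

# NumPy cannot safely coerce such a value, so np.isnan fails with
# the same TypeError that scipy's _contains_nan surfaces in rankdata
try:
    np.isnan(big)
except TypeError as e:
    print("TypeError:", e)
```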

Sure I can create a PR

nomisto commented 1 year ago

Yes, it is basically padded with zeros out to 100 digits. Maybe, yes; it worked for me at least with previous versions of scipy.

Thanks!

Jean-KOUAGOU commented 1 year ago

I do not have access to OpenBioLink/SAFRAN.git, so please update the code yourself.