AmenRa / ranx

⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍
https://amenra.github.io/ranx
MIT License
427 stars 23 forks source link

Add make lint and make fix-lint and make the code compliant #46

Closed diegoceccarelli closed 1 year ago

diegoceccarelli commented 1 year ago

This commit adds two new targets to the Makefile: make lint and make fix-lint.

make lint will make sure that the code is properly formatted, perform linting with ruff, check for typos (using typos, and make sure order of imports is according to the pep8 standard (using isort).

Code has been fixed to pass all the checks. Next step should be set github action so that checks are performed for each PR as a prerequisite to merge into master.

I ran tests locally and they are all successful.

diegoceccarelli commented 1 year ago

All good, I ran all the notebooks (including efficiency_test fully) and they still work.

The only thing when I ran efficiency_test with 8 threads i got this:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[8], line 28
     24 results.append([query_count, "pytrec_eval", 1, "ndcg", ndcg_avg_time, 1.0])
     26 for threads in [8]:
     27     # Run metrics once to ensure they have been compiled
---> 28     evaluate(re_qrels, re_run, [f"map", f"mrr", f"ndcg"], threads=threads)
     30     x = get_ipython().run_line_magic('timeit', '-o -q evaluate(re_qrels, re_run, f"map", threads=threads)')
     31     avg_time = max(round(x.average, 3) * 1000, 1)

File ~/dev/ranx/.venv/lib/python3.9/site-packages/ranx/meta/evaluate.py:128, in evaluate(qrels, run, metrics, return_mean, threads, save_results_in_run, make_comparable)
    126     set_num_threads(1)
    127 elif threads != 0:
--> 128     set_num_threads(threads)
    130 if make_comparable and type(qrels) == Qrels and type(run) == Run:
    131     run = run.make_comparable(qrels)

File ~/dev/ranx/.venv/lib/python3.9/site-packages/numba/np/ufunc/parallel.py:608, in set_num_threads(n)
    606 if not isinstance(n, (int, np.integer)):
    607     raise TypeError("The number of threads specified must be an integer")
--> 608 snt_check(n)
    609 _set_num_threads(n)

File ~/dev/ranx/.venv/lib/python3.9/site-packages/numba/np/ufunc/parallel.py:570, in gen_snt_check.<locals>.snt_check(n)
    568 def snt_check(n):
    569     if n > NUMBA_NUM_THREADS or n < 1:
--> 570         raise ValueError(msg)

ValueError: The number of threads must be between 1 and 4

Removing 8 from the line:

<<< for threads in [1, 2, 4, 8]:
>>> for threads in [1, 2, 4]:

Fixed. I don't think it's related to my change?

AmenRa commented 1 year ago

I suspect the problem is caused by your computer having fewer than eight threads or numba not detecting more than four threads from the CPU.

You can check that by running the following:

from numba.core.config import NUMBA_NUM_THREADS

print(NUMBA_NUM_THREADS)

I prefer not to change the notebook as I used it to compute the execution times reported in the first paper about ranx.

Thank you!

diegoceccarelli commented 1 year ago

Sure, I suspected it was due to my laptop configuration. Anyway not related to this PR, can we merge?

If you want on a separate PR we could handle nicely the fact that the machine doesn't support that number of threads (log a warning and skipping).

AmenRa commented 1 year ago

Sorry for the delay, I supposed you had changed the number of threads in the list in the PR. That's why I did not merge it sooner.

About the warning, I don't think it is necessary.