gomate-community / rageval

Evaluation tools for Retrieval-augmented Generation (RAG) methods.
Apache License 2.0
119 stars, 9 forks

change the inputs of metrics and the calling methods in tests #103

Closed: bugtig6351 closed this pull request 3 months ago

QianHaosheng commented 4 months ago

I'm wondering why the git diff function isn't working.

codecov[bot] commented 4 months ago

Codecov Report

Attention: Patch coverage is 88.84120% with 26 lines in your changes missing coverage. Please review.

Project coverage is 80.88%. Comparing base (12647a7) to head (7be7a68).

| File | Patch % | Missing lines |
|---|---|---|
| rageval/metrics/_answer_citation_precision.py | 80.26% | 15 |
| rageval/metrics/_answer_ter.py | 55.55% | 4 |
| rageval/metrics/_answer_chrf.py | 80.00% | 2 |
| rageval/metrics/base.py | 80.00% | 2 |
| rageval/metrics/_answer_f1.py | 98.24% | 1 |
| rageval/metrics/_context_recall.py | 93.75% | 1 |
| rageval/metrics/_context_reject_rate.py | 92.85% | 1 |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #103      +/-   ##
==========================================
- Coverage   82.17%   80.88%   -1.29%
==========================================
  Files          32       32
  Lines        1161     1135      -26
==========================================
- Hits          954      918      -36
- Misses        207      217      +10
```
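The headline numbers in the report can be reproduced from the table and the diff: project coverage is hits divided by tracked lines, and the per-file missing counts add up to the 26 uncovered lines flagged in the patch. The sketch below only redoes that arithmetic on figures copied from the report. The patch size it prints (233 lines, 207 covered) is back-calculated from the 88.84120% figure; Codecov does not state it directly, so treat it as an inference.

```python
# Figures copied from the Codecov report: covered lines and tracked lines
BASE = {"hits": 954, "lines": 1161}  # main (12647a7)
HEAD = {"hits": 918, "lines": 1135}  # PR head (7be7a68)

# Uncovered patch lines per file, from the per-file table
PATCH_MISSING = {
    "_answer_citation_precision.py": 15,
    "_answer_ter.py": 4,
    "_answer_chrf.py": 2,
    "base.py": 2,
    "_answer_f1.py": 1,
    "_context_recall.py": 1,
    "_context_reject_rate.py": 1,
}
PATCH_COVERAGE = 88.84120  # percent, as reported


def coverage(hits: int, lines: int) -> float:
    """Line coverage in percent: covered lines / tracked lines."""
    return 100.0 * hits / lines


base_cov = coverage(**BASE)
head_cov = coverage(**HEAD)
print(f"base {base_cov:.2f}%  head {head_cov:.2f}%  delta {head_cov - base_cov:+.2f} pp")
print("misses:", BASE["lines"] - BASE["hits"], "->", HEAD["lines"] - HEAD["hits"])

# Sum of per-file misses should match the "26 lines" in the summary
patch_missing = sum(PATCH_MISSING.values())
# Inferred patch size: missing / (1 - coverage fraction), rounded to whole lines
patch_total = round(patch_missing / (1 - PATCH_COVERAGE / 100))
print(f"patch: {patch_total - patch_missing}/{patch_total} lines covered")
```

Running it prints 82.17% and 80.88% with a delta of -1.29 percentage points, matching the diff above.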


QianHaosheng commented 4 months ago

The input interface of most metrics has now been updated, but some new problems have emerged.

For example:

  1. The current tests are not comprehensive and need to be completed, and the correctness of each metric's implementation needs to be reconfirmed.
  2. The role of the task module within the project is not clearly defined. `test_evaluation` currently fails because the interface has changed, so it is temporarily skipped.
  3. The benchmarks that call metrics currently fail to run and need to be adapted as soon as possible.
  4. No metric currently performs true batch computation. We should decide whether to keep the batch interface and use it to parallelize computation, or drop it for now.
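Item 4 comes down to whether a batch call does anything beyond looping over samples. The sketch below uses a hypothetical `ExactMatchMetric` to show both options: a plain per-sample loop, and the same per-sample function mapped over a `concurrent.futures.ThreadPoolExecutor`. It is not rageval's API; the class and the names `_compute_one`, `compute`, and `max_workers` are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


class ExactMatchMetric:
    """Hypothetical metric (not part of rageval).

    `_compute_one` scores a single sample; `compute` scores a batch either
    with a plain loop or by mapping `_compute_one` over a thread pool.
    """

    def _compute_one(self, answer: str, reference: str) -> float:
        # Per-sample score: 1.0 if the normalized strings match, else 0.0
        return float(answer.strip().lower() == reference.strip().lower())

    def compute(
        self,
        answers: Sequence[str],
        references: Sequence[str],
        max_workers: int = 1,
    ) -> List[float]:
        if len(answers) != len(references):
            raise ValueError("answers and references must have the same length")
        if max_workers <= 1:
            # Sequential path: a "batch" that is really just a loop
            return [self._compute_one(a, r) for a, r in zip(answers, references)]
        # Parallel path: Executor.map returns results in input order
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(self._compute_one, answers, references))


metric = ExactMatchMetric()
answers = ["Paris", "berlin ", "Rome"]
references = ["paris", "Berlin", "Madrid"]
print(metric.compute(answers, references))                 # [1.0, 1.0, 0.0]
print(metric.compute(answers, references, max_workers=4))  # [1.0, 1.0, 0.0]
```

Threads mainly pay off when per-sample work waits on I/O, such as calls to an LLM judge. For pure-Python CPU-bound scoring, CPython's GIL limits thread-level speedup, so a process pool or vectorized computation would be the alternative if batching is kept.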

In addition, there are some formatting issues in the project, which should be fixed in subsequent updates.