confident-ai / deepeval

The LLM Evaluation Framework
https://docs.confident-ai.com/

Asynchronous test runs are sometimes not completed correctly #1147

Open jmaczan opened 5 days ago

jmaczan commented 5 days ago

**Describe the bug**
When running evaluate() with run_async=True, tests sometimes never complete, so any job/task/pipeline that relies on the exit code will fail. Results are neither printed nor emitted; the evaluation is essentially stuck after running the last test case. It does not happen every time and shows no regular pattern, so it looks like a race condition. The issue is likely in the async logic in evaluate.py: a_execute_test_cases(), get_or_create_event_loop(), or loop.run_until_complete().

It might be that await asyncio.sleep(throttle_value) leaves a semaphore stuck, or something along those lines. I haven't debugged it beyond a brief static code analysis, though; see the sketch below for the kind of deadlock I suspect.
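To make that suspicion concrete, here is a generic asyncio sketch of the failure mode. It is not deepeval's actual code: the semaphore, worker, and throttle_value below are stand-ins for whatever evaluate.py does internally. If a coroutine raises after acquiring a semaphore but before releasing it, every later acquire() waits forever and the surrounding run_until_complete()/await never returns.

```python
import asyncio

async def worker(semaphore: asyncio.Semaphore, i: int, throttle_value: float = 0.05):
    await semaphore.acquire()
    await asyncio.sleep(throttle_value)  # stands in for a throttled metric call
    if i == 0:
        raise RuntimeError("simulated failure")  # release() below never runs
    semaphore.release()

async def main():
    semaphore = asyncio.Semaphore(1)
    tasks = asyncio.gather(
        *(worker(semaphore, i) for i in range(2)),
        return_exceptions=True,
    )
    try:
        # Without this timeout the await would block forever, which is
        # the same symptom as the stuck evaluation run.
        await asyncio.wait_for(tasks, timeout=2)
        print("all workers finished")
    except asyncio.TimeoutError:
        print("hung: the semaphore was never released")

asyncio.run(main())
```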

**To Reproduce**
Steps to reproduce the behavior:

  1. Create a test cases list
  2. Run them using evaluate() with run_async=True
  3. All tests are executed asynchronously and that's fine
  4. Results are not printed, saved, etc. The run is essentially stuck after the last test case (a minimal repro sketch follows below)
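For reference, a minimal script that exercises this path. It is a sketch rather than an exact copy of my pipeline: the metric (AnswerRelevancyMetric, which needs an LLM provider key configured) and the test-case count are arbitrary stand-ins, since the hang shows no regular pattern.

```python
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# Arbitrary test cases; the count is a stand-in, not a known trigger.
test_cases = [
    LLMTestCase(
        input=f"Question {i}",
        actual_output=f"Answer {i}",
    )
    for i in range(20)
]

# Intermittently hangs after the last test case finishes: results are
# never printed and the process never exits.
evaluate(
    test_cases=test_cases,
    metrics=[AnswerRelevancyMetric()],
    run_async=True,
)
```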

**Expected behavior**
Tests should always end by printing either results or errors; they should never hang indefinitely.
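Until the root cause is found, one workaround for exit-code-dependent pipelines is to run the evaluation in a subprocess with a hard timeout, so a hung run fails the job instead of blocking it forever. The run_eval.py script name and the 600-second budget below are hypothetical:

```python
import subprocess
import sys

# Guard, not a fix: kill the evaluation if it exceeds the time budget
# so CI reports a failure rather than hanging indefinitely.
try:
    result = subprocess.run(
        [sys.executable, "run_eval.py"],  # hypothetical script calling evaluate()
        timeout=600,  # assumed budget; tune to your suite
    )
    sys.exit(result.returncode)
except subprocess.TimeoutExpired:
    print("Evaluation hung; failing the pipeline.", file=sys.stderr)
    sys.exit(1)
```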



penguine-ip commented 2 days ago

@jmaczan I've never encountered this issue. Can you give us something to reproduce it? For example, the number of test cases, which metric you're using, etc.