deepjavalibrary / djl-serving

A universal scalable machine learning model deployment solution
Apache License 2.0
183 stars 58 forks source link

[CI] LLM Integration Tests through pytest suite #2023

Closed zachgk closed 4 weeks ago

zachgk commented 1 month ago

This moves the llm_integration test suite from actions into the pytest runner. This means that it can be run locally and is somewhat more maintainable.

As future work, other action test suites can also be combined into the unified file. I plan to leave this action dedicated to spinning up many instances and running the entire suite. Later, I plan on adding a new action that will spin up a single instance and run a configurable part of the suite (using pytest classes or marks to specify which part of the suite to run)

@tosterberg

zachgk commented 1 month ago

The last run was https://github.com/deepjavalibrary/djl-serving/actions/runs/9325391053. It was still failing on the TestTrtLlmHandler2 due to what I believe is OOM (github actions doesn't display it clearly), but passed every other test

An example of a failing test would be https://github.com/deepjavalibrary/djl-serving/actions/runs/9324373604/job/25669550839. You can see how it shows a summary at the bottom and then the full output above.

I have made a few CI changes that I haven't merged because the CI isn't stable. But, I think it may be better to go ahead and merge first and then we can do minor fixes afterwards to ensure we can actually make progress. So, I suggest merging this right after the release is done

lanking520 commented 1 month ago

@zachgk I took a look at the change you made to tests. Currently it is not reporting which model failed and/or share any logs to mention what model is being tested. How could we easily find out such information with the current change?

zachgk commented 1 month ago

@lanking520 The first example isn't reporting anything because it seems like it is crashing the entire actions runner and aborting the job/pytest call. I believe it is failing on the first test because I think pytest would otherwise begin printing out a basic progress marker with . to indicate successful tests and F for failed ones. But it needs some further investigation

As a comparison, take a look at the second example. This is what a failure is intended to look like

zachgk commented 1 month ago

It is updated to now upload logs for all tests. I also rebased it for updated tests. The latest run is https://github.com/deepjavalibrary/djl-serving/actions/runs/9473108790. After that run, I also had it start uploading logs for failing test jobs as well