[CI] Add more unit tests to ensure the outputs are reasonable

SciSharp / LLamaSharp

A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.

https://scisharp.github.io/LLamaSharp

MIT License


[CI] Add more unit tests to ensure the outputs are reasonable #704

Open AsakusaRinne opened 1 month ago

AsakusaRinne commented 1 month ago

Description

Our unit tests verify that model loading and inference run successfully, but they cannot tell whether the output is reasonable or just garbage. Currently we have to run the examples by hand whenever we make major feature changes or fixes, which is a bit annoying.

To address this, I think we could send the output to OpenAI's ChatGPT to check whether it's reasonable. I will cover the cost of the tokens, but will use github.triggering_actor so that only developers with write access can trigger the corresponding workflow.
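A rough, language-neutral sketch of such a check, written in Python only for brevity. Everything here is an assumption for illustration: the names build_judge_prompt and is_reasonable, the PASS/FAIL reply protocol, and the ask callable, which stands in for whatever client would actually call the OpenAI API. The github.triggering_actor gate would live in the workflow file and is not shown.

```python
from typing import Callable


def build_judge_prompt(prompt: str, completion: str) -> str:
    """Build the text sent to the judging model (wording is hypothetical)."""
    return (
        "You are reviewing the output of a local language model.\n"
        "Reply with exactly one word: PASS if the completion is a coherent, "
        "on-topic continuation of the prompt, or FAIL if it is garbage.\n\n"
        f"Prompt:\n{prompt}\n\nCompletion:\n{completion}\n"
    )


def is_reasonable(prompt: str, completion: str, ask: Callable[[str], str]) -> bool:
    """Return True only if the judge's reply starts with PASS.

    `ask` is a hypothetical callable that sends text to the judging model
    and returns its reply. Any reply other than PASS counts as a failure,
    so an unclear verdict fails the test instead of passing it silently.
    """
    verdict = ask(build_judge_prompt(prompt, completion)).strip().upper()
    return verdict.startswith("PASS")


# Usage with stub judges, so the sketch runs without network access.
question = "Question. what is a cat?\nAnswer:"
print(is_reasonable(question, "A cat is a small pet.", lambda _: "PASS\n"))
print(is_reasonable(question, "@@@@ ####", lambda _: "fail"))
```

Passing the judge in as a callable keeps the API call out of the test logic, so the same check can run against a stub locally and against the real API only in the restricted workflow.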

martindevans commented 1 month ago

We could also hardcode the expected responses in the unit tests. For example, this test generates two completions of "Question. what is a cat?\nAnswer:" and asserts that they are the same. We could assert the exact response too.

Of course this would only work with temp=0 and a specific model (even a specific quantisation), but it might save a few OpenAI calls!
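A minimal sketch of that idea, in Python for brevity and not using LLamaSharp's actual API. The names check_golden and fake_generate are hypothetical, and the EXPECTED string is a made-up placeholder, not real model output; in practice it would be recorded once from the pinned model and quantisation at temp=0.

```python
from typing import Callable

PROMPT = "Question. what is a cat?\nAnswer:"

# Placeholder only: a real test would pin text recorded from the chosen
# model file and quantisation, generated with temp=0.
EXPECTED = " A cat is a small domesticated feline."


def check_golden(generate: Callable[[str], str], prompt: str, expected: str) -> None:
    """Generate twice, then compare both completions with the pinned text."""
    first = generate(prompt)
    second = generate(prompt)
    # Determinism check, as in the existing test: same prompt, same output.
    assert first == second, "completions differ between runs"
    # Golden check: the output must match the pinned response exactly.
    assert first == expected, f"unexpected completion: {first!r}"


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for a deterministic (temp=0) model call."""
    return " A cat is a small domesticated feline."


check_golden(fake_generate, PROMPT, EXPECTED)
```

Because the comparison is exact, the pinned string has to be regenerated whenever the model file, quantisation, or sampling settings change.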

SignalRT commented 1 month ago

In my opinion, the alternative that Martin proposes would be easier. We cannot run all the tests in CI, and we should verify all the tests locally.

AsakusaRinne commented 1 month ago

> We could also hardcode the expected responses in the unit tests.

Yes, I'd also like to save tokens wherever this approach works! I'll only consider using the OpenAI API when necessary.

> We cannot run all the tests in CI. We should verify all the tests locally.

I see things a bit differently. The workflows and unit tests are responsible for reducing risk when we merge PRs. If the workflows pass, that should be equivalent to saying that merging the PR won't introduce badly broken behavior.

However, because of the GPU backends, it is indeed hard to cover every case in the workflows. I can provide a Linux machine with an NVIDIA GPU to run the workflows, but I have no solution for Windows yet. :)