-
Hi, I'm trying to reproduce some of your evaluation experiments. In particular, I'm doing a **robustness evaluation on the task of Sentiment Analysis** on the *IMDB* dataset. As you said, I'm using the…
-
# Alex Strick van Linschoten - My finetuned models beat OpenAI’s GPT-4
Finetunes of Mistral, Llama3 and Solar LLMs are more accurate for my test data than OpenAI’s models.
[https://mlops.systems/pos…
-
### Project Name
Quiz Maker
### Description
Quiz Maker is a GenAI tool that uses RAG to generate quizzes on the fly from uploaded content. It is an ASP.NET web application that utilises Sema…
-
I’m running an evaluation on the MMBench-en dataset. The evaluation on the MME benchmark went smoothly, but when I switched to MMBench-en, the evaluation speed significantly slowed down.
I’m using …
-
This is a major feature release.
Spec: https://github.com/MadcowD/ell/blob/cd64ab9bb0d3a09195fef7a32ef77ac5d7e6c912/docs/ramblings/evalspec.md
Ramblings: https://github.com/MadcowD/ell/blob/cd64ab9…
-
### Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
### Describe the bug and reproduction steps
Running Browsing Agent with Deepseek, I got a syntax err…
-
Could you please provide the process for reproducing the training of the 'cousin_ckpt.pth' and 'twin_ckpt.pth' files? Thank you.
-
### This issue is for a: (mark with an `x`)
```
- [ ] bug report -> please search issues before submitting
- [X] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior …
```
-
Chat GPT-4 can't respond. It just sits and thinks.
Chat GPT 3.5 has no problems.
Debug Data
| Property | Value |
| --- | --- |
| Name | ``"Wolfram/Chatbook"`` |
| Version | ``"1.1.1"`` |
…
-
GPTScore presents very thorough experimental results on generation-based evaluation across many downstream NLG tasks. Thank you so much for your work.
Recently, I also noticed that …