-
We found that [evaluation.py](https://github.com/mlcommons/inference/blob/master/language/gpt-j/evaluation.py) is not deterministic.
I narrowed it down to a small, fast reproducer using 100 examples …
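One way to confirm nondeterminism on such a reproducer is to run the same evaluation several times on identical inputs and compare the results byte for byte. The sketch below assumes the evaluation can be wrapped in a function returning a JSON-serialisable result; `check_determinism` and `fingerprint` are hypothetical helpers, not part of evaluation.py, and they say nothing about the cause.

```python
import hashlib
import json
from typing import Callable, Sequence


def fingerprint(result) -> str:
    """Hash a JSON-serialisable result so runs can be compared cheaply."""
    payload = json.dumps(result, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def check_determinism(evaluate: Callable[[Sequence], object],
                      examples: Sequence, runs: int = 5) -> bool:
    """Run `evaluate` on the same examples `runs` times; True if all results match exactly."""
    fingerprints = {fingerprint(evaluate(examples)) for _ in range(runs)}
    return len(fingerprints) == 1


# Hypothetical stand-in metric: exact match over (prediction, reference) pairs.
def exact_match(examples):
    return sum(pred == ref for pred, ref in examples) / len(examples)


print(check_determinism(exact_match, [("a", "a"), ("b", "c")]))  # prints True
```

Hashing the serialised output means even last-bit floating-point differences count as a mismatch, which is what you want when hunting for nondeterminism.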
-
### Feature request
Add a `MistralForQuestionAnswering` class to [modeling_mistral.py](https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/modeling_mistral.py) so …
-
G-Eval includes "Auto Chain-of-Thoughts for NLG Evaluation" as a component, in which an LLM generates the CoT steps used to carry out the evaluation. However, neither the paper nor this repo includes the prompt defi…
-
# Alex Strick van Linschoten - My finetuned models beat OpenAI’s GPT-4
Finetunes of Mistral, Llama3 and Solar LLMs are more accurate for my test data than OpenAI’s models.
[https://mlops.systems/pos…
-
RAG Evaluation
1. 100 questions, by type:
   - 60 on general trade
   - 12 on growth/variation
   - 28 on rankings
2. RAG evaluation results
Best combination tested so far: multi-qa-mpne…
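Because the 100 questions are split unevenly across types (60/12/28), a single overall score can hide weak spots; a retrieval setup that does well on general trade may still struggle on rankings. A minimal sketch for reporting accuracy per question type, assuming each evaluated question yields a `(question_type, is_correct)` pair (the function name and input format are hypothetical):

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_type(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute accuracy per question type plus an 'overall' entry.

    results: (question_type, is_correct) pairs, one per evaluated question.
    'overall' is the micro-average, so larger categories weigh more.
    """
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for qtype, ok in results:
        totals[qtype] += 1
        correct[qtype] += int(ok)
    scores = {t: correct[t] / totals[t] for t in totals}
    n = sum(totals.values())
    scores["overall"] = sum(correct.values()) / n if n else 0.0
    return scores
```

With the split above, the micro-averaged overall score is dominated by the 60 general-trade questions; averaging the three per-type scores instead (a macro average) would give the 12 growth/variation questions equal weight.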
-
Hi,
Everything seems to work fine with this APIM script, but in the logs I can see the following error appearing in Application Insights for slow GPT-4 calls:
_Expression evaluation failed. Unable to cast o…
-
Hi there,
Thank you for bringing the elegant RAG Assessment framework to the community.
I am an AI engineer from Alibaba Cloud, and our team has been fine-tuning LLM-as-a-Judge models based on t…
-
### Pre-check
- [X] I have searched the existing issues and none cover this bug.
### Description
1. `cd /Users/zdavatz/Documents/software/privateGPT`
2. I am doing `poetry run python3.11 -m private…
-
Hey,
I was wondering whether you think it would be possible to create a synthetic dataset for function-calling tasks.
I would like to use that dataset for a finetuning experiment.
Thanks for any guida…
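As a rough illustration of one possible approach (an assumption, not something this project provides), synthetic function-calling pairs can be produced by sampling argument values for a tool and rendering them into user-message templates. The `get_weather` tool, the templates, and the `prompt`/`completion` record format below are all hypothetical; a fine-tuning framework will usually expect its own chat format, and LLM-generated paraphrases could replace the fixed templates for more variety.

```python
import json
import random
from typing import Dict, List

# Hypothetical tool definition: each parameter maps to candidate values.
TOOL = {
    "name": "get_weather",
    "parameters": {"city": ["Paris", "Tokyo", "Lima"], "unit": ["celsius", "fahrenheit"]},
}
# Hypothetical user-message templates; every placeholder must be a parameter name.
TEMPLATES = [
    "What's the weather in {city}? Please answer in {unit}.",
    "Give me the current {unit} temperature for {city}.",
]


def make_examples(n: int, seed: int = 0) -> List[Dict[str, str]]:
    """Generate n (user prompt, target function call) records by sampling argument values."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    records = []
    for _ in range(n):
        args = {name: rng.choice(values) for name, values in TOOL["parameters"].items()}
        prompt = rng.choice(TEMPLATES).format(**args)
        call = json.dumps({"name": TOOL["name"], "arguments": args})
        records.append({"prompt": prompt, "completion": call})
    return records
```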
-
When I run `pip install -r requirement.txt`, I get the following error:
```text
remote: Support for password authentication was removed on August 13, 2021.
remote: Please see https://docs.github.com/get-started/getti…