-
gevou updated
10 months ago
-
# URL
- https://arxiv.org/abs/2312.01552
# Affiliations
- Bill Yuchen Lin, N/A
- Abhilasha Ravichander, N/A
- Ximing Lu, N/A
- Nouha Dziri, N/A
- Melanie Sclar, N/A
- Khyathi Chandu, N/A
…
-
For the API-based models, there are frequent claims online that users see models getting worse over time. It would be good to know if that's true. Copying a [comment of mine from HF](https://hugging…
-
Hi,
Thanks for your contribution, it's really useful to see evaluations on real-world data! There are further extraction tools for Python which this repository doesn't feature yet and which could b…
adbar updated
4 months ago
-
I updated Ollama from 0.1.16 to 0.1.18 and encountered the issue.
I am using Python to run LLMs with Ollama and LangChain on a Linux server (4 x A100 GPUs).
There are 5,000 prompts to ask and get…
-
### Describe the issue as clearly as possible:
Example code with Pydantic and generate.json() throws a ValidationError
Code is run from Jupyter Notebook
Output is ok if age: int is removed from t…
-
Hi @haileyschoelkopf, thank you for your awesome open-source work. We have been evaluating using `lm-eval` and noticed that when using `accelerate` for data parallel inference, the number of GPUs utili…
-
https://huggingface.co/datasets/BioMistral/BioInstructQA
![Screenshot 2024-04-03 at 22 32 34](https://github.com/BirgerMoell/swedish-medical-benchmark/assets/1704131/d3eefcb9-cd8a-4983-81c4-fbc00d320…
-
We are trying to evaluate Named Entity Recognition and Part of Speech tagging tasks, but it is unclear to us how to do that.
We've noticed that `aclue` include a Named Entity Recognition task but it …
-
Dear maintainers,
Thank you for your valuable arena. I am currently researching LLM evaluation methods and got stuck on a question about the Bradley-Terry model.
As it stands, from multiple sou…