-
Hello 👋
First of all, thank you for the great work and the evaluation results!
I understand that in many cases you predicted the output for each question by picking the choice that minimizes the loss…
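
To make the selection rule mentioned above concrete, here is a minimal sketch of choosing the answer with the lowest loss. It is only an illustration, not the authors' evaluation code: the per-token log-probabilities are made-up values, the helper names `mean_nll` and `pick_choice` are hypothetical, and averaging the loss over tokens (length normalization) is an assumption that may differ from the actual setup.

```python
def mean_nll(token_logprobs):
    """Average negative log-likelihood (loss) of one candidate answer."""
    return -sum(token_logprobs) / len(token_logprobs)


def pick_choice(choice_logprobs):
    """Return the label with the lowest mean loss, plus all losses."""
    losses = {label: mean_nll(lps) for label, lps in choice_logprobs.items()}
    best = min(losses, key=losses.get)
    return best, losses


# Hypothetical per-token log-probabilities a model might assign to each
# candidate answer; in practice these would come from the model's logits.
example = {
    "A": [-2.3, -1.9, -2.7],
    "B": [-0.4, -0.6],
    "C": [-1.2, -3.1, -0.9, -1.5],
}

best, losses = pick_choice(example)
print(best, losses)
```

Summing the loss instead of averaging it would favor shorter answers, so which variant is used can change the predicted choice.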
-
# URL
- https://www.arxiv.org/abs/2409.19924
# Affiliations
- Kevin Wang, N/A
- Junbo Li, N/A
- Neel P. Bhatt, N/A
- Yihan Xi, N/A
- Qiang Liu, N/A
- Ufuk Topcu, N/A
- Zhangyang Wang, N/…
-
I will list the test results of various open-source models here. You can refer to this data when selecting models and configuring devices. Of course, evaluating LLMs is quite subjective. I also sugges…
-
# Overview
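An integrated experiment for running llm-jp-eval 1.4.1 on various models.
# Details
## Experiment procedure
1. Prepare a Hugging Face-format checkpoint of the model you want to evaluate.
1. Post the checkpoint path and the evaluation task name as a comment on this issue.
1. @odashi, on the sakura side, the evaluation experiment…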
-
## Talk title
Let's use RAG for Ansible coding and say goodbye to tedious tasks!
## Talk Description
Writing comprehensive documentation and extensive Molecule test cases is essential when bu…
-
# URL
- https://arxiv.org/abs/2411.08275
# Authors
- Shivani Upadhyay
- Ronak Pradeep
- Nandan Thakur
- Daniel Campos
- Nick Craswell
- Ian Soboroff
- Hoa Trang Dang
- Jimmy Lin
# Abst…
-
[ ] I have checked the [documentation](https://docs.ragas.io/) and related resources and couldn't resolve my bug.
**Describe the bug**
I set up a definition to split the input Dataset into several…
-
Been wondering about alternative vector stores. We picked Chroma because it looked very simple API-wise, is SQLite-based, and had lots of examples for integrating it with LangChain in the [LLM eva…
-
### Feature summary
HawkAI_preps
### Feature description
**#1**
Voice assistant to ask questions based on required skills or experience.
**#2**
Mock evaluations based on difficulty levels.
### M…
-
### System Info
NVIDIA A100 40 GB
### Who can help?
@byshiue @ka
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks
- [X] An officially supported task in…