-
The solutions for each of the 100 questions mention scored steps, which also accompany the corresponding questions in the JSON files.
Is their purpose only to serve as a better evalua…
-
## Methodology Discussion
SentinelGuard is supposed to integrate Large Language Model Services (LLMs), Machine Learning & Deep Learning (ML&DL) methods, and Rule-based filters to identify intrusion…
zhsh9 updated
4 weeks ago
-
step1: pretrain_projector_image_encoder.sh
step2: pretrain_projector_video_encoder.sh
step3: finetune_dual_encoder.sh
step4: eval/vcgbench/inference/run_ddp_inference.sh
step5: eval/vcgbench/gpt_e…
-
Hi, thanks for such robust work!
We now support the ArenaHard dataset in OpenCompass. OpenCompass is an evaluation platform that can partition tasks and support different model inference backend…
-
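The SFT data consists entirely of code generation from questions and tables, so the resulting specialized model does not generalize well and struggles to follow a simple prompt that fixes the output format. In testing, we found that the SFT model is able to refuse many questions, but it cannot output yes or no in the required format. Could you consider changing the prompt to resemble the SFT data? For example, the refusal test could count a response as acceptance if the model generates code and as refusal if it does not.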
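A minimal sketch of the scoring rule suggested above, assuming a response counts as "accept" when it appears to contain code and as "reject" otherwise. The function name `classify_response` and the code-detection heuristics are hypothetical illustrations, not part of the benchmark; a real harness would likely need a stricter definition of "generated code".

```python
import re

# Build the Markdown fence marker programmatically (three backticks).
FENCE = "`" * 3

# Hypothetical heuristic 1: a fenced code block anywhere in the response.
_FENCED_BLOCK = re.compile(FENCE + r"[\w+-]*\n.*?" + FENCE, re.DOTALL)

# Hypothetical heuristic 2: a line that starts like Python code.
_CODE_LINE = re.compile(
    r"^\s*(import \w+|from \w+ import |def \w+\(|print\()",
    re.MULTILINE,
)


def classify_response(text: str) -> str:
    """Return "accept" if the response appears to contain code, else "reject"."""
    if _FENCED_BLOCK.search(text) or _CODE_LINE.search(text):
        return "accept"
    return "reject"
```

With a rule like this, the refusal test no longer depends on the model emitting a literal yes or no. It only checks whether the model attempted the task in the same form as its SFT data.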
-
### Describe the issue
### LMSys.org
Large Model Systems Organization (LMSYS Org) is an open research organization founded by students and faculty from UC Berkeley in collaboration with UCSD and C…
-
### Question Validation
- [ ] I have searched both the documentation and discord for an answer.
### Question
I followed this [article](https://docs.llamaindex.ai/en/stable/examples/agent/multi_docu…
-
### Problem & Motivation
There is a huge wave of interest in high-accuracy Q&A, such as via Retrieval Augmented Generation (RAG). RAG accuracy is largely driven by how well vector search is abl…
-
### Reference code
- Llama-recipes code
[https://github.com/meta-llama/llama-recipes/tree/b7fd81c71239c67345d897c0eb6529eba076e8b8](https://github.com/meta-llama/llama-recipes/tree/b7fd81c71239c…
-
### Question Validation
- [X] I have searched both the documentation and discord for an answer.
### Question
I am trying to use the built-in capabilities of llamaindex to evaluate the correctness o…