-
### What should we add?
[1] proposed an efficient way to synthesize the following special pattern of two-body terms (`IZZ` and `ZZI`) and three-body term (`ZZZ`) (Fig. 2 [1]).
It can halve the number…
-
Hi,
Thanks for your great work.
I tried to reproduce the results of offline DPO and offline SimPO, and I found that the reproduced results are better than the results in the paper. For example, for the resul…
-
# RAG Evaluations MVP
## Description
LFAI needs a framework for evaluations in order to:
- Validate the efficacy of RAG at all stages
- Make model recommendations for various scenarios
- Establish a …
-
Hi there,
I am wondering whether the LLM-as-a-judge evaluation in LangSmith supports using my own custom model as the judge.
I wish to develop custom prompts for my own judge model through LangSmith. …
-
An expert in TPU compiler writing can potentially introduce sampling techniques into programs for specific purposes. Here's a breakdown of the concept:
**Sampling for TPU Programs:**
* **Expert-…
-
### Summary
MDN's new "ai explain" button on code blocks generates human-like text that may be correct by happenstance, or may contain convincing falsehoods. This is a strange decision for a techn…
-
### Describe the feature
Do I need to set temperature = 0 when using an LLM as a judge? Otherwise, the score is different every time.
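
To illustrate why the score can vary between runs, here is a minimal sketch of temperature-scaled sampling over a judge's candidate scores. It does not call any real judge API: the `sample_score` helper and the `logits` values are hypothetical, and temperature 0 is treated as greedy argmax, which is the usual convention.

```python
import math
import random

def sample_score(logits, temperature, rng):
    """Pick an index from logits; temperature 0 is treated as greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Hypothetical judge logits for the scores 1..5 (made-up numbers).
logits = [0.1, 0.5, 2.0, 1.8, 0.3]
rng = random.Random(0)

# Greedy decoding always returns the same score.
greedy = {sample_score(logits, 0, rng) + 1 for _ in range(20)}
# Sampling at temperature 1.0 spreads over several scores.
sampled = {sample_score(logits, 1.0, rng) + 1 for _ in range(200)}

print("temperature 0:", sorted(greedy))
print("temperature 1:", sorted(sampled))
```

In this toy setting, temperature 0 removes the randomness entirely. A hosted model may still show small run-to-run differences at temperature 0, for example from batching or floating-point effects, so averaging several judge calls is another option to consider.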
### Will you implement it?
- [ ] I would like to implement th…
-
Hi, I've followed the steps indicated in `reproducing-results.md`. To generate the greedy results, I ran only math and gsm8k with
```
ns eval \
--cluster=local \
--model=/workspace/…
-
Now that we've got about 100k transcripts in our oral argument collection, perhaps a next step would be to add sentiment analysis. I think this is pretty easy stuff these days either through an AI cal…
-
Running vllm according to instructions. Docker segfaults at startup, so I'm running straight on the machine.
Starting server with the following shell script. As you can see I've tried to turn max…