-
Hi,
Thank you for the interesting work!
I am trying to reproduce the results for LLaMA-2-7b on LIMAEval with the discard method.
I ran the evaluation script after generating with the release mode…
-
FYI: Function calling is now available on gpt-4-0613 and gpt-3.5-turbo-0613, which should make tools a lot more reliable:
https://openai.com/blog/function-calling-and-other-api-updates
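For anyone who hasn't tried it yet, here is a minimal sketch (in Python) of the request payload shape for function calling on the 0613 models. The `get_current_weather` tool and its schema are invented for illustration, not part of any real API:

```python
# Sketch of a chat-completion request exposing one callable function.
# The weather function below is a made-up example, not a real tool.

def build_function_call_request(user_message: str) -> dict:
    """Assemble a function-calling request body for a 0613-era model."""
    return {
        "model": "gpt-3.5-turbo-0613",
        "messages": [{"role": "user", "content": user_message}],
        "functions": [
            {
                "name": "get_current_weather",  # illustrative tool name
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string"},
                    },
                    "required": ["city"],
                },
            }
        ],
    }

request = build_function_call_request("What's the weather in Paris?")
```

The model then either answers normally or returns a `function_call` naming the tool and its JSON arguments, which the caller executes and feeds back.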
-
When I use my fine-tuned model, I get this error:
`Failed to calculate number of tokens, falling back to approximate count Error: Unknown model`
I took a look at the code, and the `getEncodingNameForMo…
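One common workaround (a sketch in Python for illustration, not the repo's actual code) is to map a fine-tuned model id back to its base model before the tokenizer lookup, since fine-tuned ids follow the `ft:<base-model>:<org>::<id>` pattern:

```python
def base_model_for_encoding(model: str) -> str:
    """Strip the fine-tune wrapper so the encoding lookup sees a known base model.

    Fine-tuned model ids look like "ft:gpt-3.5-turbo-0613:my-org::abc123";
    anything that is not a fine-tune id is returned unchanged.
    """
    if model.startswith("ft:"):
        return model.split(":")[1]
    return model
```

For example, `base_model_for_encoding("ft:gpt-3.5-turbo-0613:acme::abc123")` yields `"gpt-3.5-turbo-0613"`, which the tokenizer table does know about.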
-
# DEPRECATE: rims uses 16k-0613 ChatGPT while the rest use 0613 ChatGPT, which could draw criticism. Results with everything unified to 16k are attached at the very bottom.
Options applied to all experiments below: greedy decoding
* Temperature = 0
* seed = 777
* rims prompt used = cot2p2c.pa…
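For clarity, the shared options above can be sketched as request parameters (a sketch only; the model name and prompt content below are placeholders, not the actual experiment inputs):

```python
# Shared decoding options for all experiments: greedy decoding.
COMMON_PARAMS = {
    "temperature": 0,  # greedy decoding
    "seed": 777,
}

def make_request(model: str, prompt: str) -> dict:
    """Build a chat-completion request body with the shared options applied."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        **COMMON_PARAMS,
    }
```

Unifying all runs on one model (e.g. the 16k variant) then only changes the `model` argument, keeping the decoding options identical across conditions.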
-
Supporting more of the error types mypy reports would reduce the number of feedback iterations needed per completion.
I analyzed both `level_1__ft__mypy_signature_5_steps__1_choice.yaml` and `level_1__ft__mypy_je…
-
**Routine checks**
[//]: # (Delete the space inside the brackets and fill in an x)
+ [ ] I have confirmed that there is no similar existing issue
+ [ ] I have confirmed that I have upgraded to the latest version
+ [ ] I have read the project README in full, especially the FAQ section
+ [ ] I understand and am willing to follow up on this issue, helping with testing and providing feedback
+ [ ] I understand and accept the above, and I understand that the maintainers have limited time; **issues that do not follow the rules may…
-
Per the documentation [here](https://platform.openai.com/docs/api-reference/chat/object), the OpenAI API should return a system_fingerprint. However, when calling any model that is not gpt-3.5-turbo-1…
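Until this is fixed, a defensive check helps (a sketch operating on a plain response dict rather than the SDK object; the helper name is made up):

```python
def same_backend_config(a: dict, b: dict) -> bool:
    """True only when both responses carry a system_fingerprint and they match.

    system_fingerprint identifies the backend configuration a request ran on;
    if either response omits it (as some models currently do), seeded
    determinism across the two calls cannot be verified.
    """
    fa = a.get("system_fingerprint")
    fb = b.get("system_fingerprint")
    return fa is not None and fa == fb
```

Treating the field as optional this way avoids crashing on models that leave it out.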
-
Hi!
Thank you for your excellent work on LLM evaluation! I'm inspired to create a French version of MT-Bench.
Currently, I'm in the process of generating reference answers for tasks in the m…
-
Hello Authors of SmartPlay,
Thank you for providing this nice testbed.
I am trying to replicate the scores in Table 2, following your environment setup from the git repo.
e.g. For RockPaperScissorBasic (RPS) g…
-
Feature Request:
Add a combo box for model selection to the settings.
Why a combo box? So that the user can not only select predefined models like 'gpt-3.5-turbo' or 'gpt-4' but can also specify one via t…