-
- [x] [Jeremy Twitter thread](https://twitter.com/jeremyphoward/status/1688673397138690048?s=46&t=aOEVGBVv9ICQLUYL4fQHlQ)
- Training set: 200 science multiple-choice questions, autogenerated using GPT …
-
The idea of this paper is really great and much easier to understand than PPO.
However, if there are six candidate responses, then the batch size should be at least 6 when computing the loss in a single step.…
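The point about batch size can be made concrete with a pairwise ranking loss over candidates. The sketch below is illustrative only (not the paper's actual implementation, and `ranking_loss` is a hypothetical name): because every candidate's score is compared against every other's, all k candidates must be scored together before the loss can be computed once.

```python
# Hypothetical sketch: a pairwise ranking loss over k candidate responses.
# All k candidates must be in the same batch so their scores can be
# compared in a single loss computation.
def ranking_loss(logprobs, rewards):
    """logprobs, rewards: length-k sequences, one entry per candidate."""
    loss = 0.0
    k = len(rewards)
    for i in range(k):
        for j in range(k):
            if rewards[i] > rewards[j]:
                # Penalize whenever a lower-reward candidate out-scores
                # a higher-reward one (hinge on the log-prob gap).
                loss += max(0.0, logprobs[j] - logprobs[i])
    return loss

# Six candidates, as in the question above: the loss touches every pair,
# so a batch smaller than 6 cannot compute it in one pass.
six_scores = [-1.0, -2.5, -3.0, -3.2, -4.0, -5.0]
six_rewards = [6, 5, 4, 3, 2, 1]
```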
-
### Feature request
Extend `tokenizer.apply_chat_template` with functionality for training/finetuning, returning `attention_masks` and (optionally) `labels` (for ignoring "System" and "User" messages d…
siddk updated 2 months ago
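A minimal sketch of what such a training-oriented template call might return, assuming a toy whitespace tokenizer and the hypothetical helper name `apply_chat_template_for_training` (neither is part of the actual `transformers` API): labels for system/user tokens are set to the ignore index so the loss is computed only on assistant tokens.

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch's CrossEntropyLoss

_vocab = {}
def encode(text):
    # Toy whitespace tokenizer standing in for a real one.
    return [_vocab.setdefault(tok, len(_vocab)) for tok in text.split()]

def apply_chat_template_for_training(messages):
    # Hypothetical helper: like apply_chat_template, but also returns
    # attention_mask and labels that mask out system/user tokens.
    input_ids, labels = [], []
    for msg in messages:
        ids = encode(f"<|{msg['role']}|> {msg['content']}")
        input_ids.extend(ids)
        if msg["role"] == "assistant":
            labels.extend(ids)  # compute loss on assistant tokens only
        else:
            labels.extend([IGNORE_INDEX] * len(ids))
    return {
        "input_ids": input_ids,
        "attention_mask": [1] * len(input_ids),
        "labels": labels,
    }
```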
-
With the rapid development of large language models, related benchmarks keep emerging one after another. This issue collects such work to encourage discussion and possible follow-ups.
**1. BIG-bench (Google)**
The PaLM paper from Jeff Dean's team evaluated on BIG-Bench, a benchmark suite designed specifically for large models, comparing against other methods across many tasks.
**2. HELM (Stanford)**
- Paper link: https:/…
-
# URL
- https://arxiv.org/abs/2405.05904
# Affiliations
- Zorik Gekhman, N/A
- Gal Yona, N/A
- Roee Aharoni, N/A
- Matan Eyal, N/A
- Amir Feder, N/A
- Roi Reichart, N/A
- Jonathan Herzig…
-
Could Gemma.cpp be integrated into KataGo to help it explain the meaning of each move? Is that possible?
Gemma: https://github.com/google/gemma.cpp
-
The Torch FP8 data type may be released in version 2.1, and JAX FP8 support has already been released.
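For intuition on why FP8 matters for precision, the E4M3 variant keeps only 3 mantissa bits, so values round coarsely. A toy pure-Python simulation (illustrative only; not the Torch or JAX implementation, and saturation, subnormals, and NaN encoding are deliberately ignored):

```python
import math

def quantize_e4m3(x, mantissa_bits=3):
    # Toy simulation of FP8 E4M3 rounding: keep `mantissa_bits` of mantissa.
    # Saturation, subnormals, and NaN encoding are deliberately ignored.
    if x == 0.0:
        return 0.0
    e = math.floor(math.log2(abs(x)))   # exponent of the leading bit
    step = 2.0 ** (e - mantissa_bits)   # spacing of representable values
    return round(x / step) * step
```

With 3 mantissa bits, 3.3 snaps to the nearest representable value 3.25, showing the coarse grid a model's activations land on under FP8.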
-
https://www.sarvam.ai/blog/announcing-openhathi-series - Bilingual LLMs frugally
> The OpenHathi series of work at Sarvam AI aims to contribute to the ecosystem with open models and datasets to…
-
Which configuration file reproduces the 54.x result reported in the paper?
-
### Summary
# Motivation
WasmEdge is a lightweight inference runtime for AI and LLM applications. The goal is to build specialized, finetuned models for the WasmEdge community. The model should be supported by Wa…