-
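When fine-tuning Qwen, the rough pipeline is:
tokenize -> nn.Embedding -> QWenBlock -> output embedding -> nn.Linear, which outputs tokens used to compute the loss
Does nn.Embedding take part in backpropagation? I would like to compute a loss at the embedding layer to improve results.
If it does not participate by default, how can nn.Embedding be included in training?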
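For reference, a minimal sketch of how this can be checked in plain PyTorch. It is not Qwen's actual code: `block` is a hypothetical stand-in for QWenBlock, and the sizes are arbitrary. In plain PyTorch an `nn.Embedding` weight has `requires_grad=True` by default, so it receives gradients unless a training script or adapter wrapper has frozen it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for the pipeline in the question (sizes are arbitrary assumptions).
vocab_size, hidden = 10, 4
embed = nn.Embedding(vocab_size, hidden)
block = nn.Linear(hidden, hidden)        # hypothetical placeholder for QWenBlock
lm_head = nn.Linear(hidden, vocab_size)

tokens = torch.tensor([[1, 2, 3]])
targets = torch.tensor([[2, 3, 4]])

# Forward pass and next-token style cross-entropy loss.
logits = lm_head(torch.tanh(block(embed(tokens))))
loss = nn.functional.cross_entropy(logits.view(-1, vocab_size), targets.view(-1))
loss.backward()

# The embedding weight is trainable by default and now holds a gradient.
print("requires_grad:", embed.weight.requires_grad)
# Only the rows of tokens that appeared in the batch get a nonzero gradient.
used_rows = embed.weight.grad.abs().sum(dim=1) > 0
print("rows with gradient:", used_rows.nonzero().flatten().tolist())

# If something froze the layer, this is how it would be re-enabled.
embed.weight.requires_grad_(False)   # simulate a frozen embedding
embed.weight.requires_grad_(True)    # unfreeze it again

# The optimizer must also be given the embedding parameters to update them.
modules = (embed, block, lm_head)
trainable = [p for m in modules for p in m.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)
```

With a Hugging Face `transformers` model, the input embedding layer can be looked up with `model.get_input_embeddings()` and its `weight.requires_grad` inspected the same way. If the fine-tuning setup freezes the base model (as LoRA-style setups commonly do), the embedding has to be unfrozen and passed to the optimizer explicitly.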
-
### System Info
```shell
optimum-habana 1.7.2
text-generation 0.6.0
text-generation-server 1.0.3
langchain …
-
**Is your feature request related to a problem? Please describe.**
When downloading large files using **`snapshot_download`**, a transient `Read timed out` exception crashes the process and there's…
-
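> Use cli_demo.py
Which template is used for incremental (continued) pre-training?
Also, when doing text completion, why does the model keep repeating itself after writing just one sentence?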
_Originally posted by @huangl22 in https://github.com/hiyouga/LLaMA-Efficient-Tuning/issues/1124#issuecomment…
-
**Describe the bug**
When running step 3 with ZeRO stage 3 enabled and LoRA for both the actor and critic models, an error is reported.
It seems to say that bloomz does not support ZeRO stage 3 combined with LoRA.
…
-
**LocalAI version:**
commit 618fd1d41730ab03f7ac40e2457ea29709756b1f
**Environment, CPU architecture, OS, and Version:**
MacBook Pro M1 Pro 16GB, macOS 12.6
**Describe the bug**
Failure o…
-
`agent_chain = initialize_agent( tools=tools, llm= HuggingFaceHub(repo_id="google/flan-t5-xl"), agent="conversational-react-description", memory=memory, verbose=False)
agent_chain.run("Hi")`
**t…
-
In this branch: https://github.com/huggingface/safetensors/compare/julien-c/js I pushed a proof-of-concept of how, given the simplicity of the format, one can fetch metadata about the weights over sma…
-
**Actor model**: Bloom-1.1b
**Reward model**: Bloom-560m
**Finetuning cmd**:
bash training_scripts/single_node/run_bloom_1.1b.sh /DeepSpeedExamples/applications/DeepSpeed-Chat/training/step1_superv…
-
### System Info
- `transformers` version: 4.32.0
- Platform: Linux-5.4.143.bsk.8-amd64-x86_64-with-glibc2.31
- Python version: 3.9.2
- Huggingface_hub version: 0.16.4
- Safetensors version: 0.3…