-
https://github.com/aju22/LLaMA2/blob/5716de40720123bf03013f3e08673a7e0feb53ba/model.py#L216
In the LLaMA2 source code, `X_V` is computed from the original `x` rather than from `swish`.
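For comparison, here is a minimal pure-Python sketch of the SwiGLU feed-forward as written in the reference LLaMA code, `w2(silu(w1(x)) * w3(x))`. The helper functions, the weight names `W1`, `W3`, `W2`, and the toy values are assumptions for illustration only and are not taken from the linked repository.

```python
import math

def silu(z):
    # SiLU / swish activation: z * sigmoid(z)
    return z / (1.0 + math.exp(-z))

def matvec(W, x):
    # Multiply a matrix W (list of rows) by a vector x
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def swiglu_ffn(x, W1, W3, W2):
    # Gate branch: swish applied to the W1 projection of x
    swish = [silu(v) for v in matvec(W1, x)]
    # Value branch: projected from the ORIGINAL x, not from swish
    X_V = matvec(W3, x)
    # Elementwise gating, followed by the output projection
    gated = [s * v for s, v in zip(swish, X_V)]
    return matvec(W2, gated)

# Hypothetical toy weights, chosen only to make the arithmetic easy to follow
W1 = [[1.0, 0.0], [0.0, 1.0]]
W3 = [[2.0, 0.0], [0.0, 2.0]]
W2 = [[1.0, 1.0]]
x = [0.0, 1.0]

out = swiglu_ffn(x, W1, W3, W2)
print(out)
```

If `X_V` were instead projected from `swish`, the two branches would no longer be independent projections of the same input, and the layer would not match the reference formula.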
-
## 🐞Describing the bug
I'm experiencing extremely long loading times when using the MLModel API to load a converted Core ML model. The loading process hangs indefinitely. When changing compute_units …
-
### 🐛 Describe the bug
When trying to add FSDP to our training code base, which includes a pipelining scheme, I encountered an issue when forward and backward passes are no longer interleaved but instead …
-
In the module: `MambaTransformer/mamba_transformer`, you execute the following in `class MambaTransformerblock`:
```python
# Layernorm
self.norm = nn.LayerNorm(dim)
def forwa…
-
Thanks for the great work. I notice that in the Attention and FFN, the output matrix (i.e., self.to_out) is normalized differently along the first dimension instead of the last dimension (normalizing …
-
I ran into this problem when using it:
```
(bitnet-cpp) C:\Users\m.rahamneh\Desktop\GP\BitNet>python setup_env.py --hf-repo HF1BitLLM/Llama3-8B-1.58-100B-tokens -q i2_s
INFO:root:Compiling the code using …
-
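Hello, I have a question. After training the model, testing requires running `python ddp_mmner.py --do_test --txtdir=./my_data/twitter2015 --imgdir=./data/twitter2015/image --ckpt_path=./ddp_mner.pt --test_batch_size=32`. This test command needs the file `ddp_mner.pt`. Where can I find this file? It seems there is only…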
-
Description: When running inference on the distilbert-base-uncased model using the NPU on Snapdragon® X Elite (X1E78100 - Qualcomm®) through ONNX Runtime's QNNExecutionProvider, the model fails to inf…
-
In your writeup you mention following Karpathy's baseline recipe for training the gpt-2 architecture. Did you also try instead using his (or other) baseline recipes for training and then replacing lla…
-
I am fine-tuning the model https://huggingface.co/taide/TAIDE-LX-7B-Chat, but I always get an error. Training is OK, but `model.save_pretrained_gguf` fails.
==((====))== Unsloth: Fast Llama pat…