-
**Describe the bug**
The `n_params` counts calculated [here](https://github.com/neelnanda-io/TransformerLens/blob/f5a7d455546a88cfdfb26e781d5bd6447e8243eb/transformer_lens/HookedTransformerConfig.py#…
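A quick cross-check for anyone wanting to reproduce the discrepancy: compare the config's `n_params` against the sizes of the actual tensors. This is a minimal sketch, assuming a standard `HookedTransformer.from_pretrained` load (the model name is just an example, and the two numbers may legitimately differ by whatever `n_params` is meant to exclude, e.g. embeddings):

```python
from transformer_lens import HookedTransformer

# Load any pretrained model and compare the config's estimate with the real count.
model = HookedTransformer.from_pretrained("gpt2")
actual = sum(p.numel() for p in model.parameters())

print("cfg.n_params:          ", model.cfg.n_params)
print("sum of parameter sizes:", actual)
```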
-
- [ ] Different types of activation functions (see the sketch below)
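A minimal sketch of what config-selectable activations could look like (the names and registry below are illustrative, not the repo's actual implementation):

```python
import torch
import torch.nn.functional as F

# Map config strings to activation callables; unknown names fail loudly.
ACTIVATIONS = {
    "relu": F.relu,
    "gelu": F.gelu,
    "silu": F.silu,
    # GPT-2 style tanh approximation of GELU.
    "gelu_new": lambda x: 0.5 * x * (1.0 + torch.tanh(
        0.7978845608 * (x + 0.044715 * x.pow(3.0)))),
}

def get_activation(name: str):
    try:
        return ACTIVATIONS[name]
    except KeyError:
        raise ValueError(f"Unknown activation function: {name!r}") from None
```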
-
In your writeup you mention following Karpathy's baseline recipe for training the GPT-2 architecture. Did you also try using his (or other) baseline recipes for training and then replacing lla…
-
Hello, I keep getting this error when running the project on my Mac M1; it says the CUDA extensions can't be loaded. Could you please take a look at how to fix it?
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.0.1 with CUDA 1106 (you have 2.0.1)
Pytho…
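A minimal sketch of device selection on Apple Silicon, assuming the project lets you choose the `torch` device: xFormers' C++/CUDA extensions cannot load on an M1, so the fallback should be MPS (or CPU) rather than CUDA:

```python
import torch

# Pick the best available backend on an M1 Mac: CUDA is never available there,
# so this normally resolves to "mps" (or "cpu" as a last resort).
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

print(f"Using device: {device}")
```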
-
# 🐛 Bug
## Command
## To Reproduce
Steps to reproduce the behavior:
There's an issue: every time I delete my folder and start fresh, the Python version number changes, from 3.9.13, 10.6, 10.11,…
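A small diagnostic, run inside the freshly created environment, to see exactly which interpreter it resolved to (paths and versions will vary by machine):

```python
import sys

# Shows which Python binary the environment is actually using and its full version.
print("interpreter:", sys.executable)
print("version:    ", sys.version)
```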
-
## Description
Consider adding an additional FusedCrossEntropyLoss kernel to the FOAK set of kernels, given the additional improvement seen when using it in earlier tests (see Background below).
Considerati…
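For context on why a fused kernel helps, here is a minimal sketch (not the FOAK or Liger implementation) of the memory problem it targets: cross-entropy over a large vocabulary normally materializes the full `[tokens, vocab]` logits tensor, which chunking avoids keeping alive at once, and a fused kernel avoids even more aggressively. The function and shapes below are illustrative only:

```python
import torch
import torch.nn.functional as F

def chunked_ce_loss(hidden: torch.Tensor,          # [N, d] flattened hidden states
                    lm_head_weight: torch.Tensor,  # [V, d] output projection
                    labels: torch.Tensor,          # [N] target token ids
                    chunk_size: int = 1024) -> torch.Tensor:
    """Cross-entropy without ever materializing the full [N, V] logits."""
    total = hidden.new_zeros(())
    for start in range(0, hidden.size(0), chunk_size):
        h = hidden[start:start + chunk_size]
        y = labels[start:start + chunk_size]
        logits = h @ lm_head_weight.t()             # only [chunk, V] lives at once
        total = total + F.cross_entropy(logits, y, reduction="sum")
    return total / labels.numel()
```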
-
### Feature request
Integrate the Liger (LinkedIn GPU Efficient Runtime) Kernel into the HuggingFace Trainer; the user could decide whether to enable the kernel with a simple flag.
### Motivation
Liger (Linkedi…
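A minimal sketch of what the flag-gated integration could look like from the user's side (the flag name is an assumption; `apply_liger_kernel_to_llama` is taken from the Liger-Kernel README and is not part of `transformers`):

```python
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

enable_liger_kernel = True  # hypothetical user-facing flag

if enable_liger_kernel:
    # Monkey-patches the LLaMA modules with Liger's fused Triton kernels
    # (RMSNorm, RoPE, SwiGLU, fused cross-entropy, ...).
    from liger_kernel.transformers import apply_liger_kernel_to_llama
    apply_liger_kernel_to_llama()

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
trainer = Trainer(model=model, args=TrainingArguments(output_dir="out"))
```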
-
I have already downloaded the model.
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.0.1+cu118 with CUDA 1108 (you have 2.0.1+cpu)
Python 3.10.11 (you …
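A short check that confirms the mismatch the warning is describing, namely an xFormers wheel built against CUDA 11.8 sitting on top of a CPU-only torch build:

```python
import torch
import xformers

# The "+cpu" / "+cu118" suffixes and the CUDA runtime line are what matter here.
print("torch:         ", torch.__version__)
print("torch CUDA:    ", torch.version.cuda)       # None on a CPU-only build
print("cuda available:", torch.cuda.is_available())
print("xformers:      ", xformers.__version__)
```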
-
I trained a tiny LLAMA2 model with `pretrain_llama2_distributed.sh` and tried to convert it to `huggingface transformers` format with `tools/convert_checkpoint/deepspeed_to_transformers.py`. Then I…
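For reference, a minimal sanity check of a converted checkpoint, assuming the converter wrote a standard Hugging Face directory (the path here is hypothetical):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt_dir = "converted_hf_ckpt"  # wherever deepspeed_to_transformers.py wrote its output

model = AutoModelForCausalLM.from_pretrained(ckpt_dir, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(ckpt_dir)

inputs = tokenizer("Hello", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(out[0]))
```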
-
When running with CUDA, it reports:
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.1.0+cu118 with CUDA 1106 (you have 2.0.1+cu118)
Python 3.9.16 (you have 3.10.12)
…