-
**Describe the bug**
The `n_params` counts calculated [here](https://github.com/neelnanda-io/TransformerLens/blob/f5a7d455546a88cfdfb26e781d5bd6447e8243eb/transformer_lens/HookedTransformerConfig.py#…
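A quick cross-check for anyone wanting to reproduce the discrepancy: compare the config's `n_params` against the sizes of the actual tensors. This is a minimal sketch, assuming a standard `HookedTransformer.from_pretrained` load (the model name is just an example, and the two numbers may legitimately differ by whatever `n_params` is meant to exclude, e.g. embeddings):

```python
from transformer_lens import HookedTransformer

# Load any pretrained model and compare the config's estimate with the real count.
model = HookedTransformer.from_pretrained("gpt2")
actual = sum(p.numel() for p in model.parameters())

print("cfg.n_params:          ", model.cfg.n_params)
print("sum of parameter sizes:", actual)
```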
-
- [ ] Different types of activation functions (see the sketch below)
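A minimal sketch of what config-selectable activations could look like (the names and registry below are illustrative, not the repo's actual implementation):

```python
import torch
import torch.nn.functional as F

# Map config strings to activation callables; unknown names fail loudly.
ACTIVATIONS = {
    "relu": F.relu,
    "gelu": F.gelu,
    "silu": F.silu,
    # GPT-2 style tanh approximation of GELU.
    "gelu_new": lambda x: 0.5 * x * (1.0 + torch.tanh(
        0.7978845608 * (x + 0.044715 * x.pow(3.0)))),
}

def get_activation(name: str):
    try:
        return ACTIVATIONS[name]
    except KeyError:
        raise ValueError(f"Unknown activation function: {name!r}") from None
```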
-
In your writeup you mention following Karpathy's baseline recipe for training the GPT-2 architecture. Did you also try using his (or other) baseline recipes for training and then replacing lla…
-
Hello, I keep getting this error when running the project on my Mac M1; it says the CUDA extensions can't be loaded. Could you please take a look at how to fix it?
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.0.1 with CUDA 1106 (you have 2.0.1)
Pytho…
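A minimal sketch of device selection on Apple Silicon, assuming the project lets you choose the `torch` device: xFormers' C++/CUDA extensions cannot load on an M1, so the fallback should be MPS (or CPU) rather than CUDA:

```python
import torch

# Pick the best available backend on an M1 Mac: CUDA is never available there,
# so this normally resolves to "mps" (or "cpu" as a last resort).
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

print(f"Using device: {device}")
```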
-
# 🐛 Bug
## Command
## To Reproduce
Steps to reproduce the behavior:
There's an issue: every time I delete my folder and start fresh, the Python version number changes, from 3.9.13, 10.6, 10.11,…
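A small diagnostic, run inside the freshly created environment, to see exactly which interpreter it resolved to (paths and versions will vary by machine):

```python
import sys

# Shows which Python binary the environment is actually using and its full version.
print("interpreter:", sys.executable)
print("version:    ", sys.version)
```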
-
## Description
Consider adding an additional FusedCrossEntropyLoss kernel to the FOAK set of kernels, given the additional improvement seen when using it in earlier tests (see Background below).
Considerati…
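For context on why a fused kernel helps, here is a minimal sketch (not the FOAK or Liger implementation) of the memory problem it targets: cross-entropy over a large vocabulary normally materializes the full `[tokens, vocab]` logits tensor, which chunking avoids keeping alive at once, and a fused kernel avoids even more aggressively. The function and shapes below are illustrative only:

```python
import torch
import torch.nn.functional as F

def chunked_ce_loss(hidden: torch.Tensor,          # [N, d] flattened hidden states
                    lm_head_weight: torch.Tensor,  # [V, d] output projection
                    labels: torch.Tensor,          # [N] target token ids
                    chunk_size: int = 1024) -> torch.Tensor:
    """Cross-entropy without ever materializing the full [N, V] logits."""
    total = hidden.new_zeros(())
    for start in range(0, hidden.size(0), chunk_size):
        h = hidden[start:start + chunk_size]
        y = labels[start:start + chunk_size]
        logits = h @ lm_head_weight.t()             # only [chunk, V] lives at once
        total = total + F.cross_entropy(logits, y, reduction="sum")
    return total / labels.numel()
```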
-
### Feature request
Integrate the Liger (LinkedIn GPU Efficient Runtime) Kernel into the HuggingFace Trainer; the user could decide whether to enable the kernel with a simple flag.
### Motivation
Liger (Linkedi…
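A minimal sketch of what the flag-gated integration could look like from the user's side (the flag name is an assumption; `apply_liger_kernel_to_llama` is taken from the Liger-Kernel README and is not part of `transformers`):

```python
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

enable_liger_kernel = True  # hypothetical user-facing flag

if enable_liger_kernel:
    # Monkey-patches the LLaMA modules with Liger's fused Triton kernels
    # (RMSNorm, RoPE, SwiGLU, fused cross-entropy, ...).
    from liger_kernel.transformers import apply_liger_kernel_to_llama
    apply_liger_kernel_to_llama()

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
trainer = Trainer(model=model, args=TrainingArguments(output_dir="out"))
```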
-
I have already downloaded the model.
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.0.1+cu118 with CUDA 1108 (you have 2.0.1+cpu)
Python 3.10.11 (you …
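A short check that confirms the mismatch the warning is describing, namely an xFormers wheel built against CUDA 11.8 sitting on top of a CPU-only torch build:

```python
import torch
import xformers

# The "+cpu" / "+cu118" suffixes and the CUDA runtime line are what matter here.
print("torch:         ", torch.__version__)
print("torch CUDA:    ", torch.version.cuda)       # None on a CPU-only build
print("cuda available:", torch.cuda.is_available())
print("xformers:      ", xformers.__version__)
```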
-
I trained a tiny LLAMA2 model with `pretrain_llama2_distributed.sh` and tried to convert it to `huggingface transformers` format with `tools/convert_checkpoint/deepspeed_to_transformers.py`. Then I…
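For reference, a minimal sanity check of a converted checkpoint, assuming the converter wrote a standard Hugging Face directory (the path here is hypothetical):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt_dir = "converted_hf_ckpt"  # wherever deepspeed_to_transformers.py wrote its output

model = AutoModelForCausalLM.from_pretrained(ckpt_dir, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(ckpt_dir)

inputs = tokenizer("Hello", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(out[0]))
```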
-
When running with CUDA, it reports:
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.1.0+cu118 with CUDA 1106 (you have 2.0.1+cu118)
Python 3.9.16 (you have 3.10.12)
…