swiglu Search Results - Githubissues

822 results for swiglu

AkihikoWatanabe/paper_notes #888

Llama 2: Open Foundation and Fine-Tuned Chat Models, Hugo To…

# URL - https://arxiv.org/abs/2307.09288 # Affiliations - Hugo Touvron, N/A - Louis Martin, N/A - Kevin Stone, N/A - Peter Albert, N/A - Amjad Almahairi, N/A - Yasmine Babaei, N/A - Niko…

AkihikoWatanabe updated 5 months ago
2
linkedin/Liger-Kernel #48

Unable to use FLCE with FSDP+PEFT+embeddings layers

### 🐛 Describe the bug when trying to train both LoRA layers on the base model and also set modules_to_save on the lora config which makes the embeddings layers trainable (my assumption is it also ap…

winglian updated 2 weeks ago
5
bnabis93/vision-language-examples #14

xFormers-ViT performance degradation in A100 GPU

## Performance degradation in A100 GPU - Vanilla Attention: 3.87ms - Sparse Attention: 9.33ms - Memory Efficient Attention: 6.34ms - Sparse Attention is 2.4x slower than Vanilla Attention - Memor…

bnabis93 updated 1 year ago
5
state-spaces/mamba #345

Error when trying to use Mamba2

``` Traceback (most recent call last): File "test_mambav2.py", line 6, in from mamba_ssm import Mamba File "/home/test/miniconda3/envs/mamba/lib/python3.8/site-packages/mamba_ssm/__init__…

yxchng updated 1 month ago
25
facebookresearch/xformers #941

Significant performance drop in training

# ❓ Questions and Help I'm new to xformers. I need to use Transformer Encoders to train on a dataset with a very large variation in sample lengths. My original code was: ```python tokens = [token…

Fan-Yixuan updated 8 months ago
4
vllm-project/vllm #6355

[Installation]: Running CohereForAI/c4ai-command-r-v01 with m…

### Your current environment why is it important: This is a prerequisite to the work on enabling torch.compile on vllm, we need to be able to build vllm with nightly so that we can iterate on chan…

laithsakka updated 2 weeks ago
13
lshqqytiger/stable-diffusion-webui-amdgpu #552

[Bug]: --use-zluda uses cpu, --use-directml works fine

### Checklist - [X] The issue exists after disabling all extensions - [X] The issue exists on a clean installation of webui - [X] The issue is caused by an extension, but I believe it is caused by a …

picarica updated 1 week ago
20
ethz-asl/analog_gauge_reader #22

Can not get it to run

I have Anaconda installed on my windows machine, so I only followed the instructions below `Activate conda environment`. So create the new environment, activate it and install the dependencies via cop…

Eheran1 updated 3 months ago
9
Engineer-of-Stuff/stable-diffusion-paperspace #111

xFormers can't load C++/CUDA extensions. xFormers was built …

Looking for directions where to head with the following error that I'm getting all the sudden WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for: PyTorch 2.1.0+cu12…

Normiegetout updated 12 months ago
2
NVIDIA/TransformerEngine #407

Expected margin of error versus a typical pytorch implementa…

I am currently attempting to port a llama-like model architecture from pure pytorch to TransformerEngine's pytorch classes. However, I have been unable to obtain identical results in certain cases.…

152334H updated 1 year ago
2
