ffn Search Results - Githubissues

1000+ results
for ffn


aju22/LLaMA2 #1

SiLU FFN

https://github.com/aju22/LLaMA2/blob/5716de40720123bf03013f3e08673a7e0feb53ba/model.py#L216 In the LLaMA2 source code, 'X_V' is computed from the original 'x' rather than from 'swish'.

Ricardokevins updated 6 months ago
2
apple/coremltools #2367

Extremely Long Loading with default compute_units

## 🐞Describing the bug I'm experiencing extremely long loading times when using the MLModel API to load a converted Core ML model. The loading process hangs indefinitely. When changing compute_units …

baicenxiao updated 1 week ago
2
pytorch/pytorch #136861

Error when calling multiple backward passes on FSDP model

### 🐛 Describe the bug When trying to add FSDP to our training code base that includes a pipelining scheme I encountered an issue if forward and backward passes are no longer interleaved but instead …

Marks101 updated 1 month ago
2
kyegomez/MambaTransformer #14

[BUG] layer norm called multiple times with same parameters

In the module: `MambaTransformer/mamba_transformer`, you execute the following in `class MambaTransformerblock`: ```python # Layernorm self.norm = nn.LayerNorm(dim) def forwa…

erlebach updated 3 weeks ago
1
lucidrains/nGPT-pytorch #10

About "norm_dim_in" in self.to_out

Thanks for the great work. I notice that in the Attention and FFN, the output matrix (i.e., self.to_out) is normalized differently along the first dimension instead of the last dimension (normalizing …

buyeah1109 updated 4 days ago
22
microsoft/BitNet #89

returned non-zero exit status 3221225477

I got this problem when I use it: ``` (bitnet-cpp) C:\Users\m.rahamneh\Desktop\GP\BitNet>python setup_env.py --hf-repo HF1BitLLM/Llama3-8B-1.58-100B-tokens -q i2_s INFO:root:Compiling the code using …

m7mdhka updated 4 days ago
5
TransformersWsz/UMGF #12

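Test evaluation question

Hello, I'd like to ask: after training the model, I need to run the test with python ddp_mmner.py --do_test --txtdir=./my_data/twitter2015 --imgdir=./data/twitter2015/image --ckpt_path=./ddp_mner.pt --test_batch_size=32. This test command requires the ddp_mner.pt file; where can I find this file? It seems there is only…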

xiaoding06 updated 2 weeks ago
3
microsoft/onnxruntime #22532

DistilBERT model inference failure using ONNX Runtime QNNExe…

Description: When running inference on the distilbert-base-uncased model using the NPU on Snapdragon® X Elite (X1E78100 - Qualcomm®) through ONNX Runtime's QNNExecutionProvider, the model fails to inf…

sean830314 updated 1 week ago
4
Haiyang-W/TokenFormer #4

Use of llama2 or llama3 as baseline?

In your writeup you mention following Karpathy's baseline recipe for training the gpt-2 architecture. Did you also try instead using his (or other) baseline recipes for training and then replacing lla…

pjj updated 1 day ago
3
unslothai/unsloth #341

Failed at model.save_pretrained_gguf

I use the model: https://huggingface.co/taide/TAIDE-LX-7B-Chat to fine-tune, but always got the error. training is OK, but model.save_pretrained_gguf failed. ==((====))== Unsloth: Fast Llama pat…

ch-tseng updated 1 month ago
8
