-
Hello, I find that there's no lm_head weight in the model checkpoints (.safetensors).
How does the model load the weight for the lm_head Linear layer?
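For context, a common reason is weight tying: when `config.tie_word_embeddings` is true, the `lm_head` Linear shares its weight tensor with the input embedding, so the checkpoint only stores the embedding matrix and `lm_head` is reconstructed from it at load time. A minimal sketch to check this, assuming a Transformers causal-LM checkpoint (`gpt2` here is just an example of a tied model):

```python
# Minimal sketch: verify that lm_head is tied to the input embedding,
# assuming a checkpoint whose config sets tie_word_embeddings=True.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # example checkpoint, not necessarily yours

print(model.config.tie_word_embeddings)  # True for tied models
# If tied, both modules point at the same underlying tensor, so no
# separate lm_head weight needs to exist in the .safetensors files.
print(
    model.lm_head.weight.data_ptr()
    == model.get_input_embeddings().weight.data_ptr()
)
```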
-
### Feature request
In [`Transformers 4.36`](https://github.com/huggingface/transformers/releases/tag/v4.36.0), we started adding native support of [torch.nn.functional.scaled_dot_product_attention](…
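A minimal usage sketch, assuming Transformers >= 4.36 and a model architecture with SDPA support; the checkpoint name below is only an example:

```python
# Load a model with attention routed through
# torch.nn.functional.scaled_dot_product_attention (SDPA).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # example checkpoint; any SDPA-capable model works
    torch_dtype=torch.float16,
    attn_implementation="sdpa",   # select the native SDPA attention path
)
```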
-
Why doesn't the architecture need position embeddings?
-
Hope you can help with this. I'm trying to implement ring attention using the Llama 3 architecture, and I'm starting with the blockwise parallel transformer piece. My question is when do I start to break t…
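Not the reference ring-attention implementation, but a minimal sketch of the blockwise idea: the query sequence is split into fixed-size blocks so the full (seq × seq) score matrix is never materialized at once. The block size and the missing causal mask are simplifying assumptions; ring attention would additionally shard the keys/values across devices and rotate them.

```python
import torch
import torch.nn.functional as F

def blockwise_attention(q, k, v, block_size=128):
    # q, k, v: (batch, heads, seq, head_dim). Queries are processed one block
    # at a time; each block here still attends over all keys/values.
    outs = []
    for start in range(0, q.size(2), block_size):
        q_blk = q[:, :, start:start + block_size]
        outs.append(F.scaled_dot_product_attention(q_blk, k, v))
    return torch.cat(outs, dim=2)

q = k = v = torch.randn(1, 8, 512, 64)
out = blockwise_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 512, 64])
```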
-
https://github.com/Romanitho/Winget-AutoUpdate/blob/a63a3957978a14941f0b318afe9052d4988b84cd/Sources/Wix/build.wxs#L26
We are expanding the `POWERSHELLEXE` property during the `AppSearch` action, but w…
-
I want to port this project to the ARM architecture. What should I pay attention to, how should I plan the entire process, and could the author provide some feedback?
-
Our current NN splitter is based on a BiLSTM, which has performance problems. We should leverage recent advancements in deep learning and implement the new attention-based (seq2seq-like?) archi…
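As a very rough sketch of what "attention-based" could look like here, assuming the splitter is a per-token split/no-split tagger; the dimensions, vocabulary size, and omitted positional encoding are all placeholder assumptions, not a proposed final design:

```python
import torch
import torch.nn as nn

class AttentionSplitter(nn.Module):
    """Transformer-encoder tagger that could stand in for the BiLSTM splitter."""

    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.classifier = nn.Linear(d_model, 2)  # split / no-split per token

    def forward(self, token_ids):
        # Positional encoding omitted for brevity; a real splitter would add it.
        h = self.encoder(self.embed(token_ids))
        return self.classifier(h)

model = AttentionSplitter(vocab_size=10000)
logits = model(torch.randint(0, 10000, (2, 64)))
print(logits.shape)  # torch.Size([2, 64, 2])
```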
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussi…
-
How to use this software
1. Depends on the Go language
2. Depends on make
FAQ
1. Note the Go language requirement
2. Pay attention to your operating system and architecture
-
I want to understand how you used attention in the NER task. Is there any paper or article that explains this? Thanks