-
Is this out of scope? I hope not; it would be nice to have a one-stop shop for interpretability tooling.
### Proposal
It should be easy to get the most bare-bones interpretability research off the…
-
### Feature request
Currently, if fp16 is used with Grounding DINO via https://huggingface.co/docs/transformers/main/en/model_doc/grounding-dino, the following error is raised:
```
...
Fi…
```
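For reference, a minimal sketch of the fp16 setup that triggers this, assuming the public `transformers` zero-shot object detection API; the checkpoint name and image URL are illustrative placeholders:

```
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

device = "cuda"
checkpoint = "IDEA-Research/grounding-dino-tiny"  # example checkpoint

processor = AutoProcessor.from_pretrained(checkpoint)
model = AutoModelForZeroShotObjectDetection.from_pretrained(
    checkpoint,
    torch_dtype=torch.float16,  # fp16 weights: this is where the dtype issue shows up
).to(device)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, text="a cat.", return_tensors="pt").to(device)
inputs["pixel_values"] = inputs["pixel_values"].half()  # match the model's fp16 weights

with torch.no_grad():
    outputs = model(**inputs)
```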
-
# Description
When attempting to set up llama-cpp-python for GPU support using the CUDA toolkit, following the documented steps, the initialization of the llama-cpp model fails with an access violation…
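For context, a minimal sketch of the setup being attempted, assuming a wheel built with `CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python`; the model path is a placeholder:

```
from llama_cpp import Llama

# Initialization step where the access violation occurs; n_gpu_layers=-1
# asks llama.cpp to offload all layers to the GPU.
llm = Llama(
    model_path="./models/llama-7b.Q4_K_M.gguf",  # placeholder GGUF path
    n_gpu_layers=-1,
    n_ctx=2048,
)

out = llm("Q: What is the capital of France? A:", max_tokens=16)
print(out["choices"][0]["text"])
```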
-
I'm trying to use the DH benchmark from this year's OAEI.
I get the error below. Do you have any idea what is going wrong? I also included the `config.json` and `configMatcher.json`. To test, I only …
-
# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [ ] I am running the latest code. Development is very rapid, so there are no tagged versions as of…
-
## Environment
- Platform: Debian Linux
- GPU: A100
- Torch: '2.1.2+cu121'
- Transformers: '4.37.2'
## Issue
I'm seeing random and sudden loss spikes during training. If there is a simpler wa…
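Not a fix, but a minimal sketch of how one might instrument a training step to localize the spikes, assuming a standard PyTorch loop with a transformers-style model that returns `.loss`; the clipping threshold and spike threshold below are illustrative:

```
import torch

def training_step(model, batch, optimizer, step):
    optimizer.zero_grad(set_to_none=True)
    loss = model(**batch).loss  # assumes the model output exposes .loss
    loss.backward()
    # clip_grad_norm_ returns the pre-clipping total norm, which is useful
    # for spotting the exact steps where gradients blow up.
    grad_norm = float(torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0))
    optimizer.step()
    if grad_norm > 10.0:  # arbitrary threshold for flagging suspicious steps
        print(f"step {step}: loss={loss.item():.4f} grad_norm={grad_norm:.2f}")
    return loss.item()
```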
-
## 🐛 Bug: Opening this as a bug, but really it is a question. In the NACRF implementation (specifically the implementation of the Fast Structured Decoding for Sequence Models paper), I do not see the Multi-head po…
-
I'd like to get a summary for the following BERT model
```
model = torch.hub.load('huggingface/pytorch-transformers', 'model', 'bert-base-uncased')
```
What shape or shapes should I use? How to …
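If the goal is `torchinfo`-style output, a sketch that may help, assuming `torchinfo` is installed; the sequence length below is illustrative:

```
import torch
from torchinfo import summary

model = torch.hub.load('huggingface/pytorch-transformers', 'model', 'bert-base-uncased')

# BERT takes integer token ids of shape (batch_size, sequence_length), so pass
# concrete input_data rather than a float input_size.
input_ids = torch.randint(0, 30522, (1, 128), dtype=torch.long)  # 30522 = bert-base-uncased vocab size
summary(model, input_data=input_ids)
```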
-
Hi guys,
I am following the Megatron-LM example to pre-train a BERT model, but I'm getting this error:
```
[rank0]: Traceback (most recent call last):
[rank0]: File "/root/Megatron-LM/pretrai…