-
I don't know if you all have other things in mind, but these are some things from my wish list that one could do once `occam-ra` is ready for general use. There …
-
I'm running benchmarks comparing TransformerEngine MHA in FP8 against FlashAttention MHA in FP16. However, I consistently find that TE FP8 is not only 50-60% slower than FlashAttention; it also uses mu…
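As a point of reference for this kind of comparison, the measurement methodology matters as much as the kernels. Below is a minimal, hypothetical micro-benchmark harness (the `bench` helper and `naive_attention` stand-in are assumptions, not part of either library); a real GPU run would additionally need `torch.cuda.synchronize()` around the timers and CUDA events for accuracy, but the warmup/median structure is the same.

```python
import time
import numpy as np

def bench(fn, *args, warmup=3, iters=10):
    """Hypothetical helper: median wall-clock seconds per call.

    Warmup iterations are discarded so one-time costs (JIT, caches)
    do not pollute the measurement; the median resists outliers.
    """
    for _ in range(warmup):
        fn(*args)
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - t0)
    return float(np.median(times))

def naive_attention(q, k, v):
    # CPU stand-in for the kernel under test (TE FP8 / FlashAttention
    # in the real benchmark): plain scaled dot-product attention.
    s = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    s = np.exp(s - s.max(axis=-1, keepdims=True))
    return (s / s.sum(axis=-1, keepdims=True)) @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 128, 64), dtype=np.float32) for _ in range(3))
t = bench(naive_attention, q, k, v)
print(f"median latency: {t * 1e3:.2f} ms")
```

Running each candidate kernel through the same harness, with the same shapes and dtypes, is what makes a 50-60% gap a claim you can defend.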
-
In the latest commit, https://huggingface.co/mosaicml/mpt-7b/commit/67cf22a4e6809edb7308dd0a2ae2c1ffb86f4984, BigDL throws the error below when generating text.
INFO 2024-02-20 06:41:05,962 proxy 172.17.0.2 …
-
I've been finetuning unsloth/Phi-3-mini-4k-instruct-bnb-4bit with a T4, which doesn't support flash attention, so I don't have it installed.
During evaluation, I've been running into the following …
-
### 🐛 Describe the bug
''' checkpoint_path = './llama_relevance_results'
training_args = transformers.TrainingArguments(
#remove_unused_columns=False, # Whether or not to automatically r…
-
## Context
We have an MPT MoD prefix-lm trained with llm-foundry and then exported to HuggingFace (via your scripts).
For some fine-tuning experiments with the HF model, I tried to set up dropout.
…
-
I found that the datasets need informative_sg.json, but I don't know how to obtain it. Could you help me? Also, what is the purpose of this JSON file?
-
### 🐛 Describe the bug
When I set `dropout_p=0.0`, the results differ across runs, but with `dropout_p=-1` the results are the same. Maybe the op scaled_dot_product_attention has a bug. Please fix it, thank…
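For context on why this looks like a bug: with `dropout_p=0.0` no dropout mask should be sampled, so repeated calls on the same inputs should be bit-for-bit identical. Here is a NumPy sketch of the SDPA math (an illustration of the expected semantics, not PyTorch's actual kernel) showing that a zero dropout probability must be deterministic:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, dropout_p=0.0, rng=None):
    """NumPy sketch of SDPA with inverted dropout on the attention weights.

    When dropout_p == 0.0 the dropout branch is skipped entirely, so the
    output depends only on (q, k, v) and repeated calls agree exactly.
    """
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    if dropout_p > 0.0:
        rng = rng or np.random.default_rng()
        mask = rng.random(weights.shape) >= dropout_p
        weights = weights * mask / (1.0 - dropout_p)  # inverted dropout
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((2, 4, 8)) for _ in range(3))
a = scaled_dot_product_attention(q, k, v, dropout_p=0.0)
b = scaled_dot_product_attention(q, k, v, dropout_p=0.0)
assert np.allclose(a, b)  # dropout_p=0.0 is deterministic
```

If the real op gives differing results at `dropout_p=0.0`, the dropout branch is presumably being taken when it should not be.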
-
We are trying to reproduce the dns64 Demucs model results from scratch. We have two 2080 Ti GPUs, but the largest batch size we can fit is 14, and the model takes 5 hours per epoch.
Is it su…
-
I am having a go at running inference and evaluation for this model, and I'm running into a TypeError in `GPTLMHeadModel`:
```
In [1]: import torch
...: from transformers import AutoTokenizer
…