hidden-causal Search Results

1000+ results
for hidden-causal


IBM/text-generation-inference #104

Problem loading granite-3b in small MIG partitions

**Describe the bug** There is a misleading error when deploying models in small MIG partitions **To Reproduce** - Deploy TGIS in Openshift AI. - Enable MIG (1g.5gb partitions). - Deploy grani…

ccamacho updated 1 month ago
1
lucidrains/flash-attention-jax #7

fix compatibility with jax transformations

currently impossible to use `flash_attention` within a function that will use gradient checkpointing minimal example to reproduce: ```py b = 3 lq = 16 lkv = 17 h = 5 d = 19 q = jax.random.…

GallagherCommaJack updated 2 years ago
28
state-spaces/mamba #556

Can't train mamba2 from scratch with HF Trainer

I'm trying to train mamba2 130m from scratch. ``` config = Mamba2Config( vocab_size=len(tokenizer.vocab), n_positions=10, n_embd=768, …

npkanaka updated 3 weeks ago
21
rstudio/rstudio #9120

auth-pam-helper-path not working

### System details RStudio Edition : Server RStudio Version : 1.4.1106 OS Version : unknown R Version : unknown ### Steps to reproduce the problem From customer fo…

gtritchie updated 2 years ago
2
luyug/mores_plus #1

The code is incomplete

Thank you for publishing the code. However, it seems to be incomplete. For example, there is no code/guidelines regarding encoding queries & documents, no hyperparameters (such as chunk size) are prov…

searchivarius updated 1 year ago
5
alibaba/Pai-Megatron-Patch #354

Error when running the qwen2.5 conversion script

``` [rank0]: Traceback (most recent call last): [rank0]: File "Pai-Megatron-Patch-0925/toolkits/model_checkpoints_convertor/qwen/hf2mcore_qwen2_dense_and_moe_gqa.py", line 924, in [rank0]: m…

enze5088 updated 1 month ago
1
lucidrains/DALLE-pytorch #418

Bad result with vqgan

Hi, I am using VQGAN on the MSCOCO training dataset (also tried adding Visual Genome to construct a 1 Million dataset), but got a bad result. The pixels are weird. Here are my settings, …

shizhediao updated 2 years ago
5
alibaba/Megatron-LLaMA #16

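Training a 13B LLaMA model on 2 nodes only reaches 840 token/sec/GPU

When training a 13B LLaMA model on two A800x80G machines, throughput only reaches 840 token/sec/GPU, and I don't know why. The detailed configuration is as follows: --tensor-model-parallel-size 4 \ --pipeline-model-parallel-size 1 \ --sequence-parallel \ --distributed-timeout…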

YaboSun updated 11 months ago
13
intel-analytics/ipex-llm #12143

Undefined symbol on ipex 2.3.110+xpu

Hello, I'm trying ipex-llm 2.2.0b20240927 with pytorch ipex 2.3.110+xpu, and it failed with following error: ``` ERROR azarrot.backends.common:common.py:323 An error occurred when generating te…

notsyncing updated 1 month ago
1
davidpicard/HoMM #1

Very cool idea!!! How can one contribute?

I saw your post on twitter about your new method for attention approximation and I think this is a cool idea! But can you clarify a few things? **Approximation Method:** Is your method genuinely appr…

jeffhernandez1995 updated 9 months ago
5

