-
I noticed that the attention computation in the `forward` method of the `KVCompressAttention` class includes a conditional check https://github.com/hpcaitech/Open-Sora/blob/476b6dc7972…
-
Hi,
Thank you **so much** for your great work on this project.
I am a computer science undergraduate working on a school project.
I would like to ask what the easiest way is to run inference with the trained model.
…
-
### System Info
```shell
deepspeed 0.14.4+hpu.synapse.v1.18.0
optimum-habana 1.14.0
docker image: vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-ins…
```
-
### What is the issue?
Description:
We are experiencing repeated GPU VRAM recovery timeouts while running multiple models on the Ollama platform. The GPUs in use are 2x NVIDIA RTX A5000. The system …
-
From this simple example:
```julia
using JuMP
using HiGHS
using Profile
using PProf
function add_var()
    model = direct_model(HiGHS.Optimizer())
    @variable(model, 1 <= x)
end
```
-
- Paper name: Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
- ArXiv Link: https://arxiv.org/abs/2401.16380
To close this issue, open a PR with a paper report using…
-
Hi,
I noticed that the repository currently lacks support for the InternLM2.5-7B (1.8B, 20B) model, which may cause compatibility issues or missing steps for users trying to implement it. It would …
-
Efficient Streaming Language Models with Attention Sinks [paper](https://arxiv.org/abs/2309.17453)
This repo has already implemented it:
[attention_sinks](https://github.com/tomaarsen/attention_si…
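For context, the core idea from the paper is easy to sketch: when the KV cache exceeds its budget, the first few tokens (the "attention sinks") are always retained, together with a sliding window of the most recent tokens. A minimal, hypothetical Python sketch of that eviction policy (the function name and parameters are illustrative, not taken from either repository):

```python
def sink_cache_keep_indices(cache_len, num_sinks=4, window=1024):
    """Return the cache positions to keep under the attention-sink policy.

    Keeps the first `num_sinks` positions plus the most recent `window`
    positions; everything in between is evicted.
    """
    if cache_len <= num_sinks + window:
        # Cache still fits within budget: keep everything.
        return list(range(cache_len))
    # Sink tokens at the start, plus the trailing sliding window.
    return list(range(num_sinks)) + list(range(cache_len - window, cache_len))
```

For example, with a cache of 10 positions, 2 sinks, and a window of 4, the policy keeps positions `[0, 1, 6, 7, 8, 9]`.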
-
## Model Zoo (we generally first implement USP and then PipeFusion for a new model)
- [ ] [SD3.5](https://huggingface.co/stabilityai/stable-diffusion-3.5-large)
- [ ] mochi (we will wait after it …