-
## Current Situation
Flatcar OS does not include the `lz4` and `zstd` kernel modules, which are necessary for enabling certain compression features, particularly in the case of `zram` (a compressed…
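To see which compression algorithms a host's zram device actually offers, the sysfs listing can be parsed. The following is a minimal sketch, assuming the `zram` module is loaded and a device named `zram0` exists (the device name is an assumption, and the helper names `parse_comp_algorithms` and `missing_algorithms` are hypothetical). The kernel lists the supported algorithms in `/sys/block/zram0/comp_algorithm` and marks the active one with square brackets.

```python
from pathlib import Path


def parse_comp_algorithms(text):
    """Split a zram comp_algorithm listing into (available, selected).

    The kernel prints names separated by spaces and wraps the active
    algorithm in square brackets, e.g. "lzo [lzo-rle] lz4 zstd".
    """
    available, selected = [], None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            selected = token
        available.append(token)
    return available, selected


def missing_algorithms(text, wanted=("lz4", "zstd")):
    """Return the wanted algorithms that the listing does not offer."""
    available, _ = parse_comp_algorithms(text)
    return [name for name in wanted if name not in available]


if __name__ == "__main__":
    # Assumed device path; adjust if the host uses a different zram device.
    sysfs_file = Path("/sys/block/zram0/comp_algorithm")
    if sysfs_file.exists():
        print(missing_algorithms(sysfs_file.read_text()))
    else:
        print("zram0 is not configured on this host")
```

If `lz4` or `zstd` appear in the result, the running kernel does not expose them to zram.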
-
The stack controller provisions a workspace on demand, and it is fine to delete the workspace once the sync has succeeded. The user is clearly trading off performance/efficiency.
Similar to the "recl…
-
I am a bit of a noob when it comes to transformers. If I want to encode a batch of `N` sequences of maximum length `L`, my understanding is that I do something like this:
```
from x_transformer im…
-
![image](https://github.com/user-attachments/assets/378dc115-4841-45d5-85d2-c32756a144dd)
-
We need to review and improve the Zero-Copy compliance in the Ark Project, with a focus on the Sana module.
Some areas where improvements are needed include:
1. Handling lifetime parameters effe…
-
Hi, in this part,
https://github.com/yixinL7/BRIO/blob/135f0e5cc5671fe4faa45ff3e05969969686419a/modeling_bart.py#L1863-L1869
since the `encoder_hidden_states` and `attention_mask` won't be changed …
-
We are running `cargo build -F cuda --release` on `Ubuntu 22.04.4 LTS`. The command takes more than 30 minutes and is especially slow on the last few risc0-related packages.
Do you have any good s…
-
When we use a larger model (e.g. VGG19) and a larger batch size (e.g. 256), the original version of `_update_fisher_params` easily exhausts GPU memory (over 24GB). Here, I propose a practical impr…
-
**Why is it that when using a quantized model for inference, the TTFT improvement is not obvious, but the overall inference efficiency improves a lot? At the same time, the inference efficiency…
-
In this section, we focus on enhancing the performance, efficiency, and usability of our application. Optimization is key to delivering a seamless user experience and ensuring that our software runs s…