-
python web_demo_mm.py -c "/data/shared/Qwen/models" --share --server-name 0.0.0.0 --server-port 80
/usr/local/lib/python3.8/dist-packages/auto_gptq/nn_modules/triton_utils/kernels.py:411: FutureWarn…
-
**Describe the proposal**
List the GPU kernels with changed register usage as a comment in each PR.
This can be done by using the `--ptxas-options=-v` compiler flag, then parsing the compiler output with…
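The register report can be scraped from the build log with a short script. A minimal sketch (the `parse_register_usage` helper and the sample log are illustrative, not part of any existing PR tooling), keyed on the `Compiling entry function` / `Used N registers` line pairs that `ptxas -v` emits:

```python
import re

def parse_register_usage(ptxas_log: str) -> dict:
    """Map each kernel (entry function) in a ptxas -v log to its
    register count. Pairs each "Compiling entry function '<name>'"
    line with the next "Used N registers" line."""
    usage = {}
    current = None
    for line in ptxas_log.splitlines():
        m = re.search(r"Compiling entry function '([^']+)'", line)
        if m:
            current = m.group(1)
            continue
        m = re.search(r"Used (\d+) registers", line)
        if m and current is not None:
            usage[current] = int(m.group(1))
            current = None
    return usage

sample_log = """\
ptxas info    : Compiling entry function '_Z3addPfS_' for 'sm_80'
ptxas info    : Function properties for _Z3addPfS_
ptxas info    : Used 32 registers, 360 bytes cmem[0]
"""
print(parse_register_usage(sample_log))  # {'_Z3addPfS_': 32}
```

Diffing two such dicts (base branch vs. PR branch) gives exactly the per-kernel register delta to post as the PR comment.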
-
I am a beginner. I saw blocksize mentioned in section 5.5 of the user manual (DualSPHysics):
“An important novelty since v4.0 is the determination of the optimum Blocksize for CUDA kernels that exc…
-
**Describe the bug**
I am trying to run the non-persistent example given for mistralai/Mistral-7B-Instruct-v0.3 on an RTX A6000 GPU (on a server), so the compute capability requirement is met; Ubuntu is 22.04, CUDA to…
-
Using Fortran-style 1D indexing on the parent, with any required assertions done upstream, might be easiest for some kernels. E.g.:
```julia
function Base.copyto!(
    dest::IJFH{S, Nij},
    bc:…
```
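As a language-agnostic illustration of the idea above (plain Python rather than Julia so it stands alone; `linear_index` and `copy_2d_via_flat` are hypothetical names), Fortran-style column-major 1D indexing over the parent buffer looks like:

```python
def linear_index(i, j, nrows):
    """Fortran-style (column-major) linear index into a 2-D array:
    consecutive elements of a column are adjacent in memory."""
    return i + j * nrows

def copy_2d_via_flat(dest_flat, src_flat, nrows, ncols):
    """Copy a 2-D array through its flat parent buffer. Upstream
    code is assumed to have already checked that both buffers hold
    nrows * ncols elements (the "assertions done upstream")."""
    for j in range(ncols):
        for i in range(nrows):
            k = linear_index(i, j, nrows)
            dest_flat[k] = src_flat[k]
```

Because the kernel only ever sees one flat index, it needs no knowledge of the parent's higher-dimensional shape beyond `nrows`.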
-
### Anything you want to discuss about vllm.
In qwen2vl's mrope implementation, vLLM decides whether the input positions are for multimodal input with
![image](https://github.com/user-attachments/assets/6dfc96d9-5162-…
-
Base methods, such as `accumulate!` and `mapreduce`, have support for the `dims` kwarg.
Is there a plan for adding such support here?
We can then replace other kernels from AMDGPU/CUDA with AK implementatio…
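For reference, a toy pure-Python sketch of what a `dims` keyword means for `mapreduce` on a 2-D array (illustrative only; `mapreduce_dims` is not part of any package, and it mirrors Julia's 1-based `dims` convention):

```python
from functools import reduce

def mapreduce_dims(f, op, a, dims=None):
    """mapreduce over a 2-D list-of-lists with a dims keyword.
    dims=None reduces everything to a scalar; dims=1 collapses
    rows (one result per column); dims=2 collapses columns
    (one result per row)."""
    if dims is None:
        return reduce(op, (f(x) for row in a for x in row))
    if dims == 1:
        return [reduce(op, (f(row[j]) for row in a))
                for j in range(len(a[0]))]
    if dims == 2:
        return [reduce(op, (f(x) for x in row)) for row in a]
    raise ValueError("dims must be None, 1, or 2")

a = [[1, 2], [3, 4]]
print(mapreduce_dims(lambda x: x, lambda p, q: p + q, a))          # 10
print(mapreduce_dims(lambda x: x, lambda p, q: p + q, a, dims=1))  # [4, 6]
print(mapreduce_dims(lambda x: x, lambda p, q: p + q, a, dims=2))  # [3, 7]
```

A GPU version would map each retained slice to one workgroup, which is presumably what the AMDGPU/CUDA kernels being replaced already do.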
-
When I used fast_cross_entropy_loss instead of torch.nn.CrossEntropyLoss, this error happened.
`File "/mnt/fs/user/xingjinliang/unsloth/unsloth/kernels/cross_entropy_loss.py", line 318, in fast_cross…
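For context, the quantity such a fused kernel computes per token is ordinary cross-entropy. A minimal, numerically stable pure-Python reference (not the unsloth implementation, just the math it fuses) uses the log-sum-exp trick:

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one row of logits:
    -log softmax(logits)[target], computed via log-sum-exp so
    large logits don't overflow exp()."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return lse - logits[target]

print(cross_entropy([0.0, 0.0], 0))     # log(2) ~ 0.6931...
print(cross_entropy([1000.0, 0.0], 0))  # ~0.0, no overflow
```

Comparing a drop-in replacement against this reference on small inputs is one quick way to tell a numerical bug from a shape/dtype mismatch like the traceback above.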
-
- Rename `batch_size` to `frame_packet` (michael)
- Rename `insert_wait_frames` to `insert_wait_frame_packet`
- Most of the functions (CUDA kernels, wrappers, ...) in pipe take an input buffer and a…
-
> Port training CUDA kernels from these libraries, and automatically replace modules in an existing 🤗 `transformers` model with their corresponding CUDA kernel version.
Check out the following op…
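The module-swapping step described in the quote can be sketched generically. Everything below (`Module`, `Linear`, `FusedLinear`, `replace_modules`) is a hypothetical stand-in, not the library's API; a real implementation would walk `torch.nn.Module` children the same way:

```python
class Module:
    """Minimal stand-in for a module tree node."""
    def __init__(self, **children):
        self.children = children

class Linear(Module):
    """Stand-in for the stock module to be replaced."""

class FusedLinear(Linear):
    """Stand-in for the CUDA-kernel-backed replacement."""
    @classmethod
    def from_module(cls, m):
        # A real port would copy weights/config from m here.
        return cls()

def replace_modules(root, old_cls, new_cls):
    """Recursively swap every exact old_cls child for new_cls --
    the traversal a kernel-injection library performs over a model."""
    for name, child in root.children.items():
        if type(child) is old_cls:
            root.children[name] = new_cls.from_module(child)
        else:
            replace_modules(child, old_cls, new_cls)
    return root

net = Module(attn=Linear(), block=Module(mlp=Linear()))
replace_modules(net, Linear, FusedLinear)
```

The exact-type check (`type(child) is old_cls`) matters: matching by `isinstance` would also re-replace already-swapped subclasses on a second pass.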