-
### 🐛 Describe the bug
Hello,
I'm using the QuantTrainModule to train a MobileNetV2 model (using the MobileNetV2 class in this repo), and the quantized checkpoints have 32-bit floating-point weigh…
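Seeing float32 weights in a QAT checkpoint is usually expected behavior: quantization-aware training frameworks commonly store a quantize-dequantize ("fake quant") view of each weight, where values are snapped onto the int8 grid but serialized in float storage. A minimal sketch of that idea (not the actual QuantTrainModule code, which may differ):

```python
# Hedged sketch: why a "quantized" checkpoint from quantization-aware
# training can still contain float32 tensors. Fake quantization snaps each
# weight onto a symmetric int8 grid but keeps float storage.
def fake_quantize(weights, num_bits=8):
    """Quantize-dequantize a list of float weights on a symmetric grid."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
    scale = max(abs(x) for x in weights) / qmax or 1.0
    # round to the integer grid, then map back to float
    return [round(x / scale) * scale for x in weights]

w = [0.30, -0.71, 0.055]
fq = fake_quantize(w)
# Every value in `fq` is an integer multiple of `scale`, yet a checkpoint
# would still serialize them as 32-bit floats; a separate export step
# converts them to true int8 tensors.
```

An actual int8 export would store `round(x / scale)` as integers plus the scale, which is typically a separate conversion step after training.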
-
Is CUDA 12.1 support coming or in the works? Just curious, since faster-whisper keeps looking for cublas11.dll... and although I don't use cuDNN, I'm assuming that would be another aspect to consider? …
-
Discussion: https://www.reddit.com/r/MachineLearning/comments/hu7lyt/p_yolov4tiny_speed_1770_fps_tensorrtbatch4/
Full structure: [structure of yolov4-tiny.cfg model](https://netron.app/?url=https:/…
-
The inference speed of the int8-quantized version of SDXL is much slower than that of fp16. I am running the TRT 9.3 SDXL demo and here are the results. (I changed the shape to 768x1344 manually.)
fp16 : pyt…
-
I have a spiking convolutional neural network. It uses the Leaky (Leaky Integrate-and-Fire) neuron from the [SNNTorch](https://snntorch.readthedocs.io/en/latest/snntorch.html) library as activation functio…
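For context on what the Leaky neuron computes, here is a hedged, dependency-free sketch of the discrete-time Leaky Integrate-and-Fire update (this is a simplification, not snnTorch's actual implementation; `beta`, `threshold`, and reset-by-subtraction are assumptions based on the usual LIF formulation):

```python
# Hedged sketch of one Leaky Integrate-and-Fire time step: the membrane
# potential decays by `beta`, integrates the input current, emits a spike
# when it crosses `threshold`, and is reset by subtraction.
def lif_step(input_current, mem, beta=0.9, threshold=1.0):
    """Return (spike, new_membrane) for one discrete time step."""
    mem = beta * mem + input_current        # leaky integration
    spike = 1.0 if mem >= threshold else 0.0
    mem = mem - spike * threshold           # reset-by-subtraction
    return spike, mem

# Drive the neuron with a constant sub-threshold current: it charges up
# over several steps, fires, resets, and repeats.
mem, spikes = 0.0, []
for _ in range(6):
    s, mem = lif_step(0.4, mem)
    spikes.append(s)
```

The non-differentiable spike function is why SNN libraries substitute a surrogate gradient during backpropagation.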
-
When I run run.sh, the program hits a segmentation fault. Could you give some hints?
GDB information:
```
(gdb) bt
#0 0x000000000041323d in nvdla::TensorDescListParser::buildList (this=0x682320)…
-
### Search before asking
- [X] I have searched the YOLOv8 [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussions) and fou…
-
Hi,
I am able to run SSD MobileNetV2 and CenterNet MobileNetV2 (box prediction) on my Android device. When I compare the inference speed of the models on the device, I get the results below:
inf…
-
### 🚀 The feature, motivation and pitch
VLLM has announced support for running llama3.1-405b-fp8 on 8xA100. This is the [blog](https://blog.vllm.ai/2024/07/23/llama31.html)
Does vllm support run…
-
Given the existing support for GPT-J and its rotary embeddings, is LLaMA supported as well? Hugging Face just shipped their implementation: https://github.com/huggingface/transformers/commit/464d4207756538…
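Since the question hinges on rotary embeddings, a hedged sketch of the basic RoPE operation that GPT-J and LLaMA share (pairing conventions and bases vary between models; this shows only the core rotation):

```python
import math

# Hedged sketch of rotary position embeddings (RoPE): each pair of feature
# dimensions is rotated by a position-dependent angle, so dot products
# between query and key vectors depend on their relative positions.
def rope(vec, pos, base=10000.0):
    """Apply a rotary embedding to a flat vector at sequence position `pos`."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)      # per-pair rotation frequency
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])   # 2-D rotation
    return out
```

Because each step is a pure rotation, the vector's norm is preserved and position 0 is the identity, which is what makes the scheme compatible with standard attention.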