-
**Is your feature request related to a problem? Please describe.**
I was taking a look at Karpathy's llama2.c single-file implementation and found something similar in Java:
https://github.com/mukel/llama3.j…
-
Why JAX? Read this: https://neel04.github.io/my-website/blog/pytorch_rant/
The deliverable is a `JAXInferenceEngine` that can run Llama.
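As a rough illustration of the shape such a deliverable could take, here is a minimal JAX sketch in Python. Only the class name `JAXInferenceEngine` comes from the request; the fields, method names, and greedy decoding loop are illustrative assumptions, not an existing API:

```python
from dataclasses import dataclass
from typing import Callable

import jax
import jax.numpy as jnp

@dataclass
class JAXInferenceEngine:
    """Hypothetical engine skeleton, not an existing API."""
    params: dict        # pytree of model weights
    apply_fn: Callable  # forward pass: (params, token_ids) -> logits

    def __post_init__(self):
        # JIT-compile once; XLA picks the backend (CPU/GPU/TPU) automatically.
        self._forward = jax.jit(self.apply_fn)

    def generate(self, prompt_ids, max_new_tokens: int = 32):
        """Greedy decoding: append the argmax token at each step. The growing
        sequence length forces a recompile per step in this naive sketch; a
        real engine would use a fixed-size KV cache."""
        ids = jnp.asarray(prompt_ids)[None, :]  # add a batch dimension
        for _ in range(max_new_tokens):
            logits = self._forward(self.params, ids)
            next_id = jnp.argmax(logits[:, -1, :], axis=-1)
            ids = jnp.concatenate([ids, next_id[:, None]], axis=1)
        return ids[0]

# Toy usage with an embedding lookup standing in for the Llama forward pass:
vocab = 128
engine = JAXInferenceEngine({"emb": jnp.ones((vocab, vocab))},
                            lambda p, ids: p["emb"][ids])
print(engine.generate(jnp.array([1, 2, 3]), max_new_tokens=4))
```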
-
### Requested feature
First of all, congrats on the amazing work!
I have two improvement ideas that might help simplify using this library in a wider range of production workloads:
* Supp…
-
Hey man!
Do you know whether someone or some group is already working on 'cloning' this for a local-only solution?
Thanks!
-
1. During training, I changed the image size from 640 to 1280 and updated the image-size-related settings to 1280; nothing else was modified, and training completed successfully.
2. After training, I saved the weights as best_stg2.pth, then converted them to best_stg2.onnx using the official code and command.
3. I then converted the ONNX model to an engine file with the official command: trtexec --onnx="best_stg2.onnx" --saveEngine="best_stg2.engi…
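For context on step 2, here is a hedged Python sketch of a PyTorch-to-ONNX export at the changed resolution; the actual "official code" is not shown in the issue, so the tiny stand-in model and the opset below are assumptions:

```python
# Stand-in for the actual detector; only the 1280 input size and the file
# names come from the issue above, everything else is illustrative.
import torch
import torch.nn as nn

class TinyDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3, padding=1)

    def forward(self, x):
        return self.conv(x)

model = TinyDetector()
# In the real flow the trained weights would be loaded first:
# model.load_state_dict(torch.load("best_stg2.pth", map_location="cpu"))
model.eval()

dummy = torch.randn(1, 3, 1280, 1280)  # must match the 1280 training size
torch.onnx.export(model, dummy, "best_stg2.onnx", opset_version=17)
```

The key point is that the dummy input passed to `torch.onnx.export` has to match the 1280 size used in training; with the default static shapes, the exported graph, and any TensorRT engine built from it, is fixed to that resolution.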
-
- It should automatically detect the best device to run on (a minimal sketch follows this list).
- We should require zero manual configuration from the user; llama.cpp, for example, requires specifying the device by default.
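As a hedged illustration of what zero-configuration device selection could look like, here is a minimal Python sketch; the helper name and the CUDA > MPS > CPU preference order are assumptions, shown with PyTorch rather than any API this project actually has:

```python
# Hypothetical auto-detection helper; the name and preference order
# (CUDA > Metal/MPS > CPU) are assumptions for illustration.
import torch

def pick_best_device() -> torch.device:
    """Return the fastest available backend with no user configuration."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():  # Apple Silicon GPUs
        return torch.device("mps")
    return torch.device("cpu")

device = pick_best_device()
```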
-
Hi,
I tried the latest commit of main (b6945224fadc79ff11f3c58465380a6d4294962e) on a V100 GPU with CUDA 12.2 and 12.4, and I got the following error:
```bash
epo_revision': 'main'…
```
-
Hi, I would like to ask if there is a plan to support an MLX inference engine.
It is currently the fastest engine for macOS, and I don't think any of the other existing libraries support it for format cons…
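For reference, a minimal Python sketch of MLX generation via the `mlx-lm` package; the model repo id is only an example, and the exact `generate()` signature may differ between mlx-lm versions:

```python
# Hedged sketch using mlx-lm (runs on Apple silicon only); the repo id is an
# example, and the generate() signature may vary across versions.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Meta-Llama-3-8B-Instruct-4bit")
text = generate(model, tokenizer, prompt="Hello from MLX", max_tokens=64)
print(text)
```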
-
```
Reading engine from file /content/engine/yolo11x_fp16.engine
Total Inference Time : 17.17
Total Frame processed : 750
Average Inference FPS : 43.69
Total Feature Time : 75.53
Average feature FPS : 9.9…
```
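For what it's worth, the averages are consistent with frames divided by total time: 750 / 17.17 s ≈ 43.7 inference FPS and 750 / 75.53 s ≈ 9.9 feature FPS, so the feature stage rather than TensorRT inference dominates the runtime here.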
-
## Description
I am encountering performance bottlenecks while running multi-threaded inference on high-resolution images with TensorRT. The pipeline breaks the image into patches to manage…
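As a minimal Python sketch of the patch-splitting step described above (the tile size, stride, and function name are illustrative assumptions, not the issue's actual values):

```python
# Split a high-resolution image into fixed-size tiles for per-patch inference;
# edge tiles are clipped rather than padded in this sketch.
import numpy as np

def split_into_patches(image: np.ndarray, tile: int = 640, stride: int = 640):
    """Yield (y, x, patch) tiles covering the whole image."""
    h, w = image.shape[:2]
    for y in range(0, h, stride):
        for x in range(0, w, stride):
            yield y, x, image[y:y + tile, x:x + tile]

image = np.zeros((2160, 3840, 3), dtype=np.uint8)  # dummy 4K frame
patches = list(split_into_patches(image))
print(len(patches), "patches")
```

Overlapping tiles (stride smaller than tile) are common in detection pipelines to avoid cutting objects at patch borders, at the cost of more inference calls per frame.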