RUCKBReasoning / RESDSQL

The Pytorch implementation of RESDSQL (AAAI 2023).
https://arxiv.org/abs/2302.05965
MIT License

How can we optimize the model inference time? A single NLQ takes more than 45 seconds. #53

Closed vaib26 closed 8 months ago

vaib26 commented 11 months ago

Hello everyone,

I hope you're doing well. I encountered an issue while using fine-tuned RESDSQL on my dataset (Spider-like) for predicting SQL: inference takes around one minute per query. While profiling the steps, I found that schema_item_classifier.py and text2sql.py take the majority of the time. I would greatly appreciate any suggestions or insights on reducing the prediction time. Thank you in advance for your assistance!

lihaoyang-ruc commented 10 months ago

Hi, here are some suggestions:

  1. Keep the model in GPU memory instead of releasing it after each use, as reloading it every time incurs a lot of I/O overhead.
  2. Use the GPU for inference instead of the CPU.
  3. Reduce the beam size during decoding.
  4. Try tools that accelerate model inference, such as TensorRT and FasterTransformer.
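To illustrate suggestion 1, here is a minimal, self-contained sketch of the "load once, reuse" pattern using `functools.lru_cache`. The `_load_model` function below is a stand-in that simulates an expensive checkpoint load (in RESDSQL this would be the actual model loading call, e.g. loading the T5 checkpoint); it is not code from the repo.

```python
import functools
import time

def _load_model(path):
    """Stand-in for an expensive model load (disk I/O + GPU transfer)."""
    time.sleep(0.1)  # simulates checkpoint loading time
    return {"path": path}  # placeholder for the real model object

@functools.lru_cache(maxsize=1)
def get_model(path):
    """Return a cached model so repeated queries skip the reload."""
    return _load_model(path)

# First call pays the full load cost; later calls hit the cache.
t0 = time.perf_counter()
m1 = get_model("t5-3b")
first_call = time.perf_counter() - t0

t0 = time.perf_counter()
m2 = get_model("t5-3b")
second_call = time.perf_counter() - t0
```

In a long-running service you would apply the same idea by constructing the classifier and text-to-SQL models once at startup and passing them to each request, rather than re-running the loading code per NLQ.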

However, these suggestions are quite general, so they may not address your specific bottleneck. If you want deeper acceleration, I recommend profiling the pipeline to identify the code blocks that consume the most time.
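A quick way to find the hot spots is Python's built-in `cProfile`. This is a generic sketch, not RESDSQL code; `inference_step` below is a hypothetical placeholder for whatever function wraps the schema classification and SQL generation calls.

```python
import cProfile
import io
import pstats

def inference_step():
    """Hypothetical placeholder for one end-to-end NLQ -> SQL call."""
    sum(i * i for i in range(100_000))  # dummy work to profile

# Profile one call and print the five most expensive functions
# sorted by cumulative time.
profiler = cProfile.Profile()
profiler.enable()
inference_step()
profiler.disable()

report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

Running this around a single real query will show whether the time goes to model loading, tokenization, or the `generate` call itself, which tells you which of the suggestions above will pay off most.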