Yxxxb / VoCo-LLaMA

VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
https://yxxxb.github.io/VoCo-LLaMA-page/
Apache License 2.0

How to compare the inference time? #8

Closed Gumpest closed 5 months ago

Gumpest commented 5 months ago

Hi, authors. I'm wondering how you measured efficiency in terms of inference time.

Gumpest commented 5 months ago

Besides, I'd like to learn how to measure CUDA time. Thanks a lot.

Yxxxb commented 5 months ago

Hi,

llama.cpp, the LLaVA CLI, or Python's built-in timing functions can all be used to measure inference time.
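A minimal sketch of the Python-timing approach mentioned above, using only the standard library. The `time_call` helper and the toy workload are illustrative stand-ins, not part of the VoCo-LLaMA codebase; for actual GPU inference you would additionally synchronize the device (e.g. `torch.cuda.synchronize()`) before reading the clock, since CUDA kernels launch asynchronously.

```python
import time

def time_call(fn, *args, warmup=3, iters=10):
    """Return the average wall-clock latency of fn(*args) in seconds.

    Warmup iterations are discarded so one-time costs (allocation,
    caching, JIT) don't skew the measurement. For GPU models, call
    torch.cuda.synchronize() before each perf_counter() read.
    """
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - start) / iters

# Toy stand-in for a model forward pass.
avg = time_call(lambda: sum(i * i for i in range(10_000)))
print(f"avg latency: {avg * 1e3:.3f} ms")
```

For finer-grained GPU timing, `torch.cuda.Event(enable_timing=True)` pairs can bracket the forward pass instead of host-side clocks.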

Gumpest commented 5 months ago

@Yxxxb What batch size do you use?

Yxxxb commented 5 months ago

The overall batch size is 128, the same as in the LLaVA SFT stage. You can check the training hyperparameters in the "Additional Implementation Details" section of our paper's appendix.