hi, how long does a model inference take?

It depends on many things, such as the sequence length of the input/output, the hardware, and batching. To give you some data points, on a v3-8 TPU, assuming the inference function has already been compiled:

- text-to-image generation in the demo with classifier-free guidance takes 11 seconds for a single image and 36 seconds to generate 8 samples
- text-to-text generation in the demo with a short input prompt and up to 512 output tokens takes 3 seconds for one sample and 4 seconds for 8 outputs
- running in batch mode instead of the demo, a VQA or image generation example takes about 1.5 seconds
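Since these numbers exclude compilation time, here is a minimal JAX sketch of how to measure them yourself (the `generate` function, its shapes, and the dummy matmul body are hypothetical stand-ins, not the actual demo code). The first jitted call pays the XLA compilation cost, so warm the function up once and time only later calls, using `block_until_ready()` to account for JAX's asynchronous dispatch:

```python
import time

import jax
import jax.numpy as jnp


@jax.jit
def generate(params, tokens):
    # Hypothetical stand-in for the real inference function.
    return jnp.dot(tokens, params)


params = jnp.ones((512, 512))
tokens = jnp.ones((8, 512))

# First call includes XLA compilation, so it is much slower.
start = time.perf_counter()
generate(params, tokens).block_until_ready()
print(f"first call (incl. compilation): {time.perf_counter() - start:.3f}s")

# Subsequent calls with the same input shapes reuse the compiled program;
# this is the latency the numbers above refer to.
start = time.perf_counter()
generate(params, tokens).block_until_ready()
print(f"compiled call: {time.perf_counter() - start:.3f}s")
```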