I found that ltu/src/ltu_as/inference_gradio.py converts all parameters to float32 at line 60:

convert_params_to_float32(model)

Inference in float32 is quite slow and costly in GPU memory. Have you guys tested inference with float16? Does it have a negative impact on performance?
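For reference, this is roughly what I would try instead, assuming the model is a standard PyTorch module (the helper name and toy usage below are my own sketch, not from the repo). Running under autocast keeps numerically sensitive ops like softmax and layer norm in float32, which may be the concern that motivated the float32 cast in the first place:

```python
import torch

def run_half_precision(model: torch.nn.Module, inputs: torch.Tensor) -> torch.Tensor:
    """Run inference with fp16 weights under autocast.

    Hypothetical replacement for the convert_params_to_float32(model)
    call: weights are cast to half precision, and autocast transparently
    promotes numerically sensitive ops (softmax, norm reductions) back
    to float32 during the forward pass. Requires a CUDA device.
    """
    model = model.half().eval().cuda()
    with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.float16):
        return model(inputs.half().cuda())

if __name__ == "__main__":
    # Toy stand-in for the real LTU model, just to show the call pattern
    toy = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.GELU())
    out = run_half_precision(toy, torch.randn(2, 16))
    print(out.dtype)  # torch.float16
```

If plain fp16 turns out to be unstable for this model, bfloat16 (dtype=torch.bfloat16 on supported GPUs) might be a middle ground, since it keeps float32's exponent range.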