Closed kunci115 closed 3 weeks ago
PRs welcome
Please compile the model, or try the quantized version.
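For context on the "compile the model" suggestion: fish-speech is a PyTorch project, so this most likely refers to `torch.compile` (PyTorch >= 2.0). A minimal sketch with a placeholder model, not fish-speech's actual classes (the `backend="eager"` option is only so the sketch runs without a Triton/C++ toolchain; drop it to get real speedups):

```python
import torch

# Illustrative stand-in for the real TTS model (not fish-speech's actual class).
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 64),
).eval()

# torch.compile traces the model and optimizes the graph; the first call
# pays the compilation cost, later calls with the same shapes are faster.
compiled = torch.compile(model, backend="eager")

with torch.no_grad():
    x = torch.randn(1, 64)
    _ = compiled(x)    # warmup call (triggers compilation)
    out = compiled(x)  # runs the compiled graph
print(out.shape)
```

Note that compiled models recompile when input shapes change, so variable-length token sequences can limit the benefit for autoregressive TTS.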
@PoTaTo-Mika what do you mean by compiling the model? Also, how do I make the quantized version? I have only followed the inference steps in the English documentation: https://speech.fish.audio/en/inference/#2-create-a-directory-structure-similar-to-the-following-within-the-ref_data-folder
there's a python file called quantize.py, you can view the file and choose to quantize.
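I haven't inspected fish-speech's `quantize.py`, but dynamic int8 quantization of the `Linear` layers is the usual approach in such scripts; a minimal sketch with a placeholder model (the model here is illustrative, not the real checkpoint):

```python
import torch
import torch.nn as nn

# Placeholder standing in for the checkpoint you'd actually load.
model = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 32)).eval()

# Dynamic quantization: weights are stored as int8 and activations are
# quantized on the fly at inference time. This shrinks the checkpoint and
# can speed up CPU inference.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    y = qmodel(torch.randn(1, 128))
print(y.shape)  # same output shape as the float model
```

One caveat: dynamic quantization of this kind targets CPU execution; on a GPU like the 4090 it often changes latency very little, which would be consistent with seeing no speedup from the quantized checkpoint.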
It's creating a folder with a quantized version of the model now. Do I just run it the same way as before, pointing at that checkpoint? I still get the same latency.
This issue is stale because it has been open for 30 days with no activity.
Streaming on a 4090 takes more than 2 seconds, depending on the number of tokens. Is there a way to yield/return audio while the engine is still generating?
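On the yielding question: if the generation loop produces tokens incrementally, it can be wrapped in a Python generator that yields decoded audio chunks as soon as they are ready, instead of blocking until the whole sequence is done. A minimal sketch; `generate_tokens` and `decode_chunk` are hypothetical stand-ins, not fish-speech's real API:

```python
from typing import Iterator, List

def generate_tokens(text: str) -> Iterator[int]:
    """Stand-in for the model's autoregressive loop: one token per step."""
    for i, _ in enumerate(text.split()):
        yield i  # a real model would yield predicted token ids

def decode_chunk(tokens: List[int]) -> bytes:
    """Stand-in for the vocoder: turns a window of tokens into audio bytes."""
    return bytes(tokens)

def stream_tts(text: str, chunk_size: int = 4) -> Iterator[bytes]:
    """Yield audio whenever chunk_size tokens are ready, so playback can
    start while the engine is still generating the rest."""
    buf: List[int] = []
    for tok in generate_tokens(text):
        buf.append(tok)
        if len(buf) >= chunk_size:
            yield decode_chunk(buf)
            buf.clear()
    if buf:  # flush the remaining tail
        yield decode_chunk(buf)

chunks = list(stream_tts("this is a ten word sentence for the streaming demo"))
print(len(chunks))  # → 3 (10 tokens emitted in chunks of 4, 4, 2)
```

In a server setting each yielded chunk can be written straight to the HTTP response (e.g. a chunked or SSE stream), so the perceived latency becomes time-to-first-chunk rather than time-to-full-utterance.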