Open damodharanj opened 10 months ago
The model is pretty amazing, and thanks a lot for open-sourcing it. Is there a way to size it down and run it on hardware like Apple Silicon using ggml? Would this improve the inference times? On my Apple M2 it takes 12 seconds to translate a single sentence. If you can guide me on how to do this, I would be willing to help!

Hi @damodharanj,

Yes, this should improve the inference time. However, it would require you to write the model definitions in C++ similar to llama.cpp and convert the weights to ggml.

Currently, we don't have the bandwidth, experience, or hardware resources to help you port the models to ggml. Please let us know if there is any progress on this thread.

Thanks a lot for the response! Sure, I will post an update here once I take this up.
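As a rough illustration of what "writing the model definitions in C++ and converting to ggml" involves, below is a minimal sketch of a single linear layer (y = Wx + b) evaluated with ggml's CPU graph API. This is not the actual port, just an assumed starting point: it uses a recent ggml revision (function names such as `ggml_new_graph` and `ggml_graph_compute_with_ctx` have shifted between versions), and a real port would define every layer of the translation model this way and load converted (f16/quantized) weights instead of filling tensors with constants.

```cpp
#include "ggml.h"
#include <cstdio>

int main() {
    // Reserve a small memory pool for tensors and the compute graph.
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    const int n_in  = 4;
    const int n_out = 3;

    // Model parameters; a real port would load these from a converted checkpoint.
    struct ggml_tensor * W = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, n_in, n_out);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n_out);
    struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n_in);

    // Placeholder values, just so the example runs end to end.
    ggml_set_f32(W, 0.1f);
    ggml_set_f32(b, 1.0f);
    ggml_set_f32(x, 2.0f);

    // One "layer" of the model definition: y = W x + b.
    struct ggml_tensor * y = ggml_add(ctx, ggml_mul_mat(ctx, W, x), b);

    // Build the compute graph and run it on the CPU.
    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, y);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads=*/4);

    for (int i = 0; i < n_out; ++i) {
        printf("y[%d] = %f\n", i, ggml_get_f32_1d(y, i));
    }

    ggml_free(ctx);
    return 0;
}
```

The sketch assumes you build against the ggml library (github.com/ggerganov/ggml); llama.cpp follows the same pattern at a much larger scale, with the weight conversion handled by a separate Python script that writes the tensors into a ggml-readable file.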