This model is around 4 GB. Because of that size constraint, we worked around it by downloading the model locally through code, which made loading much faster and simplified integration with our RAG implementation.
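A minimal sketch of that local-download approach, assuming the model is a GGUF quantization hosted on the Hugging Face Hub and loaded with llama-cpp-python; the repo and file names below are placeholders, since the section doesn't name the exact checkpoint:

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the quantized weights once to a local directory.
# repo_id and filename are hypothetical -- substitute the actual model used.
model_path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",  # assumed quantized repo
    filename="mistral-7b-instruct-v0.2.Q4_K_M.gguf",   # ~4 GB 4-bit file
    local_dir="models",
)

# Load from the local path; later runs reuse the file instead of re-downloading.
llm = Llama(model_path=model_path, n_ctx=4096)
```

Because `hf_hub_download` caches the file, the expensive download happens only on the first run, which is where the loading-time gain comes from.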
We later changed the model because this quantized version didn't give the best results, moving to a larger model that took longer to respond but gave more accurate answers.