tenzin3 opened this issue 6 months ago
To improve response time, there are two potential approaches: improving the efficiency of the existing models or upgrading to more powerful GPUs. I believe that prioritizing model-efficiency optimization is currently the more effective strategy.
We first need to measure the current response time, and the expected output should specify a target response time.
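As a sketch of the measurement step, response time could first be timed client-side over several runs. Everything below is illustrative and not part of the Monlam codebase: `measure_latency` is a hypothetical helper, and `model_call` is a stub standing in for a real request to one of the model endpoints.

```python
import time
import statistics

def measure_latency(call, runs=5):
    """Time a callable over several runs; return latency stats in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }

def model_call():
    # Stand-in for a real request to an AI model endpoint,
    # e.g. an HTTP POST to the machine-translation API.
    time.sleep(0.01)

stats = measure_latency(model_call)
print(f"median latency: {stats['median'] * 1000:.1f} ms")
```

Running the same measurement against each of the four models would give the baseline numbers that a target response time can then be set against.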
RFW0097: Improve the scaling of AI models API
Named Concepts
API (Application Programming Interface): a set of rules and protocols that defines how two software systems communicate with each other.
Machine Translation (Tibetan ↔ English): automated translation that uses AI to translate text or spoken words from Tibetan to English and vice versa.
Text to Speech (Tibetan): Text to Speech (TTS) technology converts written Tibetan text into spoken words. TTS systems analyze the text, synthesize Tibetan phonetics, and produce a spoken rendition in a natural-sounding Tibetan voice.
Speech to Text (Tibetan): Speech to Text technology converts spoken words in Tibetan into written text. This involves analyzing the speech audio and accurately transcribing it into text format.
OCR (Optical Character Recognition): technology used to convert different types of documents, such as scanned paper documents, PDFs, or images captured by a digital camera, into editable and searchable data.
Summary
The Monlam AI web application currently hosts four distinct AI models: i) Machine Translation (Tibetan ↔ English), ii) Text to Speech (Tibetan), iii) Speech to Text (Tibetan), iv) OCR (Optical Character Recognition) (Tibetan).
Currently, all of the above-mentioned models work except OCR. The problem is the long response time between user input and model output on the web page. One clear incident occurred on Monlam's launch day, when multiple users were testing the various models at the same time.
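The launch-day slowdown is consistent with concurrent requests queueing behind a model that serves one inference at a time. A minimal sketch of that effect, assuming serial serving (the lock stands in for a single GPU-bound model instance; all names and timings are illustrative):

```python
import threading
import time

gpu_lock = threading.Lock()  # a single model instance serves one request at a time
latencies = []
lat_lock = threading.Lock()

def inference():
    # Stand-in for one model inference taking ~20 ms of GPU time.
    with gpu_lock:
        time.sleep(0.02)

def user_request():
    # What one user on the web page experiences: queueing time + inference time.
    start = time.perf_counter()
    inference()
    elapsed = time.perf_counter() - start
    with lat_lock:
        latencies.append(elapsed)

# 10 simultaneous users, as on launch day.
threads = [threading.Thread(target=user_request) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

latencies.sort()
print(f"fastest: {latencies[0]*1000:.0f} ms, slowest: {latencies[-1]*1000:.0f} ms")
```

Under serial serving, the last of the ten concurrent requests waits for all the others, so its latency is roughly ten times that of a single inference. This queueing delay is what should be measured under load and then reduced, whether by optimizing the models or by adding serving capacity.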
This RFW needs further discussion, with suggestions from both the UI developers and the model trainers.
Input
https://monlam.ai/
Expected Output
Faster response times for user-requested output from the AI models.
Expected Timeline
The expected timeline should be specified here.
References
Include all the relevant references.