OpenPecha / Requests

RFWs and RFCs for all OpenPecha repositories

RFW0097: Improve the scaling of AI models API. #351

Open tenzin3 opened 6 months ago

tenzin3 commented 6 months ago

RFW0097: Improve the scaling of AI models API

Named Concepts

API (Application Programming Interface): A set of rules and protocols that defines how two software systems communicate with each other.

Machine Translation (Tibetan ↔ English): Automated language translation that uses AI to translate text or spoken words from Tibetan to English and vice versa.

Text to Speech (Tibetan): Text to Speech (TTS) technology converts written Tibetan text into spoken words. A TTS system analyzes the text, synthesizes Tibetan phonetics, and produces a spoken rendition in a natural-sounding Tibetan voice.

Speech to Text (Tibetan): Speech to Text technology converts spoken words in Tibetan into written text. This involves analyzing the speech audio and accurately transcribing it into text format.

OCR (Optical Character Recognition): Technology used to convert different types of documents, such as scanned paper documents, PDFs, or images captured by a digital camera, into editable and searchable data.

Summary

The Monlam AI web application currently hosts four distinct AI models: i) Machine Translation (Tibetan ↔ English), ii) Text to Speech (Tibetan), iii) Speech to Text (Tibetan), and iv) OCR (Optical Character Recognition, Tibetan).

Currently, all of the above-mentioned models work except OCR. The problem is the long response time between a user's input and the model's output on the web page. One clear incident occurred on Monlam's launch day, when multiple users were testing the various models at the same time.
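Before discussing fixes, it would help to reproduce the launch-day situation: many users hitting a model endpoint at once. A minimal load-test sketch is below; it takes any zero-argument callable (for example, a `requests.post` to one of the Monlam endpoints, the exact URL is not specified here) and measures per-request latency under concurrency. This is an illustration, not part of the current Monlam codebase.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def measure_latencies(send_request, n_requests=20, n_workers=5):
    """Fire n_requests concurrently via n_workers threads and return
    a list of per-request latencies in seconds.

    `send_request` is any zero-argument callable that performs one
    round trip (e.g. an HTTP POST to a translation endpoint)."""
    def timed_call(_):
        start = time.perf_counter()
        send_request()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(timed_call, range(n_requests)))
```

Running this with increasing `n_workers` would show how latency degrades as concurrency grows, which is the behavior reported above.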

This RFW needs further discussion and suggestions from both the UI developers and the model trainers.

Input

https://monlam.ai/

Expected Output

Faster response times for user requests to the AI models.

Expected Timeline

The expected timeline still needs to be specified.

References

Include all the relevant references.

TenzinGayche commented 6 months ago

To enhance response time, there are two potential approaches: improving the efficiency of existing models or upgrading to more powerful GPUs. However, I believe that prioritizing the optimization of model efficiency is currently the most effective strategy.
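One common way to improve model efficiency without new hardware is dynamic batching: group requests that arrive close together so the GPU runs one forward pass for several users instead of one pass per request. A hedged sketch is below; `run_model_batch` stands in for the real model's batched inference call, and the queue/callback wiring is illustrative, not the actual Monlam server code.

```python
import math
import queue
import time

def batch_worker(requests_q, run_model_batch, max_batch=8, max_wait=0.05):
    """Drain a request queue in batches of up to `max_batch`, waiting at
    most `max_wait` seconds for extra requests to fill each batch.

    Each queue item is a (input_text, result_callback) pair; `None` is a
    shutdown sentinel."""
    stop = False
    while not stop:
        item = requests_q.get()
        if item is None:
            break
        batch = [item]
        deadline = time.monotonic() + max_wait
        # Wait briefly for more requests so the model can process them together.
        while len(batch) < max_batch:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                nxt = requests_q.get(timeout=remaining)
            except queue.Empty:
                break
            if nxt is None:
                stop = True
                break
            batch.append(nxt)
        # One batched inference call, then fan results back out per request.
        outputs = run_model_batch([text for text, _ in batch])
        for (_, callback), out in zip(batch, outputs):
            callback(out)
```

The trade-off is a small added wait (`max_wait`) per request in exchange for much higher throughput under load, which is exactly the launch-day scenario described in the RFW.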

kaldan007 commented 6 months ago

We need to measure the current response time, and the expected output should specify a target response time.
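A target is usually stated against percentiles rather than the average, e.g. "p95 latency under some threshold", since a few slow requests dominate user perception. A small sketch for summarizing measured latency samples (the samples themselves would come from load-testing the live endpoints):

```python
import math

def latency_summary(samples):
    """Summarize latency samples (seconds) as median, p95, and max,
    the numbers a target like 'p95 under N seconds' is checked against."""
    ordered = sorted(samples)

    def percentile(p):
        # Nearest-rank method: the value at position ceil(p% * N).
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

    return {"median": percentile(50), "p95": percentile(95), "max": ordered[-1]}
```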