DataGov-SamagraX / Hinglish


Deploying Hinglish transformers via API calls #1

Open GautamR-Samagra opened 1 year ago

GautamR-Samagra commented 1 year ago

Current state: We have a setup where one can POST sentences in Hindi/English/Hinglish and it returns the sentence translated to English along with the associated sentiment.

Requirements:

Aim: We need to decide on a strategy for estimating the architecture required to deploy the Bhashini (and other translation) models at scale.

tushar5526 commented 1 year ago

Hey @GautamR-Samagra, the requirements.txt file is not properly formatted. Can you run pip freeze > requirements.txt and push the result?

tushar5526 commented 1 year ago

Please share the associated curl requests or a Postman collection so I can understand the GET and POST requests here. Some sample data will also help me mock user calls.

tushar5526 commented 1 year ago

Also, what are your estimates of the number of requests to be made? I am thinking of 100k requests in total from 100 concurrent users.
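
Roughly what I have in mind is a sketch along these lines with Apache Bench (the endpoint and payload here are placeholders until you share the actual request format):

# hypothetical sample body; replace with a real request body once shared
echo '{"sent":"<sample sentence>"}' > payload.json
# 100k requests in total, 100 concurrent clients
ab -n 100000 -c 100 -p payload.json -T 'application/json' http://localhost:5000/<endpoint>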

pSN0W commented 1 year ago

You can make these curl requests:

curl -H 'Content-Type: application/json' \
      -d '{"sent":"tum kaise ho. I am fine"}' \
      -X POST \
      http://localhost:5000/translate

curl -H 'Content-Type: application/json' \
      -d '{"sent":"I am fine"}' \
      -X POST \
      http://localhost:5000/sentiment

tushar5526 commented 1 year ago

I would prefer it if you could also send more verbose and varied sample data so I can test your model with many requests. You can paste the "sent" values below for /translate and /sentiment separately.

pSN0W commented 1 year ago

Hey @tushar5526, there is no limit on what you can pass as sent, nor is there any particular output you should expect for a given sentence, so feel free to play around yourself. Just follow these rules:

  1. /translate accepts any sent as input (English, Hinglish, or Devanagari).
  2. /sentiment only takes English sentences as input.
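
For example, a Devanagari input to /translate looks like this (the sentence is just a placeholder to show the format, not from any dataset):

curl -H 'Content-Type: application/json' \
      -d '{"sent":"आप कैसे हैं"}' \
      -X POST \
      http://localhost:5000/translate
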
tushar5526 commented 1 year ago

Re: "/translate accepts any sent as input (English, Hinglish, or Devanagari)" -- send me a few of them so that I don't have to generate them manually on my end. You can also link me to the dataset you are using; I will pick a few of them from there.

tushar5526 commented 1 year ago

@pSN0W I am getting conflicts while installing dependencies.

The conflict is caused by:
    The user requested importlib-metadata==3.10.1
    flask 2.2.2 depends on importlib-metadata>=3.6.0; python_version < "3.10"
    konoha 4.6.5 depends on importlib-metadata<4.0.0 and >=3.7.0
    sphinx 5.3.0 depends on importlib-metadata>=4.8; python_version < "3.10"

Which Python version are you using?

tushar5526 commented 1 year ago

Also, it seems you have shared the requirements file of your local Python environment instead of the virtual env (BeautifulSoup is listed, but I can't see it being used anywhere, and there are missing dependencies -- https://github.com/DataGov-SamagraX/Hinglish/blob/main/requirments.txt#L5).

Traceback (most recent call last):
  File "/workspace/.pyenv_mirror/user/current/lib/python3.8/site-packages/flask/cli.py", line 218, in locate_app
    __import__(module_name)
  File "/workspace/Hinglish/app.py", line 2, in <module>
    from utils import final_transliteration,flair_prediction
  File "/workspace/Hinglish/utils.py", line 4, in <module>
    from indicTrans.inference.engine import Model
ModuleNotFoundError: No module named 'indicTrans'

Can you reshare the requirements.txt file after enabling the virtual env?
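
Something like this is what I mean, as a rough sketch (the package list is a placeholder -- install whatever app.py and utils.py actually import):

# create and activate a clean virtual environment
python3 -m venv .venv
source .venv/bin/activate
# install only the packages the app actually imports (flask, the model libraries, etc.)
pip install <packages-the-app-imports>
# freeze the resolved set from inside the venv
pip freeze > requirements.txt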

pSN0W commented 1 year ago

Hey, it is from my local environment only. Most of the packages that need to be installed depend on other packages like BeautifulSoup, which is why it is listed. The requirments.txt installs on my system without any issue. I am using Linux.

pSN0W commented 1 year ago

You need to git clone the indicTrans package, as it's not a pip-installable Python library.
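
i.e. something along these lines (the URL assumes the AI4Bharat indicTrans repo -- adjust it if the project vendors the code from somewhere else):

# clone indicTrans next to app.py so that "from indicTrans.inference.engine import Model" resolves
git clone https://github.com/AI4Bharat/indicTrans.git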

tushar5526 commented 1 year ago

@GautamR-Samagra I think we don't need to test this once we have tested GPT-3. We can close this.

GautamR-Samagra commented 1 year ago

@tushar5526 this one has 2 Bhashini models (Hindi <-> English), a transliteration model, and a sentiment analysis model on top of it, so it will take even more time. We can do it next week, but let's try to test it and get the exact numbers, and also keep it in a format such that we can deploy it quickly on a server and record a video of pushing an API call and getting the translated Hinglish text / sentiment back from it.