AI4Bharat / IndicXlit

Transliteration models for 21 Indic languages
https://ai4bharat.iitm.ac.in/transliteration
MIT License

Replicating IndicXlit in a restricted network #16

Open gk1089 opened 1 year ago

gk1089 commented 1 year ago

Discussed in https://github.com/AI4Bharat/IndicXlit/discussions/15

Originally posted by **gk1089** November 30, 2022

Dear AI4Bharat team,

Thank you for putting together this excellent project. The results of the English to Hindi transliteration have exceeded expectations (as set by previously existing tools). I am trying to replicate this project on a local network with restricted internet access and need your help in understanding a few things:

1. My current challenge is getting this library working for English to Hindi transliteration. I have installed all the dependencies and the Python library with `pip install ai4bharat-transliteration`. But when I load the library in Python with `from ai4bharat.transliteration import XlitEngine` followed by `e = XlitEngine("hi", beam_width=10, rescore=True)`, I get a wall of errors essentially saying that the GitHub link used to download `indicxlit-en-indic-v1.0.zip` is not permitted on our network. I can download the required file separately and copy it onto the restricted network, but I have no clue where to put it or what to do with it. Please guide me on this (see the sketch after this comment for the general approach I have in mind).

2. I noticed that installing `XlitEngine` pulls in a lot of machine-learning libraries. I wonder whether these are actually required for bare-minimum functioning of the library. For example, we plan to use only the English-Hindi transliteration feature and are setting it up on a headless Linux virtual machine. We won't need to train the models, and any future updates to your project would simply be replicated. So my question is: is there any way to set up a bare-minimum portion of the library, or would you consider supporting that in the future?

Please let me know if we can contribute in any way by testing the system out.

Best wishes!
Gaurav
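For reference, a minimal sketch of the manual-placement idea asked about above, assuming the model archive is downloaded on an internet-connected machine and copied over, and that the library looks for models next to the installed package. The target subdirectory name (`models`) and the zip path are assumptions; verify them against the library's source or against the path it prints when the download fails.

```python
# Hypothetical offline-provisioning sketch (paths are assumptions; verify against
# the download location used by your installed ai4bharat-transliteration version).

import zipfile
from pathlib import Path

import ai4bharat.transliteration as xlit_pkg

# 1. On an internet-connected machine, fetch the release asset manually
#    (indicxlit-en-indic-v1.0.zip from the IndicXlit GitHub releases page)
#    and copy it to the offline machine, e.g. /tmp/indicxlit-en-indic-v1.0.zip.
zip_path = Path("/tmp/indicxlit-en-indic-v1.0.zip")

# 2. Assumed cache location: a "models" directory next to the installed package.
#    Check where XlitEngine actually tries to write on your machine.
target_dir = Path(xlit_pkg.__file__).parent / "models"
target_dir.mkdir(parents=True, exist_ok=True)

# 3. Unpack the archive so XlitEngine finds the files and skips the download.
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(target_dir)

print("Extracted to:", target_dir)
```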
gk1089 commented 1 year ago

I got the URL whitelisted but am still facing issues, since I am behind a proxy.

```
e = XlitEngine("hi", beam_width=10, rescore=True)
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.12) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
2022-12-20 09:59:09.977829: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-20 09:59:10.398028: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-12-20 09:59:10.398114: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-12-20 09:59:12.458935: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2022-12-20 09:59:12.459087: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2022-12-20 09:59:12.459193: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
/usr/local/lib/python3.8/dist-packages/tensorflow_addons/utils/ensure_tf_install.py:53: UserWarning: Tensorflow Addons supports using Python ops for all Tensorflow versions above or equal to 2.8.0 and strictly below 2.11.0 (nightly versions are not supported).
The versions of TensorFlow you are currently using is 2.11.0 and is not supported.
Some things might work, some things might not.
If you were to encounter a bug, do not file an issue.
If you want to make sure you're using a tested and supported configuration, either change the TensorFlow version or the TensorFlow Addons's version.
You can find the compatibility matrix in TensorFlow Addon's readme:
https://github.com/tensorflow/addons
  warnings.warn(
Downloading Multilingual model for transliteration
SSL certificate not verified...
/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py:1045: InsecureRequestWarning: Unverified HTTPS request is being made to host '172.28.12.122'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 700, in urlopen
    self._prepare_proxy(conn)
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 996, in _prepare_proxy
    conn.connect()
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 369, in connect
    self._tunnel()
  File "/usr/lib/python3.8/http/client.py", line 901, in _tunnel
    (version, code, message) = response._read_status()
  File "/usr/lib/python3.8/http/client.py", line 277, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='objects.githubusercontent.com', port=443): Max retries exceeded with url: /github-production-release-asset-2e65be/487173539/4ef3b62d-385b-4a3a-9ab1-a3cc55764ef3?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20221220%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20221220T101158Z&X-Amz-Expires=300&X-Amz-Signature=d24db49d92188df3dbf8a0f1a05126bdaae8bf42289befe734331a41b336f11c&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=487173539&response-content-disposition=attachment%3B%20filename%3Dindicxlit-en-indic-v1.0.zip&response-content-type=application%2Foctet-stream (Caused by ProxyError('Cannot connect to proxy.', timeout('timed out')))
```

Any idea if there is any way to pass the proxy information to the library so that urllib3 can use it?
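One possible workaround, since the traceback shows the download going through `requests`: `requests` picks up the standard `HTTP_PROXY` / `HTTPS_PROXY` environment variables, so setting them (in the shell before launching Python, or in `os.environ` before the engine is created) may route the model download through the proxy. A minimal sketch with a placeholder proxy URL; substitute your own:

```python
# Sketch: point requests/urllib3 at the corporate proxy via environment variables
# before the model download is triggered. The proxy URL is a placeholder.

import os

os.environ["HTTP_PROXY"] = "http://proxy.example.com:8080"
os.environ["HTTPS_PROXY"] = "http://proxy.example.com:8080"
# Hosts that should bypass the proxy, if any:
os.environ["NO_PROXY"] = "localhost,127.0.0.1"

from ai4bharat.transliteration import XlitEngine

# Instantiation triggers the model download, which should now use the proxy.
e = XlitEngine("hi", beam_width=10, rescore=True)
print(e.translit_word("namasthe", topk=5))  # usage as shown in the project README
```

Equivalently, exporting `HTTPS_PROXY` in the shell before starting Python should have the same effect, since `requests` reads the environment at request time.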

Come on guys... any help will be much appreciated!