AI4Bharat / IndicTrans2

Translation models for 22 scheduled languages of India
https://ai4bharat.iitm.ac.in/indic-trans2
MIT License

Instruction for Use on the ParamShivay and other SLURM systems #94

Closed singhakr closed 4 days ago

singhakr commented 3 weeks ago

Could you please add instructions for installation and usage on the ParamShivay system, which many HEIs in India use for computation? Or could you simply confirm that it will work by creating a virtual environment and installing from the provided script? Providing this information might save quite a bit of time that would otherwise be wasted trying to use it the wrong way.

Thanks for the great work! I ask most of my students to use AI4Bharat models, at least to start with.
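For reference, the generic setup from the repository README can be sketched as below; the environment name `itv2` and the Python version are example choices (not part of this thread), and on a cluster you may need to load site modules (e.g. CUDA) first:

```shell
# Sketch of the standard setup per the repository README; the env name
# "itv2" and python=3.9 are example choices, not requirements.
git clone https://github.com/AI4Bharat/IndicTrans2.git
cd IndicTrans2
conda create -n itv2 python=3.9 -y
conda activate itv2
source install.sh   # installs fairseq and the other dependencies
```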

singhakr commented 3 weeks ago

I tried to install using the instructions given, but I get the following error on the Param Shivay system:

```
Checking out files: 100% (1619/1619), done.
Processing /home/user.name/IndicTrans2/fairseq
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [23 lines of output]
      Traceback (most recent call last):
        File "/home/user.name/anaconda3/envs/itv2/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/home/user.name/anaconda3/envs/itv2/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/home/user.name/anaconda3/envs/itv2/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/pip-build-env-6np9gfm8/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 327, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=[])
        File "/tmp/pip-build-env-6np9gfm8/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 297, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-6np9gfm8/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 313, in run_setup
          exec(code, locals())
        File "<string>", line 12, in <module>
        File "/tmp/pip-build-env-6np9gfm8/overlay/lib/python3.9/site-packages/torch/__init__.py", line 289, in <module>
          _load_global_deps()
        File "/tmp/pip-build-env-6np9gfm8/overlay/lib/python3.9/site-packages/torch/__init__.py", line 245, in _load_global_deps
          raise err
        File "/tmp/pip-build-env-6np9gfm8/overlay/lib/python3.9/site-packages/torch/__init__.py", line 226, in _load_global_deps
          ctypes.CDLL(global_deps_lib_path, mode=ctypes.RTLD_GLOBAL)
        File "/home/user.name/anaconda3/envs/itv2/lib/python3.9/ctypes/__init__.py", line 382, in __init__
          self._handle = _dlopen(self._name, mode)
      OSError: /tmp/pip-build-env-6np9gfm8/overlay/lib/python3.9/site-packages/torch/lib/libtorch_global_deps.so: failed to map segment from shared object: Operation not permitted
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
```
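The final OSError ("failed to map segment from shared object: Operation not permitted") usually means the directory pip builds in, here under /tmp, sits on a filesystem mounted noexec, which is common on shared clusters. A minimal sketch of a workaround, assuming your home filesystem is executable (the directory name "$HOME/tmp" is just an example), is to point TMPDIR elsewhere before rerunning the install:

```shell
# Hypothetical workaround, assuming /tmp is mounted noexec on the cluster:
# redirect pip's temporary build directory to a path you own and can
# execute from ("$HOME/tmp" is an example name, not a required one).
mkdir -p "$HOME/tmp"
export TMPDIR="$HOME/tmp"
# subsequent `pip install` / `source install.sh` runs in this shell will
# now build under $TMPDIR instead of /tmp
```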

PranjalChitale commented 3 weeks ago

The install.sh script should work regardless of the cluster you intend to run your experiments on.

Regarding the shared traceback, the following issues might be relevant for debugging:
https://github.com/pytorch/pytorch/issues/16558
https://github.com/AI4Bharat/IndicTrans2/issues/85

singhakr commented 3 weeks ago

But I am getting this error while running the install.sh script on Param Shivay, not while doing translation or importing PyTorch.

The installation does not finish correctly, so I am not able to try translation.

The error occurs at the following point during installation:

`Processing /home/user.name/IndicTrans2/fairseq`

singhakr commented 3 weeks ago

Since the link you gave mentions lack of memory as the problem, should I retry the installation from a compute node, submitted as a job, rather than the usual kind of installation?
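If the installation is retried as a batch job, it could be submitted roughly as below. This is only a sketch: it assumes the conda env `itv2` already exists and the repository is cloned under $HOME, and the partition, time, and memory values are placeholders to replace with your site's settings.

```shell
#!/bin/bash
#SBATCH --job-name=itv2-install
#SBATCH --partition=standard   # placeholder: use your cluster's partition name
#SBATCH --time=01:00:00        # placeholder wall-time
#SBATCH --mem=16G              # builds may need more memory than login-node limits allow

# activate the (assumed pre-created) conda environment
source "$HOME/anaconda3/etc/profile.d/conda.sh"
conda activate itv2

cd "$HOME/IndicTrans2"
mkdir -p "$HOME/tmp"
export TMPDIR="$HOME/tmp"      # avoid building under a possibly noexec /tmp
source install.sh
```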

singhakr commented 3 weeks ago

I have managed to do it with the HuggingFace interface. I was probably doing something simple incorrectly.
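For readers who want to take the same route, HuggingFace-based inference roughly follows the published IndicTrans2 model cards. Everything below (the model name, language tags, the `IndicTransToolkit` preprocessing calls, and the generation settings) is paraphrased from those cards rather than from this thread, so treat it as a sketch and check the current model card for the exact API:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
# IndicTransToolkit is a separate package (assumed: pip install IndicTransToolkit)
from IndicTransToolkit import IndicProcessor

# Example (distilled) checkpoint; larger variants also exist on the Hub.
model_name = "ai4bharat/indictrans2-en-indic-dist-200M"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, trust_remote_code=True)

ip = IndicProcessor(inference=True)
src_lang, tgt_lang = "eng_Latn", "hin_Deva"  # FLORES-style language tags

# add language tags and normalize the input as the model expects
batch = ip.preprocess_batch(["How are you?"], src_lang=src_lang, tgt_lang=tgt_lang)
inputs = tokenizer(batch, padding="longest", truncation=True, return_tensors="pt")

with torch.no_grad():
    generated = model.generate(**inputs, num_beams=5, max_length=256)

decoded = tokenizer.batch_decode(generated, skip_special_tokens=True)
print(ip.postprocess_batch(decoded, lang=tgt_lang))
```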