AI4Bharat / IndicTrans2

Translation models for 22 scheduled languages of India
https://ai4bharat.iitm.ac.in/indic-trans2
MIT License
214 stars, 59 forks

Not able to install fairseq on CUDA 12 with the command source install.sh #47

Closed. Sab8605 closed this issue 6 months ago.

Sab8605 commented 6 months ago

First, let me describe the steps I followed.

1. Cloned the GitHub repository on a virtual machine.
2. Ran source install.sh. All dependencies were installed except fairseq.

So I installed fairseq separately, using this extra command: git checkout cf8ff8c3c5242e6e71e8feb40de45dd699f3cc08. I now have these versions: torch 2.2.1+cu118, fairseq 1.0.0a0+cf8ff8c, torchaudio 2.2.1, CUDA 12 (virtual machine).

Next I downloaded the spm model and dictionary and renamed them, downloaded the BPCC data and took the wiki data for training, and downloaded the model. I then downloaded the IN22 data and used it for deduplication, giving the conv set from IN22 as the benchmark. I commented out the line "parallel --pipe --keep-order " in prepare_data_joint_finetuning.sh, created the exp folder, pasted all the files into it (e.g. vocab, final dict, train and devtest (flores 22)), and ran "bash prepare_data_joint_finetuning.sh". The commands ran without errors.

Finally, I ran the fine-tuning command:

"bash finetune.sh /home/translation-exp-vm/sab/exp transformer_base18L /home/translation-exp-vm/sab/data/jaygala/it2_ckpts/distilled_models/en-indic/fairseq_model/model/checkpoint_best.pt"

and got this error:

self._bin_buffer_mmap._mmap.close()
AttributeError: 'MMapIndexedDataset' object has no attribute '_bin_buffer_mmap'

This error appears because most of the files created in the exp folder are empty.
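One way to confirm which preprocessing outputs came out empty is to list every zero-byte file under the experiment folder. The following is a minimal sketch, not part of the IndicTrans2 scripts; the function name find_empty_files is made up for illustration, and the path passed to it is whatever exp folder was used.

```python
from pathlib import Path


def find_empty_files(root):
    """Return the sorted paths of all zero-byte regular files under root.

    Hypothetical helper for illustration; it is not part of IndicTrans2
    or fairseq.
    """
    root = Path(root)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.stat().st_size == 0
    )


# Example use (the path is the exp folder from the post):
#   for path in find_empty_files("/home/translation-exp-vm/sab/exp"):
#       print(path)
```

If the first empty files are the ones written by the tokenization step, that points at a preprocessing tool failing silently rather than at fairseq or CUDA.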

Please help me with this. Do I need to change the CUDA version to 11.8, 12.1, or 12.2, or am I doing something wrong?

I have since changed the CUDA version and tried again, but I still get the same error. Thank you for your help.
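When a report depends on which versions are installed, it can help to collect them programmatically. The sketch below uses only the standard library; the helper names installed_versions and cuda_build_tag are made up for illustration. It relies on the assumption that the local suffix of a torch version string (the part after "+", such as cu118) names the CUDA build the wheel targets, which may differ from the CUDA toolkit installed on the machine.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchaudio", "fairseq")):
    """Map each package name to its installed version, or None if absent.

    Hypothetical helper; the default package list is taken from the post.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def cuda_build_tag(torch_version):
    """Return the local build tag of a version string, or None if missing.

    Assumption: for torch wheels this tag (e.g. 'cu118') names the CUDA
    build the wheel was compiled for.
    """
    _, separator, tag = torch_version.partition("+")
    return tag if separator else None


# Example use:
#   print(installed_versions())
#   print(cuda_build_tag("2.2.1+cu118"))
```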

Sab8605 commented 6 months ago

Solved the issue by building SentencePiece from source, so that spm_encode is available:

% sudo apt-get install cmake build-essential pkg-config libgoogle-perftools-dev
% git clone https://github.com/google/sentencepiece.git
% cd sentencepiece
% mkdir build
% cd build
% cmake ..
% make -j $(nproc)
% sudo make install
% sudo ldconfig -v
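Since the fix was a missing command-line tool, a quick pre-flight check before running the data preparation script could catch the same problem early. This is a minimal sketch using shutil.which from the standard library; the helper name missing_tools is made up, and the default tool list contains only spm_encode because that is the tool named in this thread.

```python
import shutil


def missing_tools(names=("spm_encode",)):
    """Return the names from `names` that cannot be found on PATH.

    Hypothetical helper; pass in any other tools your scripts call.
    """
    return [name for name in names if shutil.which(name) is None]


# Example use:
#   missing = missing_tools()
#   if missing:
#       print("Not found on PATH:", ", ".join(missing))
```

An empty result only shows that the executable is on PATH; it does not prove that the tool runs correctly.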