Help with fine-tuning ai4bharat/indictrans2-indic-en-1B

manojbalaji1 commented 4 months ago

model: ai4bharat/indictrans2-indic-en-1B

We tried fine-tuning the model and got the following error:

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

After setting CUDA_LAUNCH_BLOCKING=1, we get the following:

../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [505,0,0], thread: [62,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [505,0,0], thread: [63,0,0] Assertion srcIndex < srcSelectDimSize failed.

Any help would be appreciated.

P.S. We could do fairseq-based fine-tuning, but most of our data utility functions are already written for the Hugging Face model. So we thought we would give this one final try before putting effort into moving to the fairseq-based model. Thanks in advance.
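A note on the assertion above: a failed `srcIndex < srcSelectDimSize` check in `indexSelectLargeIndex` generally means that some index passed to a lookup, such as an embedding layer, is outside the table it indexes. For an encoder-decoder model with separate source and target vocabularies, one possible cause (not confirmed in this thread) is a token id that is valid for one vocabulary but not the other, for example target sentences encoded with the source-side vocabulary. The minimal sketch below, assuming a batch is available as plain Python lists of ids, checks every id before anything reaches the GPU. `find_out_of_range_ids` is a hypothetical helper; the vocabulary sizes would have to be read from the model's own config or embedding layers, and the optional length check only matters if the model uses a fixed-size positional table, which is an assumption here.

```python
# Hypothetical pre-flight check: flag token ids that would overflow an embedding table.
from typing import List, Optional, Sequence

# Label value that Hugging Face losses conventionally ignore.
IGNORE_INDEX = -100


def find_out_of_range_ids(
    input_ids: Sequence[Sequence[int]],
    labels: Sequence[Sequence[int]],
    encoder_vocab_size: int,
    decoder_vocab_size: int,
    max_positions: Optional[int] = None,
) -> List[str]:
    """Return a description of every id or length that would break a lookup."""
    problems: List[str] = []
    # Source side: ids must index the encoder embedding table.
    for row, seq in enumerate(input_ids):
        bad = [t for t in seq if t < 0 or t >= encoder_vocab_size]
        if bad:
            problems.append(f"input_ids[{row}]: ids {bad} outside [0, {encoder_vocab_size})")
        # Only relevant if positions come from a fixed-size table (an assumption).
        if max_positions is not None and len(seq) > max_positions:
            problems.append(f"input_ids[{row}]: length {len(seq)} exceeds {max_positions}")
    # Target side: ids must index the decoder embedding table (ignored labels excepted).
    for row, seq in enumerate(labels):
        bad = [t for t in seq if t != IGNORE_INDEX and (t < 0 or t >= decoder_vocab_size)]
        if bad:
            problems.append(f"labels[{row}]: ids {bad} outside [0, {decoder_vocab_size})")
        if max_positions is not None and len(seq) > max_positions:
            problems.append(f"labels[{row}]: length {len(seq)} exceeds {max_positions}")
    return problems


if __name__ == "__main__":
    # Toy sizes for illustration only; real sizes come from the model config.
    batch_issues = find_out_of_range_ids(
        input_ids=[[1, 2, 3], [4, 12]],
        labels=[[0, 4, IGNORE_INDEX], [5, 1]],
        encoder_vocab_size=10,
        decoder_vocab_size=5,
    )
    for issue in batch_issues:
        print(issue)
```

Running a single failing batch on CPU is another way to localize the problem, since an out-of-range embedding index there raises a regular Python IndexError instead of a device-side assert.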

PranjalChitale commented 4 months ago

Did you set up the environment correctly using install.sh?

Are you able to perform inference using the HF models?

If the correct versions of all dependencies are installed, train_lora.sh should ideally work for you.
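One quick way to answer the environment question is to print the installed versions of the relevant packages and compare them with what install.sh sets up. This is a minimal sketch using only the standard library's importlib.metadata; `installed_versions` is a hypothetical helper, and the package list below is an assumption, not taken from install.sh.

```python
# Report installed package versions so they can be compared with install.sh.
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return {package: version}, using None for packages that are not installed."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Assumed package list; adjust it to whatever install.sh actually installs.
    for pkg, ver in installed_versions(["torch", "transformers", "peft", "sentencepiece"]).items():
        print(f"{pkg}: {ver or 'NOT INSTALLED'}")
```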

PranjalChitale commented 4 months ago

Closing due to inactivity.

AI4Bharat / IndicTrans2, issue #81