Did you try to add --compute_dtype fp16?
I just tried:
eole: error: unrecognized arguments: --compute_dtype fp16
Full command: eole predict --src wmt17_en_de/test.src.bpe --model_path wmt17_en_de/bigwmt17/step_50000 --beam_size 5 --batch_size 4096 --batch_type tokens --output wmt17_en_de/pred.trg.bpe --gpu 0 --compute_dtype fp16
Taking an example like this one: https://github.com/eole-nlp/eole/blob/main/recipes/llama2/llama-inference.yaml, can you try to run inference using a yaml file?
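For reference, a minimal inference config along those lines might look roughly like the untested sketch below; the key names are assumed to mirror the CLI flags used above and may differ between eole versions.

    # sketch only -- key names assumed from the CLI flags, not verified against the recipe
    model_path: wmt17_en_de/bigwmt17/step_50000
    src: wmt17_en_de/test.src.bpe
    output: wmt17_en_de/pred.trg.bpe
    gpu_ranks: [0]
    compute_dtype: fp16
    beam_size: 5
    batch_size: 4096
    batch_type: tokens

You would then point eole predict at that file (the llama2 recipe's README shows the exact invocation; the config flag spelling is not reproduced here to avoid guessing).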
Dear Vincent
Thank you for pointing me to the yaml file. I got it to work using: --precision fp16 -gpu_ranks 0
eole predict --src wmt17_en_de/test.src.bpe --model_path wmt17_en_de/bigwmt17/step_50000 --beam_size 5 --batch_size 4096 --batch_type tokens --output wmt17_en_de/pred.trg.bpe --gpu 0 --precision fp16 -gpu_ranks 0
Maybe it would be good to add it to the recipe, in case I am not the only one encountering it.
Love this project btw and I am happy that it is working now :)
Oh, that means you are not on the latest main branch.
see this: https://github.com/eole-nlp/eole/pull/54
Dear Vince
Yes, you were right. I pulled the docker image and it was not on the latest branch. I tried to re-pull the image, but it was still not on the latest branch, so I logged into the docker container's bash and cloned the eole repo again. Now everything works fine.
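For anyone else hitting this, the steps inside the container were roughly as follows (the container name is a placeholder, and the editable install line is an assumption about how the repo is set up):

    # attach to the running container (replace <container_name> with yours)
    docker exec -it <container_name> bash
    # clone the current main branch and install it over the packaged version
    git clone https://github.com/eole-nlp/eole.git
    cd eole
    pip install -e .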
Best, Kai
Indeed, the docker image had not been updated in quite a while. @Kai-Piontek you can check out the 0.0.2 images that were just built if needed!
Thank you Francois for fixing #48; I can now train the model successfully.
But when I want to run inference with eole predict --src wmt17_en_de/test.src.bpe --model_path wmt17_en_de/bigwmt17/step_50000 --beam_size 5 --batch_size 4096 --batch_type tokens --output wmt17_en_de/pred.trg.bpe --gpu 0
I get the error below. What might be wrong?
Best, Kai
[2024-09-08 18:23:26,170 INFO] Loading checkpoint from wmt17_en_de/bigwmt17/step_50000
[2024-09-08 18:23:26,893 INFO] Building model...
[2024-09-08 18:23:26,894 WARNING] You have a CUDA device, should run with -gpu_ranks
[2024-09-08 18:23:26,894 WARNING] You have a CUDA device, should run with -gpu_ranks
[2024-09-08 18:23:26,894 WARNING] You have a CUDA device, should run with -gpu_ranks
[2024-09-08 18:23:26,894 WARNING] You have a CUDA device, should run with -gpu_ranks
[2024-09-08 18:23:26,894 WARNING] You have a CUDA device, should run with -gpu_ranks
[2024-09-08 18:23:26,894 WARNING] You have a CUDA device, should run with -gpu_ranks
[2024-09-08 18:23:26,901 WARNING] You have a CUDA device, should run with -gpu_ranks
[2024-09-08 18:23:26,925 INFO] Loading data into the model
[2024-09-08 18:23:27,136 INFO] Transforms applied: []
Traceback (most recent call last):
  File "/usr/local/bin/eole", line 33, in <module>
    sys.exit(load_entry_point('EOLE', 'console_scripts', 'eole')())
  File "/eole/eole/bin/main.py", line 39, in main
    bincls.run(args)
  File "/eole/eole/bin/run/predict.py", line 42, in run
    predict(config)
  File "/eole/eole/bin/run/predict.py", line 18, in predict
    _, _, _ = engine.infer_file()
  File "/eole/eole/inference_engine.py", line 37, in infer_file
    scores, estims, preds = self._predict(infer_iter)
  File "/eole/eole/inference_engine.py", line 163, in _predict
    scores, estims, preds = self.predictor._predict(
  File "/eole/eole/predict/inference.py", line 454, in _predict
    batch_data = self.predict_batch(batch, attn_debug)
  File "/eole/eole/predict/translator.py", line 121, in predict_batch
    return self._translate_batch_with_strategy(batch, decode_strategy)
  File "/eole/eole/predict/translator.py", line 194, in _translate_batch_with_strategy
    log_probs, attn = self._decode_and_generate(
  File "/eole/eole/predict/inference.py", line 664, in _decode_and_generate
    dec_out, dec_attn = self.model.decoder(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/eole/eole/decoders/transformer_decoder.py", line 200, in forward
    emb, attn, attn_align = layer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/eole/eole/decoders/transformer_base.py", line 76, in forward
    layer_out, attns = self._forward(*args, **kwargs)
  File "/eole/eole/decoders/transformer_decoder.py", line 95, in _forward
    self_attn, _ = self.self_attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/eole/eole/modules/multi_headed_attn.py", line 690, in forward
    return super()._forward2(
  File "/eole/eole/modules/multi_headed_attn.py", line 461, in _forward2
    attn_output = self.flash_attn_func(
  File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 831, in flash_attn_func
    return FlashAttnFunc.apply(
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 598, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 511, in forward
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = _flash_attn_forward(
  File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 51, in _flash_attn_forward
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.fwd(
RuntimeError: FlashAttention only support fp16 and bf16 data type
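For context, flash-attn's kernels only accept fp16/bf16 tensors, which is why forcing a half-precision compute dtype at inference time makes the error go away. A minimal sketch of the behaviour (assuming flash-attn is installed and a CUDA device is available; the tensor shapes are arbitrary illustration values):

    import torch
    from flash_attn import flash_attn_func

    # q/k/v have shape (batch, seqlen, nheads, headdim); default dtype is fp32
    q = torch.randn(1, 16, 8, 64, device="cuda")

    try:
        flash_attn_func(q, q, q)  # fp32 inputs are rejected
    except RuntimeError as e:
        print(e)  # "FlashAttention only support fp16 and bf16 data type"

    out = flash_attn_func(q.half(), q.half(), q.half())  # fp16 inputs work
    print(out.dtype)  # torch.float16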