OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

Different results when run with tensor parallelism #1708

Open subhalingamd opened 1 month ago

subhalingamd commented 1 month ago

Hi,

I was running Flan-T5 XXL with CTranslate2 and observed completely different results when running with tensor parallelism.

To convert from HF to CT2:

ct2-transformers-converter --model google/flan-t5-xxl --output_dir flan-t5-xxl --quantization bfloat16
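
As a side note, the same conversion can also be done from Python through the converters API; a minimal sketch, assuming the same model and output directory as the command above:

from ctranslate2.converters import TransformersConverter

# Convert the Hugging Face checkpoint into a CTranslate2 model directory,
# equivalent to the ct2-transformers-converter command above.
converter = TransformersConverter("google/flan-t5-xxl")
converter.convert("flan-t5-xxl", quantization="bfloat16")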

Code:

import ctranslate2
import transformers

translator = ctranslate2.Translator("flan-t5-xxl", device="cuda", tensor_parallel=True)  # shard the model across the MPI ranks
tokenizer = transformers.AutoTokenizer.from_pretrained("google/flan-t5-xxl")

input_text = "Who is president of united states?"

input_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(input_text))

results = translator.translate_batch([input_tokens], return_scores=True)

if ctranslate2.MpiInfo.getCurRank() == 0:  # print the output only from the master process (rank 0)
    output_tokens = results[0].hypotheses[0]
    output_text = tokenizer.decode(tokenizer.convert_tokens_to_ids(output_tokens))
    print("Output tokens: ", output_tokens)
    print("Output text: ", output_text)
    print("Score: ", results[0].scores[0])

Outputs:

Case 1: No TP (run as python run.py or mpirun -n 1 python run.py)

Output tokens:  ['▁Barack', '▁Obama']
Output text:  Barack Obama
Score:  -0.609375

Case 2: With TP (run as mpirun -n 2 python run.py)

Output tokens:  ['▁', 'john', '▁', 'f', '▁', 'kenn', 'e', 'd', 'y']
Output text:  john f kennedy
Score:  -0.5375000238418579

I hope this is not expected behaviour.
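
To rule out beam search simply picking a different hypothesis, one possible check is to score a fixed reference answer on both setups and compare the per-token log-probabilities. A rough sketch, reusing the translator and tokenizer from the script above; if the numbers differ between the non-TP and TP runs, the forward pass itself diverges:

# Score a fixed target under the loaded model; run once without TP and once with TP.
target_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode("Barack Obama"))
score_results = translator.score_batch([input_tokens], [target_tokens])

if ctranslate2.MpiInfo.getCurRank() == 0:
    print("Tokens:    ", score_results[0].tokens)
    print("Log probs: ", score_results[0].log_probs)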

Further, with v4.3.0 I get an extra error at the end (after the output) that I didn't get with v4.1.0 using the same code. The error goes like this:

[servername:18378] *** Process received signal ***
[servername:18378] Signal: Aborted (6)
[servername:18378] Signal code:  (-6)
[servername:18378] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7fcc96109980]
[servername:18378] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7fcc9559fe87]
[servername:18378] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7fcc955a17f1]
[servername:18378] [ 3] /home/subha/miniconda3/envs/inf2/bin/../lib/libstdc++.so.6(+0xb135a)[0x7fcc839f435a]
[servername:18378] [ 4] /home/subha/miniconda3/envs/inf2/bin/../lib/libstdc++.so.6(+0xb13c5)[0x7fcc839f43c5]
[servername:18378] [ 5] /home/subha/miniconda3/envs/inf2/bin/../lib/libstdc++.so.6(+0xb1658)[0x7fcc839f4658]
[servername:18378] [ 6] /home/subha/miniconda3/envs/inf2/lib/python3.10/site-packages/ctranslate2/../ctranslate2.libs/libctranslate2-acb10d87.so.4.3.0(+0x25fb20)[0x7fcc83db6b20]
[servername:18378] [ 7] /lib/x86_64-linux-gnu/libc.so.6(+0x43031)[0x7fcc955a4031]
[servername:18378] [ 8] /lib/x86_64-linux-gnu/libc.so.6(+0x4312a)[0x7fcc955a412a]
[servername:18378] [ 9] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xee)[0x7fcc95582c8e]
[servername:18378] [10] python[0x58852e]
[servername:18378] *** End of error message ***
Aborted (core dumped)

Your help would be greatly appreciated.

minhthuc2502 commented 1 month ago

You're right: with flan-t5 there is a position_bias that I never handled before in tensor parallel mode.

I'm not sure why there is an error when the program stops in v4.3.0, but I'll include a fix in the next version.
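
For context, T5 adds a relative position bias to the attention logits, with one slice per attention head, so when the heads are sharded across tensor-parallel ranks each rank should only apply its own slice of the bias. A rough illustration of the idea in plain numpy, assuming flan-t5-xxl's 64 heads and 2 ranks; this is not the actual CTranslate2 implementation:

import numpy as np

# Illustration only: shard a relative position bias along the head dimension.
num_heads, q_len, k_len = 64, 8, 8
position_bias = np.random.randn(num_heads, q_len, k_len).astype(np.float32)

world_size, rank = 2, 0  # e.g. mpirun -n 2, first rank
heads_per_rank = num_heads // world_size
local_bias = position_bias[rank * heads_per_rank:(rank + 1) * heads_per_rank]

# Each rank adds only its slice to its local attention logits; applying the
# full bias (or the wrong slice) skews the attention scores and changes the output.
print(local_bias.shape)  # (32, 8, 8)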

subhalingamd commented 1 month ago

Hi @minhthuc2502, thanks for the response.

> You're right: with flan-t5 there is a position_bias that I never handled before in tensor parallel mode.

Could you please share whether there are any plans to fix this?