FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.
Apache License 2.0

Unable to run the benchmark #33

Closed · fungiboletus closed this issue 1 year ago

fungiboletus commented 1 year ago

Hi,

I'm trying to run the bench_30b_1x4.sh benchmark (except that I set N_GPUS=2), but I get the following Python exception:

rank #1: TypeError: sequence item 6: expected str instance, NoneType found
Traceback (most recent call last):
  File "/home/fungiboletus/miniconda3/envs/flexgen/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/fungiboletus/miniconda3/envs/flexgen/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/fungiboletus/flexgen/flexgen/dist_flex_opt.py", line 694, in <module>
    raise e
  File "/home/fungiboletus/flexgen/flexgen/dist_flex_opt.py", line 690, in <module>
    run_flexgen_dist(args)
  File "/home/fungiboletus/flexgen/flexgen/dist_flex_opt.py", line 620, in run_flexgen_dist
    outputs = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
  File "/home/fungiboletus/miniconda3/envs/flexgen/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3432, in batch_decode
    return [
  File "/home/fungiboletus/miniconda3/envs/flexgen/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3433, in <listcomp>
    self.decode(
  File "/home/fungiboletus/miniconda3/envs/flexgen/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3471, in decode
    return self._decode(
  File "/home/fungiboletus/miniconda3/envs/flexgen/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 949, in _decode
    sub_texts.append(self.convert_tokens_to_string(current_sub_text))
  File "/home/fungiboletus/miniconda3/envs/flexgen/lib/python3.10/site-packages/transformers/models/gpt2/tokenization_gpt2.py", line 316, in convert_tokens_to_string
    text = "".join(tokens)
TypeError: sequence item 6: expected str instance, NoneType found
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. ...

I'm using Python 3.10.9, PyTorch 1.13.1 with CUDA 11.7, and mpirun 2.1.1.
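
For reference, the crash happens inside the slow GPT-2 tokenizer that the OPT models reuse (see the tokenization_gpt2.py frame in the traceback): convert_ids_to_tokens maps any id that is missing from the vocabulary to None, and convert_tokens_to_string then fails on "".join(tokens). A minimal sketch of that failure mode, using the plain gpt2 tokenizer for brevity; the oversized id is purely illustrative:

from transformers import GPT2Tokenizer

# Slow tokenizer, same code path as in the traceback above.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Ids inside the vocabulary decode without problems.
print(tokenizer.decode([15496, 995], skip_special_tokens=True))

# An id far beyond the vocabulary is mapped to None by convert_ids_to_tokens,
# and "".join(tokens) inside convert_tokens_to_string then raises:
# TypeError: sequence item 2: expected str instance, NoneType found
print(tokenizer.decode([15496, 995, 72340172838076672], skip_special_tokens=True))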

zhuohan123 commented 1 year ago

I have reproduced your error. It seems that with dummy weights, the model generates essentially random output_ids, like the following:

In [25]: output_ids[3:4, 500:]
Out[25]:
array([[                1,                 1,                 1,
                        1,                 1,                 2,
                    32826,                16,                 5,
                      812,               343,                 9,
        72340172838076672,                 0,                 0,
                        0,                 0,                 1,
                        1,                 1,                 1,
                        1,                 1,                 1,
                        1,                 1,                 1,
                        1,                 1,                 1,
                        1,                 1,                 1,
                        1,                 1,                 1,
                        1,                 1,                 1,
                        1,                 1,                 1,
                        1,                 1]])

A hotfix is included in #50.
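
For illustration only, and not necessarily what #50 does: one way to guard the decode step in user code is to clamp out-of-vocabulary ids before calling batch_decode. A minimal sketch, assuming output_ids is a NumPy array and the tokenizer defines a pad token, as OPT's does:

import numpy as np

def safe_batch_decode(tokenizer, output_ids):
    # Replace every id that falls outside the tokenizer vocabulary with the
    # pad token id, so dummy-weight runs cannot crash batch_decode.
    output_ids = np.asarray(output_ids)
    invalid = (output_ids < 0) | (output_ids >= len(tokenizer))
    output_ids = np.where(invalid, tokenizer.pad_token_id, output_ids)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)

# Usage mirroring the failing call site:
# outputs = safe_batch_decode(tokenizer, output_ids)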

fungiboletus commented 1 year ago

Thanks! I ran the benchmark and got 12.71 tokens/s on two Tesla A30 GPUs.