PSAL-POSTECH / ONNXim

ONNXim is a fast cycle-level simulator that can model multi-core NPUs for DNN inference

error when running this simulator in language mode #7

Closed lhpp1314 closed 1 month ago

lhpp1314 commented 2 months ago

Using language mode to run GPT-2:

./build/bin/Simulator --config ./configs/systolic_ws_128x128_c4_simple_noc_tpuv4.json --models_list /home/liao/Desktop/LLM/ONNXim/models/language_models/gpt2_g_1.json --mode language

The following error occurs:

[2024-07-23 05:35:09.720] [info] CPU 0: Partition 0
[2024-07-23 05:35:09.720] [info] CPU 1: Partition 0
[2024-07-23 05:35:09.720] [info] CPU 2: Partition 0
[2024-07-23 05:35:09.720] [info] CPU 3: Partition 0
[2024-07-23 05:35:09.720] [info] Running in language mode
[2024-07-23 05:35:09.720] [info] Ramulator config: ./configs/../configs/ramulator_configs/HBM-config.cfg
[2024-07-23 05:35:09.724] [info] Initialize SimpleInterconnect
terminate called after throwing an instance of 'nlohmann::json_abi_v3_11_2::detail::type_error'
  what():  [json.exception.type_error.302] type must be number, but is null
Aborted (core dumped)

I used your generate_transformer.py to generate the GPT-2 model, and the following is my gpt2_g_1.json file:

{
    "models": [
        {
            "name": "gpt2_g_1",
            "batch_size": 1,
            "nr_atten": -1,
            "sequence_length": 1,
            "seq_len": 1,
            "past_seq_len": 1024,
            "total_seq_len": 1025,
            "output_seq_len": 1125,
            "request_time": 0
        }
    ]
}

By the way, there is also an error when running LLaMA/OPT in language mode. The command is as follows:

./build/bin/Simulator --config ./configs/systolic_ws_128x128_c4_simple_noc_tpuv4.json --models_list /home/liao/Desktop/LLM/ONNXim/models/language_models/llama3-8b.json --mode language

The following error occurs:

[2024-07-23 05:46:10.411] [info] CPU 0: Partition 0
[2024-07-23 05:46:10.411] [info] CPU 1: Partition 0
[2024-07-23 05:46:10.411] [info] CPU 2: Partition 0
[2024-07-23 05:46:10.411] [info] CPU 3: Partition 0
[2024-07-23 05:46:10.411] [info] Running in language mode
[2024-07-23 05:46:10.413] [info] Ramulator config: ./configs/../configs/ramulator_configs/HBM-config.cfg
[2024-07-23 05:46:10.416] [info] Initialize SimpleInterconnect
[2024-07-23 05:46:10.416] [info] ======Start Simulation=====
Segmentation fault (core dumped)

Do you have any solution to these errors? Thanks!

YWHyuk commented 2 months ago

Hi, @lhpp1314 thanks for using our simulator!

What version of ONNXim are you using? Are you using the master branch version? I'll assume you're using master for now.

Language mode is not intended to simulate ONNX files!

This mode is an option for running LLMs that are not represented by ONNX files. It is being worked on in the LLM branch and has not yet been fully merged into master.

If you run the command below without the language mode option, does the problem still occur?

./build/bin/Simulator --config ./configs/systolic_ws_128x128_c4_simple_noc_tpuv4.json --models_list /home/liao/Desktop/LLM/ONNXim/models/language_models/gpt2_g_1.json

If you want to try the llama-8b model, check out the llm branch and build it:

git checkout llm && cd build && cmake ..  && make -j

Then create a JSON file with the following content:

{
  "models": [
    {
      "name": "llama3-8b",
      "trace_file": "input.csv",
      "scheduler": "simple"
    }
  ]
}

Then pass this JSON file to the --models_list option:

./build/bin/Simulator --config ./configs/systolic_ws_128x128_c4_simple_noc_tpuv4.json --models_list created_json_file.json --mode language

Try this and let me know if it doesn't work!

P.S. I'll merge the llm branch into master sooner or later.

lhpp1314 commented 2 months ago

I used the LLM branch and ran the following command:

./build/bin/Simulator --config ./configs/systolic_ws_128x128_c4_simple_noc_tpuv4.json --models_list /home/liao/Desktop/LLM/ONNXim/models/language_models/llama3-8b.json --mode language

The following error occurs:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Config key core_print_interval not found
Aborted (core dumped)

The following is my llama3-8b.json:

{
  "models": [
    {
      "name": "llama3-8b",
      "trace_file": "input.csv",
      "scheduler": "simple"
    }
  ]
}

GPT-2 has an ONNX file; do you mean I need to run this model without language mode (i.e., in default mode) on the master branch? I tried running GPT-2 in default mode on the master branch, and it worked, but with the following warnings:

[2024-07-23 22:52:49.013] [warning] Node Proto optype "ReduceMean" returned dummy operator!
[2024-07-23 22:52:49.013] [warning] Node Proto optype "Add" returned dummy operator!
[2024-07-23 22:52:49.013] [warning] Node Proto optype "Sqrt" returned dummy operator!
[2024-07-23 22:52:49.013] [warning] Node Proto optype "Div" returned dummy operator!
[2024-07-23 22:52:49.013] [warning] Node Proto optype "Mul" returned dummy operator!
......

I think there might be a problem, because it returned dummy operators.

YWHyuk commented 2 months ago

@lhpp1314

We added new configuration keys for stats in the LLM branch. Try this config file:

{
  "num_cores" : 4,  
  "core_type" : "systolic_ws",
  "core_freq" : 1050,
  "core_width" : 128,
  "core_height" : 128,
  "core_print_interval" : 10000,

  "spad_size" : 65536,
  "accum_spad_size" : 8192,
  "sram_width" : 32,

  "vector_process_bit" : 65536,
  "add_latency" : 1,
  "mul_latency" : 1,
  "exp_latency" : 1,
  "gelu_latency" : 1,
  "add_tree_latency" : 1,
  "scalar_sqrt_latency" : 1,
  "scalar_add_latency" : 1,
  "scalar_mul_latency" : 1,

  "dram_type" : "simple",
  "dram_freq" :1200,
  "dram_channels": 16,
  "dram_req_size": 32,
  "dram_latency" : 10,
  "dram_print_interval": 100000,
  "dram_config_path" : "../configs/ramulator_configs/HBM-config.cfg",

  "icnt_type" : "simple",
  "icnt_latency" : 1,
  "icnt_freq" : 8000,
  "icnt_config_path" : "../configs/booksim2_configs/fly_c4_m16.icnt",

  "precision" : 2,
  "layout" : "NHWC",
  "scheduler" : "simple"
}

In the case of GPT-2, yes, use the master branch with the default mode. It seems that your ONNX file was not optimized.

Try this ONNX file: link.

This optimized model can be generated with the script (scripts/generate_x_onnx.py).
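
For reference, here is a minimal sketch (not the repository's script) of how an optimized GPT-2 ONNX file can be produced with HuggingFace transformers and onnxruntime's transformer optimizer. The file names and the num_heads/hidden_size values are illustrative assumptions for GPT-2 small, and the repository script may do more than this:

# Sketch only: export GPT-2 to ONNX, then fuse/optimize the graph with
# onnxruntime's transformer optimizer. Paths and sizes are illustrative.
import torch
from transformers import GPT2LMHeadModel
from onnxruntime.transformers import optimizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()
model.config.use_cache = False    # export without KV-cache outputs
model.config.return_dict = False  # plain tuple outputs for tracing

dummy_input = torch.randint(0, 50257, (1, 8))  # (batch, seq_len) token ids
torch.onnx.export(model, (dummy_input,), "gpt2_raw.onnx",
                  input_names=["input_ids"],
                  output_names=["logits"],
                  opset_version=13)

# Fuse LayerNorm/GELU/attention patterns into larger operators, so the graph
# is not left as many small elementwise nodes.
opt = optimizer.optimize_model("gpt2_raw.onnx",
                               model_type="gpt2",
                               num_heads=12,    # GPT-2 small
                               hidden_size=768)
opt.save_model_to_file("gpt2_optimized.onnx")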

-EDITED- Modified the link to make it publicly accessible.

lhpp1314 commented 2 months ago

I used the new gpt2.onnx and it finally seems to be working. For LLaMA, I put the following JSON file in models_list:

{
  "models": [
    {
      "name": "llama3-8b",
      "trace_file": "input.csv",
      "scheduler": "simple"
    }
  ]
}

There is also a llama3-8b.json file in ONNXim/models/language_models:

{
  "activation_function": "swish",
  "num_attention_heads": 32,
  "num_kv_heads": 8,
  "vocab_size": 128256,
  "num_hidden_layers": 32,
  "hidden_size": 4096,
  "intermediate_size": 14336,
  "ffn_type": "llama",
  "max_seq_length": 8192,
  "run_single_layer": true
}

I also changed the config file as mentioned above and ran the command:

./build/bin/Simulator --config ./configs/systolic_ws_128x128_c4_simple_noc_tpuv4.json --models_list ./model_lists/llama3-8b.json --mode language

but there is still an error (I am already on the llm branch and it built successfully):

terminate called after throwing an instance of 'std::runtime_error'
  what():  Config key mac_latency not found
Aborted (core dumped)

YWHyuk commented 2 months ago

Oh sorry, I missed one configuration. Add this line to the config file: "mac_latency" : 1,

lhpp1314 commented 2 months ago

Maybe div_latency is also missing, because there is an error:

Config key div_latency not found
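
As an aside, a quick way to see which of the keys that have come up in this thread are still missing from a config file is a small check like the one below. This is a hypothetical helper, not part of ONNXim, and it only checks the keys named in this issue; the authoritative list of required keys is in the simulator source.

# Hypothetical helper (not part of ONNXim): report which of the config keys
# mentioned in this thread are missing from a given config JSON file.
import json
import sys

KEYS_MENTIONED_IN_THIS_THREAD = [
    "core_print_interval",
    "mac_latency",
    "div_latency",
]

def check_config(path):
    with open(path) as f:
        config = json.load(f)
    missing = [key for key in KEYS_MENTIONED_IN_THIS_THREAD if key not in config]
    if missing:
        print(f"{path} is missing: {', '.join(missing)}")
    else:
        print(f"{path} contains all keys checked here")

if __name__ == "__main__":
    check_config(sys.argv[1])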

lhpp1314 commented 2 months ago

Also, there is an error when I use generate_transformer.py with optimization. But if I set only_onnxruntime=true, there is no error. Do you have any clue about that? Thanks!

python3 ./scripts/generate_transformer_onnx.py --model gpt2
Cannot determine if -seq_len + total_seq_len < 0
(the above line is printed 12 times)
Cannot determine if total_seq_len - 7 < 0
(the above line is printed 24 times)
DONE
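
For context on that flag, here is a minimal sketch under the assumption that the script wraps onnxruntime.transformers.optimizer.optimize_model: its only_onnxruntime flag switches between relying solely on onnxruntime's built-in graph optimizations and additionally running the Python-side fusion passes, which appears to match the difference between the failing and working runs described above. The exact behaviour of generate_transformer_onnx.py may differ, and the file names and model sizes below are illustrative.

# Sketch of the two optimizer paths, assuming the script calls
# onnxruntime.transformers.optimizer.optimize_model under the hood.
from onnxruntime.transformers import optimizer

# Default path: onnxruntime graph optimizations plus Python fusion passes.
full = optimizer.optimize_model("gpt2_raw.onnx", model_type="gpt2",
                                num_heads=12, hidden_size=768,
                                only_onnxruntime=False)
full.save_model_to_file("gpt2_full_opt.onnx")

# Workaround path mentioned above: rely only on onnxruntime's built-in
# optimizations and skip the Python-side fusions.
rt_only = optimizer.optimize_model("gpt2_raw.onnx", model_type="gpt2",
                                   num_heads=12, hidden_size=768,
                                   only_onnxruntime=True)
rt_only.save_model_to_file("gpt2_rt_only_opt.onnx")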