Closed lhpp1314 closed 1 month ago
Hi, @lhpp1314 thanks for using our simulator!
What version of ONNXim are you using? Are you using the master branch? I'll assume you're on master for now.
Language mode is not intended to simulate ONNX files! This mode is an option for running LLMs that are not represented by ONNX files. It is being worked on in the llm branch and has not yet been fully merged into master.
If you run the command below without the language mode option, does the problem still occur?
./build/bin/Simulator --config ./configs/systolic_ws_128x128_c4_simple_noc_tpuv4.json --models_list /home/liao/Desktop/LLM/ONNXim/models/language_models/gpt2_g_1.json
If you want to try the llama-8b model, check out the llm branch and build it:
git checkout llm && cd build && cmake .. && make -j
Then create a JSON file with the following content:
{
"models": [
{
"name": "llama3-8b",
"trace_file" : "input.csv",
"scheduler" : "simple"
}
]
}
Then pass this JSON file to the --models_list option:
./build/bin/Simulator --config ./configs/systolic_ws_128x128_c4_simple_noc_tpuv4.json --models_list created_json_file.json -mode language
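The two steps above (write the models_list JSON, then pass it with --models_list) can be scripted; a minimal sketch, where the file name created_json_file.json is simply the placeholder used in the command above:

```python
import json

# Model-list entry exactly as described in this thread; "input.csv"
# and the "simple" scheduler come from the example above.
models_list = {
    "models": [
        {
            "name": "llama3-8b",
            "trace_file": "input.csv",
            "scheduler": "simple",
        }
    ]
}

# Write the file that gets passed to --models_list.
with open("created_json_file.json", "w") as f:
    json.dump(models_list, f, indent=2)
```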
Try this and let me know if it doesn't work!
P.S. I'll merge the llm branch into master sooner or later.
I used the llm branch and ran the following cmd:
./build/bin/Simulator --config ./configs/systolic_ws_128x128_c4_simple_noc_tpuv4.json --models_list /home/liao/Desktop/LLM/ONNXim/models/language_models/llama3-8b.json --mode language
and an error occurs:
terminate called after throwing an instance of 'std::runtime_error'
what(): Config key core_print_interval not found
Aborted (core dumped)
the following is my llama3-8b.json:
{
"models": [
{
"name": "llama3-8b",
"trace_file" : "input.csv",
"scheduler" : "simple"
}
]
}
GPT2 has an onnx file; do you mean I need to run this model without language mode (i.e., in default mode) on the master branch? I tried running gpt2 in default mode on the master branch, and it worked, but with the following warnings:
[2024-07-23 22:52:49.013] [warning] Node Proto optype "ReduceMean" returned dummy operator!
[2024-07-23 22:52:49.013] [warning] Node Proto optype "Add" returned dummy operator!
[2024-07-23 22:52:49.013] [warning] Node Proto optype "Sqrt" returned dummy operator!
[2024-07-23 22:52:49.013] [warning] Node Proto optype "Div" returned dummy operator!
[2024-07-23 22:52:49.013] [warning] Node Proto optype "Mul" returned dummy operator!
......
I think there might be a problem, because it returned dummy operators.
@lhpp1314
We added a new configuration for stats in the llm branch; try this config file:
{
"num_cores" : 4,
"core_type" : "systolic_ws",
"core_freq" : 1050,
"core_width" : 128,
"core_height" : 128,
"core_print_interval" : 10000,
"spad_size" : 65536,
"accum_spad_size" : 8192,
"sram_width" : 32,
"vector_process_bit" : 65536,
"add_latency" : 1,
"mul_latency" : 1,
"exp_latency" : 1,
"gelu_latency" : 1,
"add_tree_latency" : 1,
"scalar_sqrt_latency" : 1,
"scalar_add_latency" : 1,
"scalar_mul_latency" : 1,
"dram_type" : "simple",
"dram_freq" :1200,
"dram_channels": 16,
"dram_req_size": 32,
"dram_latency" : 10,
"dram_print_interval": 100000,
"dram_config_path" : "../configs/ramulator_configs/HBM-config.cfg",
"icnt_type" : "simple",
"icnt_latency" : 1,
"icnt_freq" : 8000,
"icnt_config_path" : "../configs/booksim2_configs/fly_c4_m16.icnt",
"precision" : 2,
"layout" : "NHWC",
"scheduler" : "simple"
}
For GPT2, yes, use the master branch with default mode. It seems that your ONNX file was not optimized.
Try this onnx file link.
This optimized model can be generated with the script (scripts/generate_x_onnx.py).
-EDITED- Modified the link to make it publicly accessible.
I used the new gpt2.onnx and it finally seems to be working. For llama, I put the following json file in models_list:
{
"models": [
{
"name": "llama3-8b",
"trace_file" : "input.csv",
"scheduler" : "simple"
}
]
}
and there is also a llama3-8b.json file in ONNXIM/models/language_models:
{
"activation_function" : "swish",
"num_attention_heads" : 32,
"num_kv_heads" : 8,
"vocab_size" : 128256,
"num_hidden_layers" : 32,
"hidden_size" : 4096,
"intermediate_size" : 14336,
"ffn_type" : "llama",
"max_seq_length" : 8192,
"run_single_layer": true
}
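As a sanity check that this model config really describes Llama-3-8B, the fields above are enough to estimate the weight count. A rough sketch; the head dimension and the untied output embedding are assumptions based on the publicly documented Llama 3 architecture, not on ONNXim's code:

```python
# Values copied from the llama3-8b.json above.
hidden_size = 4096
intermediate_size = 14336
num_hidden_layers = 32
num_attention_heads = 32
num_kv_heads = 8
vocab_size = 128256

head_dim = hidden_size // num_attention_heads  # assumed 128

# Grouped-query attention: Q and O projections are full-size,
# K and V are shrunk to num_kv_heads.
attn = (hidden_size * hidden_size                    # Q
        + 2 * hidden_size * num_kv_heads * head_dim  # K, V
        + hidden_size * hidden_size)                 # O

# SwiGLU FFN (the "swish" activation and "llama" ffn_type above):
# gate, up, and down projections.
ffn = 3 * hidden_size * intermediate_size

# Input embedding plus an untied LM head (assumption for Llama 3).
embeddings = 2 * vocab_size * hidden_size

total = num_hidden_layers * (attn + ffn) + embeddings
print(f"{total / 1e9:.2f}B parameters")  # ~8.03B, matching the model name
```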
I also changed the config file mentioned above and ran the cmd:
./build/bin/Simulator --config ./configs/systolic_ws_128x128_c4_simple_noc_tpuv4.json --models_list ./model_lists/llama3-8b.json --mode language
but there is still an error (already on the llm branch, and built successfully):
terminate called after throwing an instance of 'std::runtime_error'
what(): Config key mac_latency not found
Aborted (core dumped)
Oh sorry, I missed one configuration. Add this line to the configuration:
"mac_latency" : 1,
div_latency may also be missing, because there is an error:
Config key div_latency not found
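Since these keys surfaced one at a time, it may save a round trip to patch them all at once. A small sketch; the key list below is only what this thread has uncovered so far, and there may be more:

```python
# Latency/stat keys the llm branch aborted on in this thread.
# This list is assumed from the error messages, not from ONNXim's source,
# and the default of 1 cycle mirrors the maintainer's suggestion.
DEFAULTS = {"mac_latency": 1, "div_latency": 1}

def patch_config(cfg):
    """Fill in any missing keys with the defaults above, in place."""
    for key, value in DEFAULTS.items():
        cfg.setdefault(key, value)
    return cfg

cfg = {"core_print_interval": 10000}  # minimal example config
patch_config(cfg)
print(sorted(cfg))  # ['core_print_interval', 'div_latency', 'mac_latency']
```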
Also, there is an error when I use generate_transformer.py with optimization. But if I set only_onnxruntime=true, there is no error. Do you have any clue about that? Thanks!
python3 ./scripts/generate_transformer_onnx.py --model gpt2
Cannot determine if -seq_len + total_seq_len < 0
(the line above repeats 12 times)
Cannot determine if total_seq_len - 7 < 0
(the line above repeats 24 times)
DONE
Using language mode to run gpt2,
an error occurs:
I used your generate_transformer.py to generate the gpt2 model, and the following is my gpt2_g_1.json file:
And by the way, there is also an error when running llama/opt in language mode. The cmd is as follows:
An error occurs:
Do you have any solution for these errors? Thanks!