When running the incremental decoding C++ interface, a segmentation fault occurs in FFModel::compile_inference: the method is entered with a null `this` pointer (frame #0 below shows `this=0x0`), called from InferenceManager::compile_model_and_allocate_buffer.

To reproduce the bug:

1. Download the PEFT model from Hugging Face with download_peft_model.py. Example command:

python inference/utils/download_peft_model.py --base_model_name JackFram/llama-160m goliaro/llama-160m-lora-full --refresh-cache

2. Run the incremental decoding executable:

./inference/incr_decoding/incr_decoding -ll:gpu 1 -ll:cpu 4 -ll:fsize 8192 -ll:zsize 12000 -ll:util 4 -llm-model JackFram/llama-160m -prompt ../inference/prompt/peft.json -peft-model goliaro/llama-160m-lora-full --use-full-precision --inference-debugging --fusion -enable-peft

Error Backtrace from gdb:

#0 0x0000155553abd9d6 in FlexFlow::FFModel::compile_inference (this=0x0) at /home/ubuntu/FlexFlow/src/runtime/inference_manager.cc:611
#1 0x0000155553ab909c in FlexFlow::InferenceManager::compile_model_and_allocate_buffer (this=0x154fa020b250, model=0x154f941e3d60)
at /home/ubuntu/FlexFlow/src/runtime/inference_manager.cc:61
#2 0x0000155553c1a262 in FlexFlow::RequestManager::serve_incr_decoding (this=0x154f94204f10, llm=0x154f941e3d60) at /home/ubuntu/FlexFlow/src/runtime/request_manager.cc:2504
#3 0x0000155553c1a046 in FlexFlow::RequestManager::background_serving_task (task=0x154f94f05880, regions=..., ctx=0x154f982055f0, runtime=0x555556942710)
at /home/ubuntu/FlexFlow/src/runtime/request_manager.cc:2471
#4 0x0000155553ba5d88 in Legion::LegionTaskWrapper::legion_task_wrapper<&FlexFlow::RequestManager::background_serving_task> (args=0x154f94f06950, arglen=8, userdata=0x0,
userlen=0, p=...) at /home/ubuntu/FlexFlow/deps/legion/runtime/legion/legion.inl:21196
#5 0x000015554bc53cd0 in Realm::LocalTaskProcessor::execute_task (this=0x555556647e00, func_id=18, task_args=...)
at /home/ubuntu/FlexFlow/deps/legion/runtime/realm/proc_impl.cc:1176
#6 0x000015554bcd033a in Realm::Task::execute_on_processor (this=0x154f94f067d0, p=...) at /home/ubuntu/FlexFlow/deps/legion/runtime/realm/tasks.cc:326
#7 0x000015554bcd56b8 in Realm::UserThreadTaskScheduler::execute_task (this=0x5555568b30b0, task=0x154f94f067d0) at /home/ubuntu/FlexFlow/deps/legion/runtime/realm/tasks.cc:1687
#8 0x000015554bcd3451 in Realm::ThreadedTaskScheduler::scheduler_loop (this=0x5555568b30b0) at /home/ubuntu/FlexFlow/deps/legion/runtime/realm/tasks.cc:1160
#9 0x000015554bcdba90 in Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop> (obj=0x5555568b30b0)
at /home/ubuntu/FlexFlow/deps/legion/runtime/realm/threads.inl:97
#10 0x000015554bcea4ce in Realm::UserThread::uthread_entry () at /home/ubuntu/FlexFlow/deps/legion/runtime/realm/threads.cc:1405
#11 0x000015554a6ef4e0 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()
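For context, frame #0 of the backtrace above shows compile_inference entered with `this=0x0`, i.e. a member function invoked through a null object pointer, which is undefined behavior and typically segfaults on the first member access. The snippet below is a minimal, hypothetical sketch of that failure pattern and of a defensive null check at the call site; `Model` and `Manager` are stand-ins for illustration only, not FlexFlow's actual classes or logic.

```cpp
#include <cstdio>

// Hypothetical stand-ins (assumed names, not FlexFlow's real types) --
// just enough structure to show the failure pattern from the backtrace.
struct Model {
  bool compiled = false;
  // Frame #0 crashed in a method like this one, entered with this == nullptr.
  void compile_inference() { compiled = true; }
};

struct Manager {
  // Checking the pointer before the call turns the segfault into a
  // diagnosable error instead of a crash deep inside the callee.
  bool compile_model_and_allocate_buffer(Model *model) {
    if (model == nullptr) {
      std::fprintf(stderr, "compile_model_and_allocate_buffer: model is null\n");
      return false;
    }
    model->compile_inference();
    return true;
  }
};
```

A guard like this would not fix the root cause (whatever leaves the pointer null), but it would surface the problem at the call site in frame #1 rather than as an opaque segfault.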