flexflow / FlexFlow

FlexFlow Serve: Low-Latency, High-Performance LLM Serving
https://flexflow.readthedocs.io
Apache License 2.0

FFModel 'null' in compile_inference #1332

Status: Open. Opened by april-yyt 3 months ago.

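When running the C++ incremental decoding example (incr_decoding) with PEFT enabled, the process crashes with a segmentation fault inside FFModel::compile_inference, which is defined in src/runtime/inference_manager.cc. In the gdb backtrace below, frame #0 shows the method being called with a null object pointer (this=0x0).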

To reproduce the bug:

  1. Download the PEFT model from Hugging Face by running download_peft_model.py. Example command:
     python inference/utils/download_peft_model.py --base_model_name JackFram/llama-160m goliaro/llama-160m-lora-full --refresh-cache
  2. Run incremental decoding with PEFT enabled. Example command:
     ./inference/incr_decoding/incr_decoding -ll:gpu 1 -ll:cpu 4 -ll:fsize 8192 -ll:zsize 12000 -ll:util 4 -llm-model JackFram/llama-160m -prompt ../inference/prompt/peft.json -peft-model goliaro/llama-160m-lora-full --use-full-precision --inference-debugging --fusion -enable-peft

Error backtrace from gdb:

#0  0x0000155553abd9d6 in FlexFlow::FFModel::compile_inference (this=0x0) at /home/ubuntu/FlexFlow/src/runtime/inference_manager.cc:611
#1  0x0000155553ab909c in FlexFlow::InferenceManager::compile_model_and_allocate_buffer (this=0x154fa020b250, model=0x154f941e3d60)
    at /home/ubuntu/FlexFlow/src/runtime/inference_manager.cc:61
#2  0x0000155553c1a262 in FlexFlow::RequestManager::serve_incr_decoding (this=0x154f94204f10, llm=0x154f941e3d60) at /home/ubuntu/FlexFlow/src/runtime/request_manager.cc:2504
#3  0x0000155553c1a046 in FlexFlow::RequestManager::background_serving_task (task=0x154f94f05880, regions=..., ctx=0x154f982055f0, runtime=0x555556942710)
    at /home/ubuntu/FlexFlow/src/runtime/request_manager.cc:2471
#4  0x0000155553ba5d88 in Legion::LegionTaskWrapper::legion_task_wrapper<&FlexFlow::RequestManager::background_serving_task> (args=0x154f94f06950, arglen=8, userdata=0x0, 
    userlen=0, p=...) at /home/ubuntu/FlexFlow/deps/legion/runtime/legion/legion.inl:21196
#5  0x000015554bc53cd0 in Realm::LocalTaskProcessor::execute_task (this=0x555556647e00, func_id=18, task_args=...)
    at /home/ubuntu/FlexFlow/deps/legion/runtime/realm/proc_impl.cc:1176
#6  0x000015554bcd033a in Realm::Task::execute_on_processor (this=0x154f94f067d0, p=...) at /home/ubuntu/FlexFlow/deps/legion/runtime/realm/tasks.cc:326
#7  0x000015554bcd56b8 in Realm::UserThreadTaskScheduler::execute_task (this=0x5555568b30b0, task=0x154f94f067d0) at /home/ubuntu/FlexFlow/deps/legion/runtime/realm/tasks.cc:1687
#8  0x000015554bcd3451 in Realm::ThreadedTaskScheduler::scheduler_loop (this=0x5555568b30b0) at /home/ubuntu/FlexFlow/deps/legion/runtime/realm/tasks.cc:1160
#9  0x000015554bcdba90 in Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop> (obj=0x5555568b30b0)
    at /home/ubuntu/FlexFlow/deps/legion/runtime/realm/threads.inl:97
#10 0x000015554bcea4ce in Realm::UserThread::uthread_entry () at /home/ubuntu/FlexFlow/deps/legion/runtime/realm/threads.cc:1405
#11 0x000015554a6ef4e0 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()
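Calling a member function through a null pointer is undefined behavior in C++, so the crash surfaces deep inside compile_inference rather than at the call site. A minimal sketch of a defensive check that turns this into a descriptive error is shown below. Model and compile_checked are hypothetical stand-ins written for illustration only; they are not FlexFlow's FFModel or InferenceManager APIs.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a model class (NOT FlexFlow's FFModel).
struct Model {
  bool compiled = false;
  void compile_inference() { compiled = true; }
};

// Hypothetical helper: validate the pointer before the member call, so a
// null model is reported with context instead of segfaulting inside
// compile_inference() with this == nullptr.
void compile_checked(Model *model, const std::string &where) {
  if (model == nullptr) {
    throw std::invalid_argument(where + ": model pointer is null");
  }
  model->compile_inference();
}
```

Where such a check would belong in FlexFlow depends on which pointer is actually null. Note that frame #1 reports a non-null model argument (model=0x154f941e3d60) while frame #0 reports this=0x0, so the null pointer may come from a different object than the one passed in, or the displayed values may be affected by compiler optimization.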