flexflow / FlexFlow

FlexFlow Serve: Low-Latency, High-Performance LLM Serving
https://flexflow.readthedocs.io
Apache License 2.0

FlexFlow Mapper Assertion Failure: Memory Allocation Error #1362

Closed hygorjardim closed 2 months ago

hygorjardim commented 2 months ago

I was trying to run the inference/python/spec_infer.py script with the following configuration settings for my environment:

        ff_init_configs = {
            # required parameters
            "num_gpus": 1,
            "memory_per_gpu": 14000,
            "zero_copy_memory_per_node": 45000,
            # optional parameters
            "num_cpus": 4,
            "legion_utility_processors": 4,
            "data_parallelism_degree": 1,
            "tensor_parallelism_degree": 1,
            "pipeline_parallelism_degree": 1,
            "offload": False,
            "offload_reserve_space_size": 1024**2,
            "use_4bit_quantization": False,
            "use_8bit_quantization": False,
            "profiling": False,
            "inference_debugging": False,
            "fusion": True,
        }

I am also using the script's default LLM and SSM configuration:

        llm_configs = {
            # required llm arguments
            "llm_model": "meta-llama/Llama-2-7b-hf",
            # optional llm parameters
            "cache_path": "",
            "refresh_cache": False,
            "full_precision": False,
            "ssms": [
                {
                    # required ssm parameter
                    "ssm_model": "JackFram/llama-160m",
                    # optional ssm parameters
                    "cache_path": "",
                    "refresh_cache": False,
                    "full_precision": False,
                }
            ],
        }

When running, I got the following error:

Loading weight file output_weight
sub request num == 1, 1 
SSM KV Cache Size init: 11
LLM KV Cache Size init: 0
load 11 tokens for request 1000000
total prompt in request: 11
LLM KV Cache Size init: 0
load 11 tokens for request 1000000
total prompt in request: 11
max_prompt_load_size: 116
[0 - 7bfa20635740]  107.900677 {5}{Mapper}: FlexFlow failed allocation of size 90177536 bytes for region requirement 108 of task unnamed_task_133 (UID 2744) in memory 1e00000000000002 with kind 6 for processor 1d00000000000008.
python: /tmp/pip-install-0l5wuoth/flexflow_aeb6e298b433425f9696d30293a0a90d/src/mapper/mapper.cc:648: virtual void FlexFlow::FFMapper::map_task(Legion::Mapping::MapperContext, const Legion::Task&, const Legion::Mapping::Mapper::MapTaskInput&, Legion::Mapping::Mapper::MapTaskOutput&): Assertion `false' failed.
Aborted (core dumped)
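For context, the size the mapper failed to allocate is easy to read in human units. This is just arithmetic on the number from the log line, not anything FlexFlow-specific:

```python
# Size from the log line: "FlexFlow failed allocation of size 90177536 bytes ..."
failed_alloc_bytes = 90177536

# Convert to MiB (1 MiB = 1024**2 bytes).
failed_alloc_mib = failed_alloc_bytes / 1024**2
print(f"{failed_alloc_mib:.1f} MiB")  # prints "86.0 MiB"
```

So the mapper aborted while trying to place a single ~86 MiB region, which suggests the configured memory pool for that memory kind was already nearly exhausted.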

spec_infer_execution_log.log

Has anyone come across this problem? Could you suggest any workarounds? Thank you for your attention.

hygorjardim commented 2 months ago

This issue was resolved when I built with the v24.1.0 tag.
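For anyone landing here, a rough sketch of building from that tag follows. The exact build steps and prerequisites (CUDA toolchain, etc.) depend on your environment, so treat this as a hypothetical outline and check the FlexFlow documentation for the authoritative instructions:

```shell
# Hypothetical build sketch: fetch the repository at the v24.1.0 tag
# and build/install from source. Adjust to match the official docs.
git clone --recursive https://github.com/flexflow/FlexFlow.git
cd FlexFlow
git checkout v24.1.0
pip install .
```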