[Issue]: runTracer.sh trace aborted (Failed)

Problem Description

I installed rPD (rocmProfileData) and ran the tracing example by following the README.md, but the run aborted with a failure.

root@tw024:/ws/Try_rPD# runTracer.sh python matmult_gpu.py
Creating empty rpd: trace.rpd
rpd_tracer, because
Shape of input data matrix: [1000, 500], weight matrix: [500, 500], result matrix:torch.Size([1000, 500])
tensor([[ 31.2559, -5.9614, -7.5495, ..., 4.9965, 13.3129, -22.1125],
        [ -8.3562, -23.1422, -7.1189, ..., -30.3476, 8.9711, -43.7970],
        [ 8.6492, 2.2358, -10.6567, ..., 21.0161, -46.0028, -26.3684],
        ...,
        [-18.7425, -26.5550, -22.3633, ..., 21.0699, 33.3842, -24.6637],
        [-37.9485, 16.3621, -19.1744, ..., -0.9327, 1.9820, -13.6000],
        [ 11.3354, 22.0743, 20.7730, ..., -0.6945, -12.1807, -11.0098]], device='cuda:0')
rocpd_op: 0
rocpd_api_ops: 0
rocpd_kernelapi: 0
rocpd_copyapi: 0
rocpd_api: 0
rocpd_string: 0
rpd_tracer: finalized in 6.323764 ms
double free or corruption (!prev)
/usr/local/bin/runTracer.sh: line 42: 20 Aborted LD_PRELOAD=librpd_tracer.so "$@"

root@tw024:/ws/Try_rPD# cat matmult_gpu.py
import argparse
import torch

def matmult_gpu(input_data, weights):
    """
    Perform matrix multiplication of two tensors on GPU.

    Args:
        input_data (torch.Tensor): Input tensor.
        weights (torch.Tensor): Weight tensor.

    Returns:
        torch.Tensor: Result of matrix multiplication.
    """
    # Creating tensors on GPU
    input_data = input_data.to('cuda')
    weights = weights.to('cuda')

    # Optimized matrix multiplication using torch.matmul
    output = torch.matmul(input_data, weights)

    return output

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Perform matrix multiplication of two tensors.')
    parser.add_argument('--x_shape', nargs=2, type=int, default=[1000, 500], metavar=('N', 'M'), help='Shape of input data matrix')
    parser.add_argument('--w_shape', nargs=2, type=int, default=[500, 500], metavar=('J', 'K'), help='Shape of weight matrix')
    args = parser.parse_args()

    input_data = torch.randn(*args.x_shape)
    weights = torch.randn(*args.w_shape)

    output = matmult_gpu(input_data, weights)
    print(f'Shape of input data matrix: {args.x_shape}, weight matrix: {args.w_shape}, result matrix:{output.shape}')
    print(output)

Operating System

Ubuntu 22.04, inside the Docker image rocm/vllm-dev:20241025-tuned

CPU

AMD EPYC 9654 96-Core Processor

GPU

AMD MI300X

ROCm Version

ROCm 6.2.0

ROCm Component

No response

Steps to Reproduce

1. Start a container from the image rocm/vllm-dev:20241025-tuned.
2. Log in to the container.
3. Install rocmProfileData in the container.
4. Run the trace example "Profiling a PyTorch multiplication function" from https://github.com/ROCm/rocmProfileData/blob/master/examples/rocm-profile-data/README.md
5. The run aborts with the log "/usr/local/bin/runTracer.sh: line 42: 20 Aborted LD_PRELOAD=librpd_tracer.so "$@""
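
The tracer summary reports 0 for every rocpd_* counter before the abort. To double-check that against the trace.rpd file left on disk, a minimal sketch like the one below could be used. It assumes that trace.rpd is an SQLite database and that the names printed in the summary (rocpd_op, rocpd_api, ...) are table or view names in it; `count_rows` is a hypothetical helper name, not part of rocmProfileData.

```python
import os
import sqlite3
from contextlib import closing

# Names copied from the tracer's summary output; assumed to be table/view names.
TABLES = [
    "rocpd_op",
    "rocpd_api_ops",
    "rocpd_kernelapi",
    "rocpd_copyapi",
    "rocpd_api",
    "rocpd_string",
]


def count_rows(path, tables=TABLES):
    """Return {name: row count}; the value is None if the table/view is missing."""
    if not os.path.isfile(path):
        # sqlite3.connect() would silently create an empty file otherwise.
        raise FileNotFoundError(path)
    counts = {}
    with closing(sqlite3.connect(path)) as conn:
        existing = {
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type IN ('table', 'view')"
            )
        }
        for name in tables:
            if name in existing:
                counts[name] = conn.execute(
                    f'SELECT COUNT(*) FROM "{name}"'
                ).fetchone()[0]
            else:
                counts[name] = None
    return counts


# Example usage (run in the directory that contains trace.rpd):
# for name, n in count_rows("trace.rpd").items():
#     print(f"{name}: {n}")
```

If every count is 0 or None, the file on disk matches what the tracer printed, which would point at the crash happening with no events recorded rather than at a truncated write.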

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

I also ran runTracer.sh with the vLLM latency benchmark and got the same abort. Trace command:

runTracer.sh python /app/vllm/benchmarks/benchmark_latency.py --model /data/llm/Meta-Llama-3.1-405B --dtype float16 --gpu-memory-utilization 0.99 --distributed-executor-backend mp --tensor-parallel-size 8 --batch-size 32 --input-len 128 --output-len 128

Error log

WARNING 11-01 02:57:14 config.py:1711] Casting torch.bfloat16 to torch.float16.
ERROR 11-01 02:57:22 registry.py:270] Error in inspecting model architecture 'LlamaForCausalLM'
ERROR 11-01 02:57:22 registry.py:270] Traceback (most recent call last):
ERROR 11-01 02:57:22 registry.py:270]   File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 432, in _run_in_subprocess
ERROR 11-01 02:57:22 registry.py:270]     returned.check_returncode()
ERROR 11-01 02:57:22 registry.py:270]   File "/opt/conda/envs/py_3.9/lib/python3.9/subprocess.py", line 460, in check_returncode
ERROR 11-01 02:57:22 registry.py:270]     raise CalledProcessError(self.returncode, self.args, self.stdout,
ERROR 11-01 02:57:22 registry.py:270] subprocess.CalledProcessError: Command '['/opt/conda/envs/py_3.9/bin/python', '-m', 'vllm.model_executor.models.registry']' died with <Signals.SIGABRT: 6>.
ERROR 11-01 02:57:22 registry.py:270]
ERROR 11-01 02:57:22 registry.py:270] The above exception was the direct cause of the following exception:
ERROR 11-01 02:57:22 registry.py:270]
ERROR 11-01 02:57:22 registry.py:270] Traceback (most recent call last):
ERROR 11-01 02:57:22 registry.py:270]   File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 268, in _try_inspect_model_cls
ERROR 11-01 02:57:22 registry.py:270]     return model.inspect_model_cls()
ERROR 11-01 02:57:22 registry.py:270]   File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 230, in inspect_model_cls
ERROR 11-01 02:57:22 registry.py:270]     return _run_in_subprocess(
ERROR 11-01 02:57:22 registry.py:270]   File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 435, in _run_in_subprocess
ERROR 11-01 02:57:22 registry.py:270]     raise RuntimeError(f"Error raised in subprocess:\n"
ERROR 11-01 02:57:22 registry.py:270] RuntimeError: Error raised in subprocess:
ERROR 11-01 02:57:22 registry.py:270] rpd_tracer, because
ERROR 11-01 02:57:22 registry.py:270] /opt/conda/envs/py_3.9/lib/python3.9/runpy.py:127: RuntimeWarning: 'vllm.model_executor.models.registry' found in sys.modules after import of package 'vllm.model_executor.models', but prior to execution of 'vllm.model_executor.models.registry'; this may result in unpredictable behaviour
ERROR 11-01 02:57:22 registry.py:270]   warn(RuntimeWarning(msg))
ERROR 11-01 02:57:22 registry.py:270] rocpd_op: 0
ERROR 11-01 02:57:22 registry.py:270] rocpd_api_ops: 0
ERROR 11-01 02:57:22 registry.py:270] rocpd_kernelapi: 0
ERROR 11-01 02:57:22 registry.py:270] rocpd_copyapi: 0
ERROR 11-01 02:57:22 registry.py:270] rocpd_api: 0
ERROR 11-01 02:57:22 registry.py:270] rocpd_string: 0
ERROR 11-01 02:57:22 registry.py:270] rpd_tracer: finalized in 9.814805 ms
ERROR 11-01 02:57:22 registry.py:270] double free or corruption (!prev)
ERROR 11-01 02:57:22 registry.py:270]
Traceback (most recent call last):
  File "/app/vllm/benchmarks/benchmark_latency.py", line 286, in <module>
    main(args)
  File "/app/vllm/benchmarks/benchmark_latency.py", line 24, in main
    llm = LLM(
  File "vllm/utils.py", line 1181, in vllm.utils.deprecate_args.wrapper.inner
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/entrypoints/llm.py", line 193, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "vllm/engine/llm_engine.py", line 571, in vllm.engine.llm_engine.LLMEngine.from_engine_args
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/engine/arg_utils.py", line 918, in create_engine_config
    model_config = self.create_model_config()
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/engine/arg_utils.py", line 853, in create_model_config
    return ModelConfig(
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/config.py", line 210, in __init__
    self.multimodal_config = self._init_multimodal_config(
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/config.py", line 233, in _init_multimodal_config
    if ModelRegistry.is_multimodal_model(architectures):
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 390, in is_multimodal_model
    return self.inspect_model_cls(architectures).supports_multimodal
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 359, in inspect_model_cls
    return self._raise_for_unsupported(architectures)
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm/model_executor/models/registry.py", line 320, in _raise_for_unsupported
    raise ValueError(
ValueError: Model architectures ['LlamaForCausalLM'] are not supported for now. Supported architectures: ['AquilaModel', 'AquilaForCausalLM', 'ArcticForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'CohereForCausalLM', 'DbrxForCausalLM', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'DeepseekV2ForCausalLM', 'ExaoneForCausalLM', 'FalconForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'GraniteForCausalLM', 'GraniteMoeForCausalLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'JAISLMHeadModel', 'JambaForCausalLM', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MambaForCausalLM', 'FalconMambaForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'Grok1ModelForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'MiniCPMForCausalLM', 'MiniCPM3ForCausalLM', 'NemotronForCausalLM', 'OlmoForCausalLM', 'OlmoeForCausalLM', 'OPTForCausalLM', 'OrionForCausalLM', 'PersimmonForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'Phi3SmallForCausalLM', 'PhiMoEForCausalLM', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'RWForCausalLM', 'StableLMEpochForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'SolarForCausalLM', 'XverseForCausalLM', 'BartModel', 'BartForConditionalGeneration', 'BertModel', 'Gemma2Model', 'MistralModel', 'Qwen2ForRewardModel', 'Phi3VForCausalLM', 'Blip2ForConditionalGeneration', 'ChameleonForConditionalGeneration', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'FuyuForCausalLM', 'InternVLChatModel', 'LlavaForConditionalGeneration', 'LlavaNextForConditionalGeneration', 'LlavaNextVideoForConditionalGeneration', 'LlavaOnevisionForConditionalGeneration', 'MiniCPMV', 'MolmoForCausalLM', 'NVLM_D', 'PaliGemmaForConditionalGeneration', 'PixtralForConditionalGeneration', 'QWenLMHeadModel', 'Qwen2VLForConditionalGeneration', 'UltravoxModel', 'MllamaForConditionalGeneration', 'EAGLEModel', 'MedusaModel', 
'MLPSpeculatorPreTrainedModel']
rocpd_op: 0
rocpd_api_ops: 0
rocpd_kernelapi: 0
rocpd_copyapi: 0
rocpd_api: 0
rocpd_string: 0
rpd_tracer: finalized in 6.490411 ms
double free or corruption (!prev)
/usr/local/bin/runTracer.sh: line 42:   165 Aborted                 LD_PRELOAD=librpd_tracer.so "$@"
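
A note on reading the log above: the first traceback shows vLLM's registry running `python -m vllm.model_executor.models.registry` in a subprocess and calling `check_returncode()` on it. `LD_PRELOAD` is an environment variable, so child processes inherit it, and the captured subprocess output contains the same rpd_tracer summary and "double free or corruption" message. The `ValueError` about `LlamaForCausalLM` therefore looks like a side effect of that helper process aborting, not a separate model-support problem. The sketch below only illustrates how Python reports a child killed by SIGABRT on a POSIX system; `os.abort()` stands in for the crash, and the function names are illustrative, not vLLM code.

```python
import signal
import subprocess
import sys


def run_child_that_aborts():
    """Run a child Python process that aborts itself; return the CompletedProcess."""
    return subprocess.run(
        [sys.executable, "-c", "import os; os.abort()"],
        capture_output=True,
    )


def describe(returned):
    """Summarize how the child ended, mirroring a check_returncode() call."""
    try:
        returned.check_returncode()
    except subprocess.CalledProcessError as exc:
        if exc.returncode < 0:
            # On POSIX, a negative return code -N means "killed by signal N".
            return f"child died with {signal.Signals(-exc.returncode).name}"
        return f"child exited with status {exc.returncode}"
    return "child exited cleanly"


# Example usage:
# print(describe(run_child_that_aborts()))
```

Under this reading, both failures in the issue (the matmul example and the vLLM benchmark) would come from the same abort at tracer finalization.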
