Open littlewu2508 opened 8 months ago
Hi @littlewu2508, can you please rerun the failing tests with logging enabled and paste the output? For example:
export AMD_LOG_LEVEL=4
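A minimal sketch of one way to capture that output in a file so it can be attached to the issue. The helper name `run_with_hip_log` and the log file name are hypothetical. The `ctest` call in the usage comment assumes the tests are driven by CTest from the build directory, which the attached LastTest.log suggests but the thread does not confirm.

```shell
#!/bin/sh
# Hypothetical helper: run a command with AMD_LOG_LEVEL=4 set only for that
# command, and save everything it prints (stdout and stderr) to a log file.
# The helper returns the exit status of the command it ran.
run_with_hip_log() {
    logfile=$1
    shift
    env AMD_LOG_LEVEL=4 "$@" >"$logfile" 2>&1
}

# Example usage (assumes a CTest build directory; adjust to the real setup):
#   run_with_hip_log hip-tests.log ctest --rerun-failed --output-on-failure
```

Setting the variable through `env` keeps it out of the surrounding shell session, so later commands are not slowed down by verbose logging.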
I see this error in the logs from hipIpcGetMemHandle: Failed to create memory for IPC, failed with hsastatus: 4097
Can you please confirm that the system has an up-to-date version of the kernel-mode driver (amdgpu-dkms)? It should at least match the rocm/5.7.1 release.
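One possible way to collect the driver details asked for here, as a sketch only. The function name `amdgpu_driver_report` is hypothetical, and it is an assumption that the in-tree amdgpu module reports an empty version field while an amdgpu-dkms install reports one.

```shell
#!/bin/sh
# Sketch: print the information needed to tell an amdgpu-dkms install from
# the in-tree amdgpu driver. Tools that are not installed are skipped.
amdgpu_driver_report() {
    echo "kernel: $(uname -r)"
    if command -v modinfo >/dev/null 2>&1; then
        # Assumption: this field may be empty for the in-tree module.
        echo "amdgpu module version: $(modinfo -F version amdgpu 2>/dev/null)"
    fi
    if command -v dkms >/dev/null 2>&1; then
        # Prints nothing after the label if no amdgpu DKMS module is registered.
        echo "dkms: $(dkms status 2>/dev/null | grep -i amdgpu)"
    fi
}

amdgpu_driver_report
```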
The system uses the amdgpu driver from upstream Linux kernel, version 6.1.99. No amdgpu-dkms is installed.
Hello, was this issue ever resolved? I am seeing a similar issue on ROCm 6.0.0 with MI250X GPUs.
Driver version is 6.3.6
@littlewu2508 and @natshineman, did you try the hip-test on Ubuntu or another supported OS (https://rocm.docs.amd.com/en/docs-6.2.1/compatibility/compatibility-matrix.html)? The amdgpu-dkms driver is better supported on these operating systems.
Problem Description
Running the Gentoo HIP tests on a dual MI100 system results in two test failures:
Operating System
Gentoo Prefix on upstream Linux kernel 6.1.69-1.1
CPU
AMD EPYC 7702 64-Core Processor
GPU
AMD Instinct MI100
ROCm Version
ROCm 5.7.1
ROCm Component
HIP
Steps to Reproduce
On a fresh Gentoo Linux install, with
/etc/portage/package.accept_keywords
/etc/portage/env/test.conf
/etc/portage/package.env/0-test
And run
emerge --verbose hip
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
Additional Information
The full build & test log
build.log.gz
Test details:
LastTest.log.gz