MarkCogan closed this issue 1 month ago
Any help with determining a missing RPM or component would be helpful. Pretty sure that's our issue, but since we can't do an online install, we're flying a hair blind trying to compare to another system. These are HPC systems, so we got a lot going on.
Hi @MarkCogan, an internal ticket has been created to further investigate this issue. In the future, please report your issues under the ROCm/ROCm repository as it is more frequently monitored.
I think we can close this issue. Part of it was that we needed to fully update the OS and kernel (among other factors). Going with the latest OS and ROCm repository solved the issue for us, and my users are now seeing the memory polled as expected. My sticking point, I think, was that installing ROCm 6 gave no warning about the OS kernel or possible compatibility issues.
Problem Description
Driver version: 60140092
Err (Properties): no error
Device name: AMD Instinct MI210
Device arch: gfx90a:sramecc+:xnack-
Device pciBusID: 47
Device pciDeviceID: 0
Err (Count): no error
Device count: 8
Err (Mem): invalid argument. <--
Total Mem: 20140771
Avail Mem: 140736205064816
hipMemGetInfo(&freeMem, &totalMem) fails with "invalid argument" and does not return usable free/total memory values.
Operating System
Rocky Linux release 8.7 (Green Obsidian)
CPU
AMD EPYC 7713 64-Core Processor
GPU
AMD Instinct MI210
ROCm Version
ROCm 6.1.0
ROCm Component
HIP, ROCm
Steps to Reproduce
Unable to determine if we're still missing a component / package. System is air-gapped and running a local repository for amdgpu and rocm_6.1. Comparing to other systems is not feasible.
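If a package listing from a known-good ROCm host can be carried over as a text file, one way to narrow down a missing RPM on an air-gapped machine is to diff the two listings offline. The following is only a minimal sketch under that assumption: it expects one package name per line (for example, as produced by `rpm -qa --qf '%{NAME}\n'`). The function names, the prefix filter, and the sample package names are illustrative and not taken from this issue.

```python
from __future__ import annotations


def parse_package_names(listing: str) -> set[str]:
    """Return the set of non-empty, whitespace-stripped lines in a listing."""
    return {line.strip() for line in listing.splitlines() if line.strip()}


def missing_packages(
    reference: str,
    local: str,
    prefixes: tuple[str, ...] = ("rocm", "hip", "hsa", "amdgpu"),
) -> list[str]:
    """Names in the reference listing that are absent locally.

    Only names starting with one of `prefixes` are reported; the default
    prefixes are an assumption about which packages are of interest.
    """
    diff = parse_package_names(reference) - parse_package_names(local)
    return sorted(name for name in diff if name.startswith(prefixes))


# Illustrative sample data; real input would be read from the two text files.
reference_listing = "rocm-core\nhip-runtime-amd\nhsa-rocr\nbash\n"
local_listing = "rocm-core\nbash\n"

for name in missing_packages(reference_listing, local_listing):
    print(name)
```

A name-only comparison will not catch version mismatches, such as the outdated kernel that turned out to matter here, so it is at best a first filter.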
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
=====================
HSA System Attributes
=====================
Runtime Version: 1.13
Runtime Ext Version: 1.4
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: NO
==========
HSA Agents
==========
Agent 1
Name: AMD EPYC 7713 64-Core Processor
Uuid: CPU-XX
Marketing Name: AMD EPYC 7713 64-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 3720
BDFID: 0
Internal Node ID: 0
Compute Unit: 32
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 32783820(0x1f43dcc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 32783820(0x1f43dcc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 32783820(0x1f43dcc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info: