Ricks-Lab / gpu-utils

A set of utilities for monitoring and customizing GPU performance
GNU General Public License v3.0

View shared memory usage on laptop Ryzen iGPU #114

Closed davidmi closed 3 years ago

davidmi commented 3 years ago

Hi, thank you so much for writing gpu-utils! This is the only monitoring utility I could find that worked for my Ryzen iGPU on Linux.

I'm sorry if this is not the right place to ask a question, but I thought asking it here might help others if they encounter the same thing.

I'm running a dual-boot Ubuntu laptop with a Ryzen 4000 series CPU and an integrated Vega GPU. The iGPU has 512 MB of dedicated VRAM, but it can also use up to 4 GB of shared system memory. On Windows 10, I can see the shared memory usage in Task Manager, but gpu-mon only seems to report usage of the dedicated VRAM. Right now I am running KDE with compositing disabled, so it's using very little VRAM. When I run GNOME 2 with compositing enabled, VRAM usage is always at 97% or so, and anything beyond the dedicated VRAM is not reported.

┌─────────────┬────────────────┐
│Card #       │card0           │
├─────────────┼────────────────┤
│Model        │Renoir          │
│GPU Load %   │0               │
│Mem Load %   │None            │
│VRAM Usage % │12.617          │
│GTT Usage %  │3.619           │
│Power (W)    │None            │
│Power Cap (W)│None            │
│Energy (kWh) │0.0             │
│T (C)        │46.0            │
│VddGFX (mV)  │nan             │
│Fan Spd (%)  │None            │
│Sclk (MHz)   │400             │
│Sclk Pstate  │1               │
│Mclk (MHz)   │1200Mhz         │
│Mclk Pstate  │1               │
│Perf Mode    │                │
└─────────────┴────────────────┘

Is there any way to see the amount of shared video memory being used?

Once again, thank you for this really useful set of tools!

Ricks-Lab commented 3 years ago

Can you provide the output of gpu-ls? It provides total amount of GTT memory and VRAM. Are either of those what you are looking for?

I have never tried a Renoir GPU. Can you run gpu-ls --debug and post the log file? That would help me understand if there are any additional sensors that I have not considered.

Drizzt321 commented 3 years ago

Here is my 4800H Renoir, running KDE Plasma (via their Neon distro, based on Ubuntu 20.04). The debug log file is in this Gist: https://gist.github.com/Drizzt321/8abaf6bbbc39d1b1031fedd7f6b1dc2d

Happy to run any other debug/dump if you need more info.

# gpu-mon 
┌─────────────┬────────────────┐
│Card #       │card0           │
├─────────────┼────────────────┤
│Model        │Renoir          │
│GPU Load %   │0               │
│Mem Load %   │None            │
│VRAM Usage % │81.297          │
│GTT Usage %  │7.462           │
│Power (W)    │None            │
│Power Cap (W)│None            │
│Energy (kWh) │0.0             │
│T (C)        │40.0            │
│VddGFX (mV)  │nan             │
│Fan Spd (%)  │None            │
│Sclk (MHz)   │0               │
│Sclk Pstate  │1               │
│Mclk (MHz)   │1600Mhz         │
│Mclk Pstate  │0               │
│Perf Mode    │                │
└─────────────┴────────────────┘

# gpu-ls --debug
Neon: Unverified
Package addon [clinfo] executable not found.  Use sudo apt-get install clinfo to install
OS Command [clinfo] not found.  Use sudo apt-get install clinfo to install
Detected GPUs: AMD: 1
Can not access package read utility to verify AMD driver.
AMD: Wattman features not enabled: 0xffffbfff, See README file.
Warning: Can not read parameter: mem_loading, disabling for this GPU: 0
Warning: Can not read parameter: power_cap_range, disabling for this GPU: 0
Warning: Can not read parameter: power, disabling for this GPU: 0
Warning: Can not read parameter: power_cap, disabling for this GPU: 0
Warning: Can not read parameter: voltages, disabling for this GPU: 0
1 total GPUs, 0 rw, 1 r-only, 0 w-only

Card Number: 0
   Vendor: AMD
   Readable: True
   Writable: False
   Compute: True
   GPU UID: None
   Device ID: {'device': '0x1636', 'subsystem_device': '0x109f', 'subsystem_vendor': '0x1d05', 'vendor': '0x1002'}
   Decoded Device ID: Renoir
   Card Model: Advanced Micro Devices, Inc. [AMD/ATI] Renoir (rev c6)
   Display Card Model: Renoir
   PCIe ID: 04:00.0
      Link Speed: 16.0 GT/s PCIe
      Link Width: 16
   ##################################################
   Driver: amdgpu
   vBIOS Version: 113-RENOIR-026
   Compute Platform: None
   GPU Type: APU
   HWmon: /sys/class/drm/card0/device/hwmon/hwmon4
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0000:00/0000:00:08.1/0000:04:00.0
   ##################################################
   Current Power (W): None
   Power Cap (W): None
   ##################################################
   Current GPU Loading (%): 0
   Current Memory Loading (%): None
   Current GTT Memory Usage (%): 7.002
      Current GTT Memory Used (GB): 0.210
      Total GTT Memory (GB): 3.000
   Current VRAM Usage (%): 81.073
      Current VRAM Used (GB): 0.405
      Total VRAM (GB): 0.500
   Current  Temps (C): {'edge': 43.0}
   Critical Temps (C): {'edge': 0.0}
   Current Voltages (V): None
   Current Clk Frequencies (MHz): {'sclk': 400.0}
   Current SCLK P-State: [1, '400Mhz']
   Current MCLK P-State: [0, '1600Mhz']
   Power Profile Mode: None
   Power DPM Force Performance Level: auto

# uname -a
Linux darklaptop 5.8.0-45-generic #51~20.04.1-Ubuntu SMP Tue Feb 23 13:46:31 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
# cat /proc/cmdline 
BOOT_IMAGE=/BOOT/ubuntu_1lc8c8@/vmlinuz-5.8.0-45-generic root=ZFS=rpool/ROOT/ubuntu_1lc8c8 ro

davidmi commented 3 years ago

Thank you for suggesting gpu-ls, here are the results. This is a Ryzen 4500u.

$ sudo gpu-ls
Detected GPUs: AMD: 1
amdgpu/rocm version: UNKNOWN
AMD: Wattman features not enabled: 0xffffbfff, See README file.
Warning: Error reading parameter: mem_loading, disabling for this GPU: 0
Warning: Error reading parameter: power_cap_range, disabling for this GPU: 0
Warning: Error reading parameter: power, disabling for this GPU: 0
Warning: Error reading parameter: power_cap, disabling for this GPU: 0
Warning: Error reading parameter: voltages, disabling for this GPU: 0
1 total GPUs, 0 rw, 1 r-only, 0 w-only

Card Number: 0
   Vendor: AMD
   Readable: True
   Writable: False
   Compute: False
   GPU UID: None
   Device ID: {'device': '0x1636', 'subsystem_device': '0x0a1e', 'subsystem_vendor': '0x1028', 'vendor': '0x1002'}
   Decoded Device ID: Renoir
   Card Model: Advanced Micro Devices, Inc. [AMD/ATI] Renoir (rev c3)
   Display Card Model: Renoir
   PCIe ID: 03:00.0
      Link Speed: 16.0 GT/s PCIe
      Link Width: 16
   ##################################################
   Driver: amdgpu
   vBIOS Version: 113-RENOIR-025
   Compute Platform: None
   GPU Type: APU
   HWmon: /sys/class/drm/card0/device/hwmon/hwmon4
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0000:00/0000:00:08.1/0000:03:00.0
   ##################################################
   Current Power (W): None
   Power Cap (W): None
   ##################################################
   Current GPU Loading (%): 0
   Current Memory Loading (%): None
   Current GTT Memory Usage (%): 1.273
      Current GTT Memory Used (GB): 0.089
      Total GTT Memory (GB): 7.000
   Current VRAM Usage (%): 30.291
      Current VRAM Used (GB): 0.151
      Total VRAM (GB): 0.500
   Current  Temps (C): {'edge': 48.0}
   Critical Temps (C): {'edge': 0.0}
   Current Voltages (V): None
   Current Clk Frequencies (MHz): {'sclk': 400.0}
   Current SCLK P-State: [1, '400Mhz']
   Current MCLK P-State: [3, '400Mhz']
   Power Profile Mode: None
   Power DPM Force Performance Level: auto

Looks like what I was looking for is the GTT memory usage! It would be helpful to have that in gpu-mon, I think, since that's the "real" total VRAM.
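
For anyone who wants to check these numbers without gpu-utils, here is a minimal sketch that reads the amdgpu `mem_info_*` files under the Card Path shown above and computes the same kind of usage percentages. The function names are made up for illustration, and this is not necessarily how gpu-ls computes its values. It assumes the "GB" figures are 1024-based, which fits the 0.500 reported for 512 MB of VRAM.

```python
from pathlib import Path

GIB = 1024 ** 3  # assumption: report sizes in 1024-based units


def read_bytes(device_dir: Path, name: str) -> int:
    """Read one amdgpu mem_info_* sysfs file as an integer byte count."""
    return int((device_dir / name).read_text().strip())


def memory_usage(device_dir: Path) -> dict:
    """Return used/total/percent for the VRAM and GTT pools of one card."""
    usage = {}
    for pool in ("vram", "gtt"):
        total = read_bytes(device_dir, f"mem_info_{pool}_total")
        used = read_bytes(device_dir, f"mem_info_{pool}_used")
        usage[pool] = {
            "used_gib": used / GIB,
            "total_gib": total / GIB,
            # Guard against a zero-sized pool to avoid dividing by zero.
            "percent": 100.0 * used / total if total else 0.0,
        }
    return usage


if __name__ == "__main__":
    # card0 matches the Card Path in the gpu-ls output; adjust for other systems.
    card = Path("/sys/class/drm/card0/device")
    if (card / "mem_info_gtt_total").exists():
        for pool, info in memory_usage(card).items():
            print(f"{pool.upper()}: {info['used_gib']:.3f} / "
                  f"{info['total_gib']:.3f} GiB ({info['percent']:.3f} %)")
    else:
        print("No amdgpu mem_info files found under", card)
```

On an APU like this one, the GTT figures are the ones that correspond to shared system memory, while the VRAM figures cover only the dedicated carve-out.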

I was originally investigating why I seemed to have less VRAM on Linux than on Windows, and it looks like the GTT size was the problem: the amdgpu driver calculates it incorrectly on my machine. I found that out another way, but it would be useful if gpu-mon made it visible. I've now overridden it manually through modprobe using the gttsize parameter, and that seems to have worked. The parameter is documented at https://www.kernel.org/doc/html/v4.19/gpu/amdgpu.html
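
In case it helps someone doing the same override, here is a rough sketch of what it can look like. According to the kernel documentation linked above, gttsize is given in MiB and -1 means the driver default. The helper names, the 4096 MiB value, and the file name `/etc/modprobe.d/amdgpu.conf` are only illustrative assumptions, not the exact setup used in this thread.

```python
from pathlib import Path

# Standard sysfs location for a loaded module's parameters; reading this one
# may require root.
GTTSIZE_PARAM = Path("/sys/module/amdgpu/parameters/gttsize")


def modprobe_line(gtt_mib: int) -> str:
    """Build the modprobe.d options line for a GTT size given in MiB."""
    if gtt_mib <= 0:
        raise ValueError("gtt_mib must be a positive size in MiB")
    return f"options amdgpu gttsize={gtt_mib}"


def current_gttsize(param_file: Path = GTTSIZE_PARAM):
    """Return the gttsize the module was loaded with, or None if unreadable.

    A value of -1 means the driver picked its default size.
    """
    try:
        return int(param_file.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    # Example value only: 4096 MiB, chosen to mirror the 4 GB seen on Windows.
    print("Line for e.g. /etc/modprobe.d/amdgpu.conf (hypothetical file name):")
    print(modprobe_line(4096))
    print("Currently loaded gttsize:", current_gttsize())
```

The new value only takes effect when the amdgpu module is loaded again, which normally means a reboot. Depending on the distro, the initramfs may also need to be regenerated. Afterwards, Total GTT Memory in gpu-ls should reflect the new size.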

davidmi commented 3 years ago

Looking again, I see that gpu-mon does report GTT! So it's my fault for missing it/not understanding what it meant, apologies. Hopefully this thread can help anyone who is similarly confused.

Ricks-Lab commented 3 years ago

Cool! I will go ahead and close the issue. You can reopen if needed.