Closed: BrendanCunningham closed this issue 1 week ago.
Two other things I didn't note in my initial report:
1. Seeing these kmemleak reports requires CONFIG_DEBUG_KMEMLEAK=y; the driver code I pointed you to is meant to be built against a distro kernel (e.g. 5.14 in RHEL 9.4), and distro kernels may not be built with CONFIG_DEBUG_KMEMLEAK=y (see the quick check sketched after this list).
2. When I first observed this problem a few weeks back, I narrowed the likely culprit to tlb_cb = kmalloc(sizeof(*tlb_cb), GFP_KERNEL); in amdgpu_vm_update_range() in the /usr/src/amdgpu-6.8.5-2009582.el9/ source on my MI100 nodes.
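A quick way to check whether a given kernel was built with kmemleak support (a sketch, not part of the original report; it assumes the usual RHEL layout where the build config is installed as /boot/config-$(uname -r)):

    # Check the installed kernel config for kmemleak support (path is distro-typical, not guaranteed)
    grep CONFIG_DEBUG_KMEMLEAK /boot/config-$(uname -r)
    # If kmemleak is enabled and debugfs is mounted, the control file should exist
    ls -l /sys/kernel/debug/kmemleak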
It looks like that struct is supposed to be freed in amdgpu_vm_tlb_seq_cb(). When I saw this problem, I noticed that both amdgpu_vm_update_range and amdgpu_vm_tlb_seq_cb are in /sys/kernel/tracing/available_filter_functions.
So I did echo function > /sys/kernel/tracing/current_tracer, limited the function tracer to just amdgpu_vm_update_range and amdgpu_vm_tlb_seq_cb, and ran the reproducer. After running the reproducer, I saw many occurrences of amdgpu_vm_update_range but no occurrences of amdgpu_vm_tlb_seq_cb in /sys/kernel/tracing/trace for either node.
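For reference, the tracing setup described above amounts to something like the following sketch (assumptions: tracefs is mounted at /sys/kernel/tracing and the commands are run as root; these are standard ftrace control files, but the exact workflow here is reconstructed rather than copied from the report):

    cd /sys/kernel/tracing
    echo 0 > tracing_on                                   # pause tracing while configuring
    echo amdgpu_vm_update_range amdgpu_vm_tlb_seq_cb > set_ftrace_filter
    echo function > current_tracer                        # plain function tracer
    echo 1 > tracing_on
    # ... run the reproducer (osu_bibw -m 256:256 D D) across the two nodes ...
    echo 0 > tracing_on
    grep -c amdgpu_vm_update_range trace                  # many hits expected
    grep -c amdgpu_vm_tlb_seq_cb trace                    # zero hits would match the observation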
Hi @BrendanCunningham. Internal ticket has been created to investigate your issue. Thanks!
Hi @BrendanCunningham Thanks for reporting the issue! This is curious for sure, and we will try our best to reproduce it. Meanwhile, a speculation regarding your investigation:
> It looks like that struct is supposed to be freed in amdgpu_vm_tlb_seq_cb(). When I saw this problem, I noticed that both amdgpu_vm_update_range and amdgpu_vm_tlb_seq_cb are in /sys/kernel/tracing/available_filter_functions.
> So I did echo function > /sys/kernel/tracing/current_tracer, limited the function tracer to just amdgpu_vm_update_range and amdgpu_vm_tlb_seq_cb, and ran the reproducer. After running the reproducer, I saw many occurrences of amdgpu_vm_update_range but no occurrences of amdgpu_vm_tlb_seq_cb in /sys/kernel/tracing/trace for either node.
So how this works is that here, tlb_cb is passed to amdgpu_vm_tlb_flush and then immediately set to NULL afterwards. However, amdgpu_vm_tlb_flush doesn't hold on to tlb_cb either; instead it only passes a reference to its member, &tlb_cb->cb, to the amdgpu_vm_tlb_seq_cb function here, which gets executed only when the dma fence gets signaled. This means that technically nothing is pointing to tlb_cb at the end of the scope of amdgpu_vm_update_range. I am not exactly sure how kmemleak keeps track of these kmalloc_trace allocations, but if it does so by reference counting, it is likely to raise a false alarm at that point. My suggestion would be to run a longer test and see if there's any actual memory consumption building up over time.
Hope this helps. Thanks!
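One way to act on that suggestion during a longer run is to watch the generic 32-byte slab cache and the kmemleak report count over time: a real leak should show kmalloc-32 object counts climbing steadily while the benchmark loops. This is a sketch, not part of the original thread; it assumes the reported 32-byte objects come from the kmalloc-32 cache and needs root for /proc/slabinfo and /sys/kernel/debug/kmemleak:

    while sleep 300; do
        date
        grep '^kmalloc-32 ' /proc/slabinfo                         # active/total 32-byte objects
        echo scan > /sys/kernel/debug/kmemleak                     # trigger a fresh kmemleak scan
        grep -c '^unreferenced object' /sys/kernel/debug/kmemleak  # current number of reports
        grep MemAvailable /proc/meminfo                            # overall memory headroom
    done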
Hi @BrendanCunningham, it seems we are unable to reproduce your issue at the moment. Have you had a chance to run a longer test? Thanks!
No, I haven't run a longer test yet.
Hi @BrendanCunningham, I will be closing this issue for now due to inactivity. Please feel free to reopen/post follow-ups whenever you are ready. Thanks!
Problem Description
After running osu_bibw -m 256:256 D D across two nodes, I see "unreferenced object" memory leak reports in /sys/kernel/debug/kmemleak on both nodes (see the kmemleak reports under Additional Information below). There are 2887 of these reports on one node and 2945 reports on the other node.
On the node that has 2887 "unreferenced object" reports, there are 2887 occurrences of amdgpu_vm_update_range in the kmemleak output. On the other node, which has 2945 "unreferenced object" reports, there are 2940 occurrences of amdgpu_vm_update_range in the kmemleak output; the other 5 reports trace through nfs code and are all 16 bytes in size (size 16). All of the other "unreferenced object" reports between the two nodes are 32 bytes in size (size 32). I have not gone through every report but, given that the number of occurrences of amdgpu_vm_update_range matches the number of unreferenced object ... (size 32): reports on both nodes, I strongly suspect that this is one repeating leak, or small variants thereof.
This is running osu_bibw D D with Open MPI on top of a driver for our HPC interconnect card, which supports sending packets generated from ROCm buffers using a DMA engine. No calls from our driver (hfi1) appear in either kmemleak report. There are nearly as many occurrences of kfd_ioctl+ (2881 and 2935) in the two kmemleak files as there are unreferenced object occurrences, so I suspect that these leaks are occurring under an ioctl from ROCm userspace into amdgpu.
These leaks do not seem to affect the stability or functionality of the system, but I am doing short tests, one benchmark every few minutes to every few hours.
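For context, the per-file counts above can be tallied from the saved kmemleak output with commands along these lines (a sketch; the file naming follows the cat step in Steps to Reproduce below):

    grep -Ec '^unreferenced object' kmemleak-*-256.txt   # total reports per node
    grep -c amdgpu_vm_update_range kmemleak-*-256.txt    # reports with the amdgpu allocation path
    grep -c 'kfd_ioctl+' kmemleak-*-256.txt              # reports entered via the KFD ioctl path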
Operating System
Red Hat Enterprise Linux 9.4 (Plow)
CPU
Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
GPU
AMD Instinct MI100
ROCm Version
ROCm 6.2.0
ROCm Component
No response
Steps to Reproduce
Hardware prerequisites:
Software prerequisites:
1. echo clear > /sys/kernel/debug/kmemleak on both nodes.
2. Run osu_bibw -m 256:256 D D across the two nodes.
3. echo scan > /sys/kernel/debug/kmemleak on both nodes.
4. cat /sys/kernel/debug/kmemleak > kmemleak-$(hostname)-256.txt.
5. Use dmesg -wT to monitor for when kmemleaks have been detected, with a message like: [Tue Oct 1 17:09:58 2024] kmemleak: 2980 new suspected memory leaks (see /sys/kernel/debug/kmemleak). This may take a few minutes.
6. grep -Ec '^unreferenced object' kmemleak-*256.txt; make note of the number of hits from each file.
7. grep -c amdgpu_vm_update_range kmemleak-*-256.txt; note the number of hits from each file and compare to the hits from the same file in step 6.
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
Additional Information
kmemleak reports