Closed misos1 closed 1 year ago
Thanks for pointing out this issue.
The reason that KFD cannot return a doorbell pointer is, because it doesn't control virtual address mappings. The virtual address space in ROCm is a shared virtual address space between the CPU and all GPUs. It's managed in user mode using the mmap system call. When the queue is created, KFD only decides the offset of this queue within the per-process doorbell map. Those are the lower bits of the doorbell address.
The mapping of doorbells is delayed to the first queue creation because that avoids allocating doorbells unnecessarily for processes that don't use any queues. This is particularly important on multi-GPU systems where we don't want every process to allocate doorbells on all GPUs when they are only using one of the GPUs.
There is a bit of a chicken and egg problem for the first queue. I think the easiest fix is to call get_doorbell_map_info before calculating the doorbell_mmap_offset.
I think the easiest fix is to call get_doorbell_map_info before calculating the doorbell_mmap_offset.
That would not play well with this check: https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/026fae434a141faa10da109a2c1b03dc9e06db3f/src/queues.c#L217
There is a bit of a chicken and egg problem for the first queue.
This size is actually a constant based on the gpu version which is known at hsaKmtCreateQueue
. What would be wrong with just DOORBELLS_PAGE_SIZE(DOORBELL_SIZE(q->gfxv))
?
That check is there to prevent mapping the doorbells twice. It could be replaced with a check for doorbells[NodeId].mapping. Using DOORBELLS_PAGE_SIZE would work too.
Yes, of course. But you would need to set doorbells[NodeId].size
with locked mutex doorbells[NodeId].mutex
to avoid UB. Actually as I am thinking about that also accessing doorbells[NodeId].size
here is UB:
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/026fae434a141faa10da109a2c1b03dc9e06db3f/src/queues.c#L574-L577
Confirmed to be resolved in ROCm 5.5 (next release).
Do you know when ROCM 5.5 will be released?
I'm not a PM, so I'm not allowed to say the planned release dates. But I confirmed that Felix's patch was part of the 5.5 branch that is currently undergoing testing. So you won't be waiting too long.
When is
hsaKmtCreateQueue
called first time for nodedoorbells[NodeId].size
is initialized to zero ininit_process_doorbells
but used here:https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/026fae434a141faa10da109a2c1b03dc9e06db3f/src/queues.c#L574-L577
Only after these lines is
doorbells[NodeId].size
initialized to the right value throughmap_doorbell
inget_doorbell_map_info
. It works just by accident becausedoorbells[NodeId].size
is uint32_t so -1 will be 0xFFFFFFFF which is zero extended into 0x00000000FFFFFFFF and it will work as long as mmap offset bits are not within lower 32 bits. There should be used something likeDOORBELLS_PAGE_SIZE(DOORBELL_SIZE(q->gfxv)) - 1
instead ofdoorbells[NodeId].size - 1
.Also why KFD cannot return a doorbell pointer directly? Why must there be this additional mapping of some weird mmap offset?