ROCm / ROCR-Runtime

ROCm Platform Runtime: ROCr a HPC market enhanced HSA based runtime
https://rocm.docs.amd.com/projects/ROCR-Runtime/en/latest/
Other
217 stars 108 forks source link

[Issue]: Iterating Over Executables via the Vendor Loader API Mistakenly Includes Executables Destroyed Previously As `nullptr` #206

Closed matinraayai closed 1 week ago

matinraayai commented 4 months ago

Problem Description

The destruction of an HSA Executable using hsa_executable_destroy will result in the loader deleting the Executable object previously allocated. However, instead of erasing the pointer in the std::vector of Executables currently loaded, it equates it to a nullptr: https://github.com/ROCm/ROCR-Runtime/blob/3f6ffc5b1167a43dc5e169db85655182a4c5947c/src/loader/executable.cpp#L242-L257

After the destruction of any number of Executables, if one invokes the hsa_ven_amd_loader_iterate_executables function to iterate over all executables in HSA, the iteration also returns the nullptr that was placed instead of the destroyed executables:

https://github.com/ROCm/ROCR-Runtime/blob/3f6ffc5b1167a43dc5e169db85655182a4c5947c/src/loader/executable.cpp#L259-L276

This does not seem to be expected behavior. cc: @kzhuravl

Operating System

22.04

CPU

N/A

GPU

AMD Instinct MI100

Other

No response

ROCm Version

ROCm 6.0.0

ROCm Component

ROCR-Runtime

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

ppanchad-amd commented 1 month ago

@matinraayai Internal ticket has been created to investigate this issue. Thanks!

jamesxu2 commented 1 month ago

Hello @matinraayai,

Thanks for the clear issue report! However, the way this code is written is intentional.

If you notice this line: executables[((ExecutableImpl*)executable)->id()] = nullptr; in the DestroyExecutable function, you will see that the executables array is indexed into with the id() of the deleted object. So, if you removed this entry from the array entirely, you would need to reindex the ids of all subsequent entries before the next time this function is used. The entry is instead set to nullptr to avoid this reindexing step.

Does encountering a nullptr when iterating over the executables cause a problem?

matinraayai commented 1 month ago

Hello @matinraayai,

Thanks for the clear issue report! However, the way this code is written is intentional.

If you notice this line: executables[((ExecutableImpl*)executable)->id()] = nullptr; in the DestroyExecutable function, you will see that the executables array is indexed into with the id() of the deleted object. So, if you removed this entry from the array entirely, you would need to reindex the ids of all subsequent entries before the next time this function is used. The entry is instead set to nullptr to avoid this reindexing step.

Is indexing over a vector the best way to go? It seems a set or an unordered map might do a better job here; The only usages I've seen for executables are either iterations over all entries, or removal of an entry (which the map data structures make easier). But this is an implementation detail.

Does encountering a nullptr when iterating over the executables cause a problem?

I believe so; The vendor loader API doesn't say anywhere that it might return hsa_executable_t{0} and the user has to check to make sure the handles are valid.

It seems all other loader functions that iterate over executables, they check if the iteration element is a nullptr before proceeding; I think the same check should be added to the executable iteration function of the loader API.

jamesxu2 commented 1 month ago

The vendor loader API doesn't say anywhere that it might return hsa_executable_t{0} and the user has to check to make sure the handles are valid. It seems all other loader functions that iterate over executables, they check if the iteration element is a nullptr before proceeding; I think the same check should be added to the executable iteration function of the loader API.

This is a good point, I am checking in with our internal team to see if this nullptr check can be patched in, or some documentation added.

jamesxu2 commented 1 week ago

Hi @matinraayai, a nullptr check has been added for consistency with other loader functions that iterate over executables - you'll see this code change in a future release of ROCm.

Thanks for bringing this up!