Open bertwesarg opened 5 months ago
Problem Description
6.0.1-112
includes this change to therocm_smi.h
header:@@ -1091,6 +1099,19 @@ uint16_t current_vclk0s[RSMI_MAX_NUM_CLKS]; uint16_t current_dclk0s[RSMI_MAX_NUM_CLKS]; + /* + * v1.5 additions + */ + // JPEG activity percent (encode/decode) + uint16_t jpeg_activity[RSMI_MAX_NUM_JPEG_ENGS]; + + // PCIE NAK sent accumulated count + uint32_t pcie_nak_sent_count_acc; + + // PCIE NAK received accumulated count + uint32_t pcie_nak_rcvd_count_acc; + + /// \endcond } rsmi_gpu_metrics_t;
but this change of the size of type
rsmi_gpu_metrics_t
breaks the ABI to 6.0.0. And currently bothlibrocm_smi.so
libs have the same interface version6
.
Are you referring specifically to the fact the rsmi_gpu_metrics_t
size changed? Or that the ABI versioning still needs to be updated to reflect the change? Like 6.0.1
, for example?
Are you referring specifically to the fact the rsmi_gpu_metrics_t size changed?
Correct.
Like 6.0.1, for example?
This is a particular bad example in this case, as it does not tell the linker/runtime loader, that librocm_smi64.so.6.0.60001
is not compatible to librocm_smi64.so.6.0.60000
. They both still have the same SONAME
which is librocm_smi64.so.6
, and binaries/libraries only record the SONAME in there dependencies (DT_NEEDED
)
Are you referring specifically to the fact the rsmi_gpu_metrics_t size changed?
Correct.
Like 6.0.1, for example?
This is a particular bad example in this case, as it does not tell the linker/runtime loader, that
librocm_smi64.so.6.0.60001
is not compatible tolibrocm_smi64.so.6.0.60000
. They both still have the sameSONAME
which islibrocm_smi64.so.6
, and binaries/libraries only record the SONAME in there dependencies (DT_NEEDED
)
That's a great point. We expect the rsmi_gpu_metric_t
type size to change when new elements are added (which is happening more often due to requirements), but we need to change the SONAME
to reflect the change and not break the existing ABI. As it follows the ROCm versioning, let me check on that so we can address it accordingly.
Question: Did it break your application? or are you just saying it should have a different version to identify the change? Thank you for bringing that up!
I'm just reading the code. There are at least two possible ways I know to prevent changing the soname every time a struct changes its size:
I think its too late for option 2., but 1. should still be able to add, but it wont avoid the increment in this bugfix
Problem Description
6.0.1-112
includes this change to therocm_smi.h
header:but this change of the size of type
rsmi_gpu_metrics_t
breaks the ABI to 6.0.0. And currently bothlibrocm_smi.so
libs have the same interface version6
.