ROCm / ROCm

AMD ROCm™ Software - GitHub Home
https://rocm.docs.amd.com
MIT License

[Documentation]: please keep a single source of truth for "Known Issues". Inconsistency found. #2920

Open ye-luo opened 5 months ago

ye-luo commented 5 months ago

Description of errors

I hope only one team is responsible for ROCm releases. The release notes at https://www.amd.com/en/support/kb/release-notes/rn-amdgpu-unified-linux-23-40-rocm-6-0-2 contain the following section:

Known Issues

    Running PyTorch with iGPU enabled + Discrete GPU enabled may cause crashes. See the Limitations section within the How To Guide for details.
    GPU reset may occur when running multiple heavy Machine Learning workloads at same time over an extended period of time.
    Intermittent gpureset errors may be seen with Automatic 1111 webUI with IOMMU enabled. Please see https://community.amd.com/t5/knowledge-base/tkb-p/amd-rocm-tkb for suggested resolutions.
    RX 7900 GRE may exhibit a hang rather than Out Of Memory error on BERT FP32 training loads.
    Soft hang observed when running multi-queue workloads.

None of these issues are mentioned in https://github.com/ROCm/ROCm/blob/develop/CHANGELOG.md

Could you please improve the accuracy and consistency of the information in the release notes?
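
For illustration, a minimal sketch of how this kind of mismatch could be caught automatically. It assumes both documents are available as plain text and that an issue counts as "mentioned" only when its wording appears verbatim (ignoring case and line wrapping). The function names `normalize` and `missing_issues` and the changelog excerpt are hypothetical and are not part of any ROCm tooling.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line wrapping does not affect matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def missing_issues(known_issues: list[str], changelog_text: str) -> list[str]:
    """Return the known issues whose text does not appear in the changelog."""
    haystack = normalize(changelog_text)
    return [issue for issue in known_issues if normalize(issue) not in haystack]


# Two of the entries quoted above from the release notes.
known_issues = [
    "Soft hang observed when running multi-queue workloads.",
    "RX 7900 GRE may exhibit a hang rather than Out Of Memory error on BERT FP32 training loads.",
]

# Hypothetical changelog excerpt for illustration only, not the real CHANGELOG.md.
changelog_text = """
## Known issues
- Soft hang observed when running
  multi-queue workloads.
"""

for issue in missing_issues(known_issues, changelog_text):
    print("Missing from changelog:", issue)
```

Exact-substring matching is deliberately strict: a reworded entry is reported as missing, which is arguably the desired behavior when the goal is a single source of truth.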


nartmada commented 5 months ago

@ye-luo, thank you for catching this error. I will forward your info to the internal team.