Closed: bpickrel closed this issue 1 month ago
There is a bug in test_nll_loss_fx. It's hitting line 30 (`target_size = inp_size[:1] + inp_size[2:]`) even when the input is 1-dimensional, which I don't think is intended. The resulting target vector has length 3 when it should be 1.
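For reference, here is a minimal sketch of how that slice behaves on a 1-dim input. Only the slicing expression comes from the test; the concrete size is illustrative:

```python
# Illustrative reproduction of the slicing at line 30 of test_nll_loss_fx.
inp_size = (3,)                            # a 1-dim input of size 3
target_size = inp_size[:1] + inp_size[2:]  # keeps dim 0; dim 1 is absent, so nothing is dropped
print(target_size)                         # (3,) -- a length-3 target, not length 1
```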
As an aside, there is a lot of conditional logic based on size. pytest.mark.parametrize works best when the test setup is essentially the same for all inputs. In this scenario I highly recommend splitting the 1-dim case into a separate test to avoid this kind of logic-flow error.
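A sketch of that split, with hypothetical test names and a stand-in helper for the shape logic (the real test's setup differs):

```python
import pytest

def target_size_for(inp_size):
    # Hypothetical stand-in for the shape logic in test_nll_loss_fx:
    # for an (N, C, d1, ...) input, the target drops the class dim C.
    return inp_size[:1] + inp_size[2:]

# Shared path: setup is identical for every multi-dim input,
# so parametrize fits naturally here.
@pytest.mark.parametrize("inp_size, expected", [
    ((8, 5), (8,)),             # (N, C)     -> (N,)
    ((8, 5, 4), (8, 4)),        # (N, C, d1) -> (N, d1)
    ((2, 3, 4, 4), (2, 4, 4)),  # (N, C, d1, d2) -> (N, d1, d2)
])
def test_nll_loss_nd(inp_size, expected):
    assert target_size_for(inp_size) == expected

# The 1-dim case gets its own test, so the shared size arithmetic
# (which misbehaves for a bare (C,) input) is never exercised here.
def test_nll_loss_1d():
    inp_size = (5,)   # a bare (C,) input; target is a single class index
    target_size = ()  # scalar target, no slicing needed
    assert target_size == ()
```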
Resolved. The line described above isn't an error, but I found other closely related errors in the test.
Problem Description
A new op converter and its test led to a crashing bug in pytest. The symptom is a core dump before any of the new debug output appears. Replication code is in branch nll_loss_converter_crash_bug. Commit 65ed2666e2ac doesn't display the failure; it shows only the expected test errors.
I suspect memory corruption not directly related to my code change, because adding debug code caused the error to come and go erratically. At one point a stack trace pointed to the function MGXModule.__initialize in file torch_migraphx/py/torch_migraphx/fx/mgx_module.py, but I can't replicate this now.
Operating System
Ubuntu 20.04.6 LTS
CPU
AMD Ryzen Threadripper PRO 3955WX 16-Cores
GPU
AMD Radeon Pro W7900
ROCm Version
ROCm 6.1.0
ROCm Component
No response
Steps to Reproduce
Checkout branch nll_loss_converter_crash_bug
cd torch_migraphx/tests
pytest -k test_nll_loss_fx
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
This occurred in a docker container.
Output: