ROCm / AMDMIGraphX

AMD's graph optimization engine.
https://rocm.docs.amd.com/projects/AMDMIGraphX/en/latest/
MIT License

[Issue]: Investigate and Fix GPU error with int8 reduced layer models #3298

Status: Open. TedThemistokleous opened this issue 1 month ago.

TedThemistokleous commented 1 month ago

Problem Description

Seeing a GPU memory access fault when running the onnxruntime-inference-examples script with reduced-layer BERT models during benchmarking.

The quantization and calibration steps appear to complete, and the fault occurs during inference:

Collecting tensor data and making histogram ...
Finding optimal threshold for each tensor using percentile algorithm ...
Number of tensors : 66
Number of histogram bins : 2048
Percentile : (0.0010000000000047748,99.999)
Calibration is done. Calibration cache is saved to calibration.json
Int8 Quantization Done with Onnxruntime Quantizer
QDQ model is saved to  ./qdq_model.onnx
Running Inferences
Memory access fault by GPU node-1 (Agent handle: 0x5581f06a9ff0) on address 0x7f04bdc71000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)
root@aus-navi3x-02:/works

This is blocking us from collecting customer data for int8 and mixed-precision int8/fp16 results.
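The calibration log above reports a percentile calibrator with 2048 histogram bins per tensor. The unusual lower bound 0.0010000000000047748 is 100 − 99.999 evaluated in floating point. The sketch below shows how a percentile calibrator can turn one tensor's histogram into a clipping range, and how a symmetric int8 scale and ONNX QuantizeLinear-style rounding could follow from that range. It is an illustration under assumptions, not onnxruntime's calibrator. The function names are hypothetical, and real calibrators may work on a histogram of absolute values or use asymmetric ranges.

```python
from bisect import bisect_left
from itertools import accumulate


def percentile_range(counts, edges, lower_pct, upper_pct):
    """Clipping range (lo, hi) from one tensor's histogram (hypothetical helper).

    counts: per-bin counts (n bins); edges: n + 1 ascending bin edges.
    lower_pct / upper_pct: percentiles in [0, 100], e.g. 0.001 and 99.999.
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("edges must have exactly one more entry than counts")
    if not 0.0 <= lower_pct <= upper_pct <= 100.0:
        raise ValueError("percentiles must satisfy 0 <= lower <= upper <= 100")
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    # Cumulative fraction of samples up to and including each bin; the last entry is 1.0.
    cdf = [c / total for c in accumulate(counts)]
    # First bin whose cumulative fraction reaches each target percentile.
    lo_bin = bisect_left(cdf, lower_pct / 100.0)
    hi_bin = bisect_left(cdf, upper_pct / 100.0)
    # Conservative range: left edge of the lower bin, right edge of the upper bin.
    return edges[lo_bin], edges[hi_bin + 1]


def symmetric_int8_scale(lo, hi):
    """Scale for symmetric int8 (zero point 0) covering [-max|x|, max|x|]."""
    amax = max(abs(lo), abs(hi))
    # Fallback of 1.0 for an all-zero range is an arbitrary choice in this sketch.
    return amax / 127.0 if amax > 0 else 1.0


def quantize_int8(x, scale):
    """QuantizeLinear-style int8 with zero point 0: round half to even, then saturate."""
    # Python's round() on floats rounds half to even.
    return max(-128, min(127, round(x / scale)))
```

For the run in this issue, the call would be percentile_range(counts, edges, 0.001, 99.999) with len(counts) == 2048 for each of the 66 tensors.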

Operating System

Ubuntu 22.04

CPU

Whatever CI is using

GPU

AMD Radeon RX 7900 XT

Other

No response

ROCm Version

ROCm 6.0.0

Steps to Reproduce

Run the script in /workspace/onnxruntime-inference-examples/quantization/nlp/bert/migraphx with the --int8 flag.

The fault reproduces on both Navi31 and Navi32 cards.

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

TedThemistokleous commented 1 month ago

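Been looking at this with logging, with no luck. Switched to a debug build and rocgdb. So far I'm seeing a free(): invalid pointer abort inside MIGraphX's ONNX parsing, while the parsed ModelProto is destroyed in onnx_parser::parse_from, before the model itself is executed: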

Start of new stride
Before Inference
free(): invalid pointer

Thread 1 "python3" received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737350295552) at ./nptl/pthread_kill.c:44
warning: 44     ./nptl/pthread_kill.c: No such file or directory
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737350295552) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737350295552) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737350295552, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7c99476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7c7f7f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff7ce0676 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7e32b77 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6  0x00007ffff7cf7cfc in malloc_printerr (str=str@entry=0x7ffff7e30744 "free(): invalid pointer") at ./malloc/malloc.c:5664
#7  0x00007ffff7cf9a44 in _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at ./malloc/malloc.c:4439
#8  0x00007ffff7cfc453 in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3391
#9  0x00007ffd6dfdc622 in onnx_for_migraphx::TensorShapeProto_Dimension::clear_value (this=0x55555fe91c10) at /workspace/AMDMIGraphX/build_develop/src/onnx/onnx.pb.cc:5766
#10 0x00007ffd6dfedc25 in onnx_for_migraphx::TensorShapeProto_Dimension::SharedDtor (this=0x55555fe91c10) at /workspace/AMDMIGraphX/build_develop/src/onnx/onnx.pb.cc:5744
#11 0x00007ffd6dfdc4f5 in onnx_for_migraphx::TensorShapeProto_Dimension::~TensorShapeProto_Dimension (this=0x55555fe91c10) at /workspace/AMDMIGraphX/build_develop/src/onnx/onnx.pb.cc:5736
#12 0x00007ffd6dfdc549 in onnx_for_migraphx::TensorShapeProto_Dimension::~TensorShapeProto_Dimension (this=0x55555fe91c10) at /workspace/AMDMIGraphX/build_develop/src/onnx/onnx.pb.cc:5733
#13 0x00007ffd6e02c898 in google::protobuf::internal::RepeatedPtrFieldBase::DestroyProtos() () from /opt/rocm-6.1.3/lib/../lib/migraphx/lib/libmigraphx_onnx.so.2011000
#14 0x00007ffd6dfede1f in google::protobuf::RepeatedPtrField<onnx_for_migraphx::TensorShapeProto_Dimension>::~RepeatedPtrField (this=0x55555fe91be0) at /workspace/AMDMIGraphX/depends_develop/include/google/protobuf/repeated_ptr_field.h:1266
#15 0x00007ffd6dfdd290 in onnx_for_migraphx::TensorShapeProto::~TensorShapeProto (this=0x55555fe91bd0) at /workspace/AMDMIGraphX/build_develop/src/onnx/onnx.pb.cc:6017
#16 0x00007ffd6dfee35d in onnx_for_migraphx::TypeProto_Tensor::SharedDtor (this=0x55555fe91ba0) at /workspace/AMDMIGraphX/build_develop/src/onnx/onnx.pb.cc:6224
#17 0x00007ffd6dfddc25 in onnx_for_migraphx::TypeProto_Tensor::~TypeProto_Tensor (this=0x55555fe91ba0) at /workspace/AMDMIGraphX/build_develop/src/onnx/onnx.pb.cc:6218
#18 0x00007ffd6dfe133e in onnx_for_migraphx::TypeProto::clear_value (this=0x55555fe91b60) at /workspace/AMDMIGraphX/build_develop/src/onnx/onnx.pb.cc:7512
#19 0x00007ffd6dfefae5 in onnx_for_migraphx::TypeProto::SharedDtor (this=0x55555fe91b60) at /workspace/AMDMIGraphX/build_develop/src/onnx/onnx.pb.cc:7493
#20 0x00007ffd6dfe1a65 in onnx_for_migraphx::TypeProto::~TypeProto (this=0x55555fe91b60) at /workspace/AMDMIGraphX/build_develop/src/onnx/onnx.pb.cc:7485
#21 0x00007ffd6dfe8eed in onnx_for_migraphx::ValueInfoProto::SharedDtor (this=0x55555fe91af0) at /workspace/AMDMIGraphX/build_develop/src/onnx/onnx.pb.cc:1787
#22 0x00007ffd6dfce2a5 in onnx_for_migraphx::ValueInfoProto::~ValueInfoProto (this=0x55555fe91af0) at /workspace/AMDMIGraphX/build_develop/src/onnx/onnx.pb.cc:1779
#23 0x00007ffd6dfce2f9 in onnx_for_migraphx::ValueInfoProto::~ValueInfoProto (this=0x55555fe91af0) at /workspace/AMDMIGraphX/build_develop/src/onnx/onnx.pb.cc:1776
#24 0x00007ffd6e02c898 in google::protobuf::internal::RepeatedPtrFieldBase::DestroyProtos() () from /opt/rocm-6.1.3/lib/../lib/migraphx/lib/libmigraphx_onnx.so.2011000
#25 0x00007ffd6dd6e63a in google::protobuf::RepeatedPtrField<onnx_for_migraphx::ValueInfoProto>::~RepeatedPtrField (this=0x5555606a8a00) at /workspace/AMDMIGraphX/depends_develop/include/google/protobuf/repeated_ptr_field.h:1266
#26 0x00007ffd6dfd5c9d in onnx_for_migraphx::GraphProto::~GraphProto (this=0x5555606a89a0) at /workspace/AMDMIGraphX/build_develop/src/onnx/onnx.pb.cc:4000
#27 0x00007ffd6dfea8cc in onnx_for_migraphx::ModelProto::SharedDtor (this=0x7fffffffb240) at /workspace/AMDMIGraphX/build_develop/src/onnx/onnx.pb.cc:2941
#28 0x00007ffd6dfd2435 in onnx_for_migraphx::ModelProto::~ModelProto (this=0x7fffffffb240) at /workspace/AMDMIGraphX/build_develop/src/onnx/onnx.pb.cc:2931
#29 0x00007ffd6dd643b7 in migraphx::version_1::onnx::onnx_parser::parse_from (this=0x7fffffffb560, data=0x7ffc11fe8010, size=230783461) at /workspace/AMDMIGraphX/src/onnx/onnx_parser.cpp:286
#30 0x00007ffd6dd5a7f2 in migraphx::version_1::parse_onnx_from<void const*&, unsigned long&> (options=..., xs=@0x7fffffffb7c8: 230783461, xs=@0x7fffffffb7c8: 230783461) at /workspace/AMDMIGraphX/src/onnx/onnx.cpp:90
#31 0x00007ffd6dd59020 in migraphx::version_1::parse_onnx_buffer (data=0x7ffc11fe8010, size=230783461, options=...) at /workspace/AMDMIGraphX/src/onnx/onnx.cpp:109
#32 0x00007ffdbc3a0cb7 in migraphx_parse_onnx_buffer::$_0::operator() (this=0x7fffffffb960) at /workspace/AMDMIGraphX/src/api/api.cpp:2054
#33 0x00007ffdbc38d94b in migraphx::try_<migraphx_parse_onnx_buffer::$_0> (f=..., output=true) at /workspace/AMDMIGraphX/src/api/api.cpp:69
#34 0x00007ffdbc38d8e8 in migraphx_parse_onnx_buffer (out=0x7fffffffbc90, data=0x7ffc11fe8010, size=230783461, options=0x55555fef7960) at /workspace/AMDMIGraphX/src/api/api.cpp:2050
#35 0x00007ffdc686ece3 in onnxruntime::MIGraphXExecutionProvider::Compile(std::vector<onnxruntime::IExecutionProvider::FusedNodeAndGraph, std::allocator<onnxruntime::IExecutionProvider::FusedNodeAndGraph> > const&, std::vector<onnxruntime::NodeComputeInfo, std::allocator<onnxruntime::NodeComputeInfo> >&)::{lambda(void*, OrtApi const*, OrtKernelContext*)#3}::operator()(void*, OrtApi const*, OrtKernelContext*) const () from /usr/local/lib/python3.10/dist-packages/onnxruntime/capi/libonnxruntime_providers_migraphx.so
#36 0x00007ffdc68723bb in std::_Function_handler<onnxruntime::common::Status (void*, OrtApi const*, OrtKernelContext*), onnxruntime::MIGraphXExecutionProvider::Compile(std::vector<onnxruntime::IExecutionProvider::FusedNodeAndGraph, std::allocator<onnxruntime::IExecutionProvider::FusedNodeAndGraph> > const&, std::vector<onnxruntime::NodeComputeInfo, std::allocator<onnxruntime::NodeComputeInfo> >&)::{lambda(void*, OrtApi const*, OrtKernelContext*)#3}>::_M_invoke(std::_Any_data const&, void*&&, OrtApi const*&&, OrtKernelContext*&&) ()
   from /usr/local/lib/python3.10/dist-packages/onnxruntime/capi/libonnxruntime_providers_migraphx.so
#37 0x00007fffa5166944 in onnxruntime::FunctionKernel::Compute(onnxruntime::OpKernelContext*) const () from /usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#38 0x00007fffa521fd04 in onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext&, unsigned long, unsigned long, bool const&, onnxruntime::SessionScope&) () from /usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#39 0x00007fffa5217629 in onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) () from /usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#40 0x00007fffa5222ab5 in onnxruntime::RunSince(unsigned long, onnxruntime::StreamExecutionContext&, onnxruntime::SessionScope&, bool const&, unsigned long) () from /usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#41 0x00007fffa521cd93 in onnxruntime::ExecuteThePlan(onnxruntime::SessionState const&, gsl::span<int const, 18446744073709551615ul>, gsl::span<OrtValue const, 18446744073709551615ul>, gsl::span<int const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)> > > > const&, onnxruntime::logging::Logger const&, onnxruntime::DeviceStreamCollection const*, bool const&, bool, bool) () from /usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#42 0x00007fffa51e6610 in onnxruntime::utils::ExecuteGraphImpl(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager const&, gsl::span<OrtValue const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)> > > > const&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, onnxruntime::DeviceStreamCollection*, bool, onnxruntime::Stream*) () from /usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#43 0x00007fffa51e8bab in onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, gsl::span<OrtValue const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, onnxruntime::DeviceStreamCollectionHolder&, bool, onnxruntime::Stream*) () from /usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#44 0x00007fffa51e8f13 in onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, gsl::span<OrtValue const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >&, ExecutionMode, OrtRunOptions const&, onnxruntime::DeviceStreamCollectionHolder&, onnxruntime::logging::Logger const&) () from /usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#45 0x00007fffa48e530f in onnxruntime::InferenceSession::Run(OrtRunOptions const&, gsl::span<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, 18446744073709551615ul>, gsl::span<OrtValue const, 18446744073709551615ul>, gsl::span<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >*, std::vector<OrtDevice, std::allocator<OrtDevice> > const*) [clone .localalias] () from /usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#46 0x00007fffa48e6a0a in onnxruntime::InferenceSession::Run(OrtRunOptions const&, onnxruntime::IOBinding&) () from /usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#47 0x00007fffa48cd5f7 in onnxruntime::InferenceSession::Run(onnxruntime::IOBinding&) () from /usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#48 0x00007fffa488bac8 in pybind11::cpp_function::initialize<onnxruntime::python::addObjectMethods(pybind11::module_&, std::function<void (onnxruntime::InferenceSession*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char
>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > > const&)>)::{lambda(onnxruntime::python::PyInferenceSession*, onnxruntime::SessionIOBinding&, OrtRunOptions*)#68}, void, onnxruntime::python::PyInferenceSession*, onnxruntime::SessionIOBinding&, OrtRunOptions*, pybind11::name, pybind11::is_method, pybind11::sibling>(onnxruntime::python::addObjectMethods(pybind11::module_&, std::function<void (onnxruntime::InferenceSession*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > > const&)>)::{lambda(onnxruntime::python::PyInferenceSession*, onnxruntime::SessionIOBinding&, OrtRunOptions*)#68}&&, void (*)(onnxruntime::python::PyInferenceSession*, onnxruntime::SessionIOBinding&, OrtRunOptions*), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) () from /usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#49 0x00007fffa480587c in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#50 0x00005555556aec9e in ?? ()
TedThemistokleous commented 1 month ago

This is related to a data sample that produces an out-of-bounds value when used with the reduced model. The next step is to remove that sample and check whether doing so avoids further failures.
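The comment does not say which input carries the out-of-bounds value. For BERT, plausible candidates are an index input (input_ids, token_type_ids) that exceeds its embedding table, or a sequence longer than the position embeddings. The sketch below assumes one of those cases and assumes each sample is a dict of flat integer lists. The function names and BERT_BASE_LIMITS (30522 tokens and 2 token types, the bert-base-uncased values) are illustrative and not part of the example script. A reduced model's limits may differ.

```python
def find_out_of_range_samples(samples, limits, max_len=None):
    """Indices of samples with a value outside [0, limit) for any checked input.

    samples: list of dicts mapping input name -> flat list of ints (assumed layout).
    limits: dict mapping input name -> exclusive upper bound, e.g. vocabulary size.
    max_len: optional maximum sequence length (e.g. number of position embeddings).
    """
    bad = []
    for i, sample in enumerate(samples):
        for name, limit in limits.items():
            values = sample.get(name, ())
            too_long = max_len is not None and len(values) > max_len
            if too_long or any(v < 0 or v >= limit for v in values):
                bad.append(i)
                break  # one offending input is enough to flag the sample
    return bad


def drop_samples(samples, bad_indices):
    """Return a copy of samples without the flagged indices."""
    skip = set(bad_indices)
    return [s for i, s in enumerate(samples) if i not in skip]


# Hypothetical limits for a BERT-base-style model; a reduced model may differ.
BERT_BASE_LIMITS = {"input_ids": 30522, "token_type_ids": 2}
```

Logging the flagged indices before dropping them keeps a record of which samples were excluded. Those samples can then be added back once the underlying fault is fixed.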