Closed Austin-Hu closed 5 years ago
When running lldb -- ./SvtHevcEncApp -i ../akiyo_cif.y4m
Encoding Process 1509 stopped
* thread #3, name = 'SvtHevcEncApp', stop reason = signal SIGILL: illegal instruction operand
frame #0: 0x00007ffff79d7175 libSvtHevcEnc.so.1`IntraModeVerticalChroma_AVX2_INTRIN [inlined] _mm_loadu_si128(__P=0x0000555561aa8531) at emmintrin.h:703:10
700 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
701 _mm_loadu_si128 (__m128i_u const *__P)
702 {
-> 703 return *__P;
704 }
705
706 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
@1480c1 , according to IntrinsicGuide, _mm_loadu_si128 should be an SSE2 intrinsic which should be supported on your X3430 platform.
So could you help check:
Thanks!
- whether mm_loadu_si128 is also invoked or not, before IntraModeVerticalChroma_AVX2_INTRIN() is triggered?
It seems yes. It ran in IntraModeHorizontalLuma_SSE2_INTRIN
and CombinedAveraging32xMSAD_SSE2_INTRIN
2. just dereference the pointer of "refSamples + topOffset" directly before mm_loadu_si128 in IntraModeVerticalChroma_AVX2_INTRIN()?
Can you elaborate on what do you mean by this?
- whether mm_loadu_si128 is also invoked or not, before IntraModeVerticalChroma_AVX2_INTRIN() is triggered?
It seems yes. It ran in
IntraModeHorizontalLuma_SSE2_INTRIN
andCombinedAveraging32xMSAD_SSE2_INTRIN
It's weird, because the intrinsic can execute in those functions you mentioned successfully before the crashing point...
Because we don't have such CPU to debug in hand, or we'd try to find one in some day. And could you help debug this issue on your X3430 platform, if you have more bandwidth, @1480c1 ? For example, try to load refSamples (without topOffset added) with _mm_loadu_si128, or try to load with other intrinsic (such as _mm_loadu_si64) to see any difference. Thanks!
- just dereference the pointer of "refSamples + topOffset" directly before mm_loadu_si128 in IntraModeVerticalChroma_AVX2_INTRIN()?
Can you elaborate on what do you mean by this?
I wanted you to just print the value of *(refSamples + topOffset) directly, but ignore it right now due to your verification for the 1st question.
Because we don't have such CPU to debug in hand, or we'd try to find one in some day. And could you help debug this issue on your X3430 platform, if you have more bandwidth, @1480c1 ? For example, try to load refSamples (without topOffset added) with _mm_loadu_si128, or try to load with other intrinsic (such as _mm_loadu_si64) to see any difference. Thanks!
Sorry for not seeing this, since it's an edit, it didn't notify anything.
I can try, however, I'm not proficient in understanding simd instructions atm.
Would it be preferable to have a way for you to ssh into it?
Because we don't have such CPU to debug in hand, or we'd try to find one in some day. And could you help debug this issue on your X3430 platform, if you have more bandwidth, @1480c1 ? For example, try to load refSamples (without topOffset added) with _mm_loadu_si128, or try to load with other intrinsic (such as _mm_loadu_si64) to see any difference. Thanks!
Sorry for not seeing this, since it's an edit, it didn't notify anything.
I can try, however, I'm not proficient in understanding simd instructions atm.
Would it be preferable to have a way for you to ssh into it?
@1480c1 , sorry, I should have added my comments by appending a new one, rather than editting.
I'm not proficient but I'd like to try to after completing my current tasks. Could you share me your SSH info, and are you using Linux or Windows? Thanks!
I can setup an ssh account on the machine, it's running Ubuntu 19.04. currently about to have an exam, so will not be able to do it immediately
I can setup an ssh account on the machine, it's running Ubuntu 19.04. currently about to have an exam, so will not be able to do it immediately
OK, let me try to log in it.
I sent you an email to the one in the commit with the information
Hi @1480c1 ,
As for the latter "_mm_loadu_si128" crashing issue, I think its root cause is that, in the implementation of some (or most) kernel threads, using AVX and SSE intrinsics were mixed together, such as function IntraModeVerticalChroma_AVX2_INTRIN(). So with your X3430 CPU without AVX intrinsic, the IntraVerticalChroma_funcPtrArray function pointer array would point to the IntraModeVerticalChroma_AVX2_INTRIN() code with PREAVX2_MASK, and would go through the section using AVX intrinsic (the 256bit version, or the 128bit one compiled with -mavx2 gcc option) which it shouldn't have gone.
The way to address this issue, is to separate the SSE/AVX intrinsic usings into different files, compiled with corresponding gcc x86 options. And the most ideal way I was considering is to use the "CPU dispatch" mechanism mentioned in Section 13.5 of "Optimizing Software in C++". But it would not be feasible for the SVT encoder project, because the function pointer of most kernel threads changes during runtime. For example, using either SSE or AVX in IntraModeVerticalChroma_AVX2_INTRIN() depends on the Chroma PU size, rather than being set during encoder initialization stage.
So I submitted an experimental commit, to prove that the file using SSE and being compiled with -mavx2 would cause the crashing issue in your CPU withough AVX supported. And with the commit applied, the "Illegal instruction" issue wouldn't happen in IntraModeVerticalChroma_AVX2_INTRIN(), but it occurred again in another following function IntraModeDCChroma_AVX2_INTRIN().
In order to fully resolve this issue, we need too much similar seperation (physical effort actually) for all the implementations of kernel threads which mixed different intrinsics, not only for IntraModeVerticalChroma_AVX2_INTRIN(). So let's set it as low priority and refine those codes in the future to support legacy CPUs.
Thanks!
By checking whether the processor supports XGETBV instruction or not, beforeing invoking it to check ZMM/YMM/XMM states are enabled in XCR0 register.
Signed-off-by: Austin Hu austin.hu@intel.com