AviSynth / AviSynthPlus

AviSynth with improvements
http://avs-plus.net
963 stars 73 forks source link

LLVM (clang-cl, Intel Nextgen) builds crash on pre-AVX CPUs #347

Closed pinterf closed 1 year ago

pinterf commented 1 year ago

As the title says, any attempt to load such Avisynth+ builds from VirtualDub or, avsmeter dies prematurely.

pinterf commented 1 year ago

thanks for @jpsdr for the report, and for @DTL2020 who pointed out on the exact case: "illegal instruction"

Error occured only in release builds.

Problem's reason summary: Such a construct (see below), when appears in an AVX2 source module, will get static initialized before anything can happen in AviSynth. This init routine will get called even on processors having no AVX/AVX2 instruction set. Compiler must initialize their statically declared data (this time it was an array initialization, a table for dithering - e.g. a memory copy from a constant table to fill the class' data_sse2 array)

In convert_bit.h (which was included in convert_bits_avx2.cpp):

static const struct dither2x2a_t
{
  const BYTE data[4] = {
    0, 1,
    1, 0,
  };
  // cycle: 2
  alignas(16) const BYTE data_sse2[2 * 16] = {
    0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
    1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0
  };
  dither2x2a_t() {};
} dither2x2a;

The illegal instruction came from dither2x2a_t() {}

which triggered the static initialization. And since

So it crashed on an SSE4 computer.

After installing Visual Studio on an i7-860 (pre-AVX) machine, ReleaseWithDebug build showed the crash disassembly:

AviSynth.dll!_GLOBAL__sub_I_convert_bits_avx2.cpp(void):
00007FFBF987F4E0  mov         dword ptr [dither2x2a (07FFBF9B03620h)],10100h  
00007FFBF987F4EA  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+10h (07FFBF9A3CF60h)]  
00007FFBF987F4F2  vmovups     ymmword ptr [dither2x2a+10h (07FFBF9B03630h)],ymm0  
00007FFBF987F4FA  mov         dword ptr [dither2x2 (07FFBF9B03650h)],1030200h  
00007FFBF987F504  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+30h (07FFBF9A3CF80h)]  
00007FFBF987F50C  vmovups     ymmword ptr [dither2x2+10h (07FFBF9B03660h)],ymm0  
00007FFBF987F514  vmovaps     xmm0,xmmword ptr [__xmm@02060307040005010307020605010400 (07FFBF991D470h)]  
00007FFBF987F51C  vmovaps     xmmword ptr [dither4x4a (07FFBF9B03680h)],xmm0  
00007FFBF987F524  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+50h (07FFBF9A3CFA0h)]  
00007FFBF987F52C  vmovups     ymmword ptr [dither4x4a+10h (07FFBF9B03690h)],ymm0  
00007FFBF987F534  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+70h (07FFBF9A3CFC0h)]  
etc.

This helped to locate the exact position of the problem.

So we have to take care not using static class inititialing in the AVX2 (in general: other than the minimal CPU arch) compiled source.

Such predefined tables are now using extern const in header, and extern const + definition once, in the base-CPU-arch compiled module: convert_bits.cpp. convert_bits_sse.cpp and convert_bits_avx2.cpp are using them without any duplication and initialization need.

When I generated an asm list output, I could see any more one_only,_GLOBAL__sub_I_ + fn_name occurence in *_avx2.asm files.

pinterf commented 1 year ago

I reopen it, since this is a blind fix, until I get feedback or I can try it after I get to my ancient i7-860 PC again in some days.

pinterf commented 1 year ago

Feedback received. https://forum.doom9.org/showthread.php?p=1984827#post1984827 Closing it. I learned a lot.