ffvvc / FFmpeg

VVC Decoder for ffmpeg

Windows multithreading test failure #55

Closed frankplow closed 1 year ago

frankplow commented 1 year ago

Had some time so did a little bit more research on the Windows CI (#52) test failure.

memsetting the entire lc->sao_buffer like bab47cad2c2b6d78a17765161a853dd2dcd46775 does not fix the issue, so I don't think the issue is related to #26. With this change, valgrind and clang's address sanitiser don't report any memory issues.

I have compiled FFmpeg directly with MSVC/MSYS2 (i.e. not via FFVS-Project-Generator) and the problem is similar so I don't think it's anything to do with the build files. I haven't yet got the gcc/MSYS2 toolchain or MinGW gcc cross-compilation working unfortunately.

I can't get the LTRP_A_ERICSSON_3 failure to reproduce on my machine, so I don't think there's anything special about this test. The tests which fail most frequently on my machine are:

The failures only occur when running tests concurrently; they do not occur when running the tests individually or when running tests using a single thread. I don't know whether this points towards libavcodec/vvc_thread.c at all?

This line is part of what is preventing cross-compilation at the moment. Should it not be testing for the compiler using _MSC_VER or something instead of checking the OS? See #57 for the fix. I don't believe this is related.
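For reference, the difference between a compiler check and an OS check raised above can be sketched roughly as follows. This is only an illustration, not the code from #57: the helper names `uses_msvc_compiler`, `targets_windows` and `check_msvc_implies_windows` are hypothetical. The point is that a MinGW gcc cross-build defines `_WIN32` but not `_MSC_VER`, so an MSVC-specific workaround keyed on the OS would also be applied to gcc.

```c
#include <assert.h>

/* Hypothetical helper: 1 when built by MSVC (or an MSVC-compatible
 * front end that defines _MSC_VER), 0 otherwise. */
static int uses_msvc_compiler(void)
{
#if defined(_MSC_VER)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: 1 when the target OS is Windows, whatever the
 * compiler. MinGW gcc cross-builds also take this branch. */
static int targets_windows(void)
{
#if defined(_WIN32)
    return 1;
#else
    return 0;
#endif
}

/* MSVC implies a Windows target, but not the other way round. */
static void check_msvc_implies_windows(void)
{
    assert(!uses_msvc_compiler() || targets_windows());
}
```

With a MinGW gcc cross-compiler, `targets_windows()` returns 1 while `uses_msvc_compiler()` returns 0, which is the case an OS-only check gets wrong.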

nuomi2021 commented 1 year ago

Not sure whether MSYS2 has valgrind. If it does, maybe you can use it to do some memory checks.

frankplow commented 1 year ago

This issue is partially due to the lack of atomic operations for 8-bit types with MSVC winnt.h. Fixing this will require an upstream change (see patch here) and then changing VVCFrameThread.avails to an atomic_uint *.
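As a rough sketch of what widening the flags to `atomic_uint` could look like, assuming a compiler with C11 `<stdatomic.h>` (FFmpeg's MSVC builds use a compatibility header instead, which is not shown here). The `AvailFlags` struct and the `avail_flags_*` helpers are hypothetical stand-ins, not the actual `VVCFrameThread` code:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-frame availability flags. */
typedef struct AvailFlags {
    atomic_uint *avails; /* one flag word per unit, unsigned int rather than 8-bit */
    size_t       count;
} AvailFlags;

/* Allocate the flag array and clear every entry. Returns 0 on success. */
static int avail_flags_init(AvailFlags *f, size_t count)
{
    f->avails = malloc(count * sizeof(*f->avails));
    if (!f->avails)
        return -1;
    f->count = count;
    for (size_t i = 0; i < count; i++)
        atomic_init(&f->avails[i], 0);
    return 0;
}

/* Atomically OR a bit into one entry; safe to call from several threads. */
static void avail_flags_set(AvailFlags *f, size_t idx, unsigned bit)
{
    assert(idx < f->count);
    atomic_fetch_or(&f->avails[idx], bit);
}

/* Atomically read one entry and test a bit. */
static int avail_flags_test(AvailFlags *f, size_t idx, unsigned bit)
{
    assert(idx < f->count);
    return (atomic_load(&f->avails[idx]) & bit) != 0;
}

static void avail_flags_free(AvailFlags *f)
{
    free(f->avails);
    f->avails = NULL;
    f->count  = 0;
}
```

The trade-off is memory: each entry grows from one byte to `sizeof(unsigned int)`, in exchange for using an atomic width that the MSVC path supports.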

With these patches + bab47ca applied, the errors are mostly gone except for LTRP_A_ERICSSON_3 – maybe there is something special about this test case after all? I can now reproduce the errors when a single test is run, rather than only as part of the suite, and the decoded MD5 is different each time. MSYS2 does not have valgrind unfortunately. I might try generating a VS solution with FFVS-Project-Generator and debugging with VS – I see how that's handy already!

nuomi2021 commented 1 year ago

Since LTRP_A_ERICSSON_3 always passes on Linux, it may be related to some invalid read/write too. Maybe you can try valgrind on Linux for this file and see what happens.

I see how that's handy already! 😊

frankplow commented 1 year ago

The LTRP_A_ERICSSON_3 failure also affects Linux when assembly optimisations are enabled. I have created a new issue #59 for this.

nuomi2021 commented 1 year ago

Tried the current code b1c8bd1 with SLICES_A_HUAWEI_3.bit. We can still reproduce it, but the mismatched frame is different every time, even if I use a single thread. Not easy to debug.

frankplow commented 1 year ago

@nuomi2021 Is that with the memset to fix #26 (like bab47cad2c2b6d78a17765161a853dd2dcd46775)?

nuomi2021 commented 1 year ago

Not sure. The memset will impact the thread scheduler. Even if the memset is OK, it does not mean we have found the root cause. If we can find a way to reproduce this with a single-threaded application, like checkasm, it may help us debug.

frankplow commented 1 year ago

@nuomi2021 Sometimes, like here, ffvvc-test / windows/msvc/no asm fails so I don't think it is related to assembly optimisations.

There seem to be some bitstreams which fail much more frequently than others - maybe we could try identifying these and any similarities between them which may be suspect?

nuomi2021 commented 1 year ago

These are multi-slice or multi-tile clips. But the weird thing is that the failed blocks are not at the slice/tile boundary. Pretty hard to find out what's happening. A possible way to isolate the issue, in my mind:

  1. Check the failure history and put all failure-prone clips into a tmp directory.
  2. Set s->nb_fcs to 1 to disable threading.
  3. Run "ffmpeg.py tmp".
  4. If it fails, it may not be a multithreading issue.
  5. Try running rr (https://rr-project.org/) to capture a recording.
  6. Replay the rr recording to debug.