Closed frankplow closed 1 year ago
Not sure whether MSYS2 has valgrind; if it does, maybe you can use it to do some memory checking.
This issue is partially due to the lack of atomic operations for 8-bit types with MSVC winnt.h. Fixing this will require an upstream change (see patch here) and then changing VVCFrameThread.avails to an atomic_uint *.
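For illustration, here is a minimal sketch of what word-sized atomic flags could look like, assuming C11 <stdatomic.h> is available. The FrameAvail struct and the frame_avail_* helpers are hypothetical names made up for this example; they are not the actual VVCFrameThread code or the upstream patch.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the availability flags; not the FFmpeg struct. */
typedef struct FrameAvail {
    atomic_uint *avails;  /* one word-sized atomic flag per entry */
    size_t       count;
} FrameAvail;

/* Allocate `count` flags, all initialised to 0.
 * Returns 0 on success, -1 on allocation failure. */
static int frame_avail_init(FrameAvail *fa, size_t count)
{
    fa->avails = malloc(count * sizeof(*fa->avails));
    if (!fa->avails)
        return -1;
    fa->count = count;
    for (size_t i = 0; i < count; i++)
        atomic_init(&fa->avails[i], 0);
    return 0;
}

/* Mark entry `idx` as available. */
static void frame_avail_set(FrameAvail *fa, size_t idx)
{
    atomic_store(&fa->avails[idx], 1);
}

/* Return 1 if entry `idx` has been marked available, 0 otherwise. */
static int frame_avail_get(FrameAvail *fa, size_t idx)
{
    return atomic_load(&fa->avails[idx]) != 0;
}

/* Release the flag array. */
static void frame_avail_free(FrameAvail *fa)
{
    free(fa->avails);
    fa->avails = NULL;
    fa->count  = 0;
}
```

Using unsigned int for each flag costs more memory than an 8-bit type, but it only relies on word-sized atomic operations.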
With these patches + bab47ca applied, the errors are mostly gone except for LTRP_A_ERICSSON_3 – maybe there is something special about this test case after all? I can now reproduce the errors when a single test is run, rather than as a part of the suite, and the decoded MD5 is different each time. MSYS2 does not have valgrind unfortunately. I might try generating a VS solution with FFVS-Project-Generator and debugging with VS – I see how that's handy already!
Since LTRP_A_ERICSSON_3 always passes on Linux, it may be related to some invalid read/write too. Maybe you can try valgrind on Linux for this file to see what's happening.
I see how that's handy already! 😊
The LTRP_A_ERICSSON_3 failure also affects Linux when assembly optimisations are enabled. I have created a new issue #59 for this.
Tried the current code (b1c8bd1) with SLICES_A_HUAWEI_3.bit. We can still reproduce it, but the mismatched frame is different every time, even if I use a single thread. Not easy to debug.
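One way to track which frame mismatches on each run is to compare per-frame checksums from two decodes. The sketch below is only an illustration: first_mismatch is a hypothetical helper, and it assumes the per-frame checksum strings have already been collected into arrays (for example from per-frame hash output such as FFmpeg's framemd5 muxer).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compare the per-frame checksum strings of two runs.
 * Returns the index of the first frame whose checksum differs,
 * or -1 if all `n` frames match. */
static long first_mismatch(const char *const *run_a,
                           const char *const *run_b,
                           size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(run_a[i], run_b[i]) != 0)
            return (long)i;
    }
    return -1;
}
```

Logging this index over many runs would show whether the first bad frame clusters around particular frames or is spread evenly.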
@nuomi2021 Is that with the memset to fix #26 (like bab47cad2c2b6d78a17765161a853dd2dcd46775)?
Not sure; the memset will impact the thread scheduler. Even if the memset is OK, it does not mean we have found the root cause. If we can find a way to reproduce this with a single-threaded application, like checkasm, it may help us debug.
@nuomi2021 Sometimes, like here, ffvvc-test / windows/msvc/no asm fails, so I don't think it is related to assembly optimisations.
There seem to be some bitstreams which fail much more frequently than others - maybe we could try identifying these and any similarities between them which may be suspect?
They are multi-slice or multi-tile clips. But the weird thing is that the failed blocks are not at the slice/tile boundary. Pretty hard to find out what's happening. A possible way to isolate the issue, in my mind:
Had some time so did a little bit more research on the Windows CI (#52) test failure.
Memsetting the entire lc->sao_buffer like bab47cad2c2b6d78a17765161a853dd2dcd46775 does not fix the issue, so I don't think the issue is related to #26. With this change, valgrind and clang's address sanitiser don't report any memory issues.
I have compiled FFmpeg directly with MSVC/MSYS2 (i.e. not via FFVS-Project-Generator) and the problem is similar, so I don't think it's anything to do with the build files. I haven't yet got the gcc/MSYS2 toolchain or MinGW gcc cross-compilation working unfortunately.
I can't get the LTRP_A_ERICSSON_3 failure to reproduce on my machine, so I don't think there's anything special about this test. The tests which fail most frequently on my machine are:
The failures only occur when running tests concurrently; they do not occur when running the tests individually or when running tests using a single thread. I don't know whether this points towards libavcodec/vvc_thread.c at all?
This line is part of what is preventing cross-compilation at the moment. Should it not be testing for the compiler using _MSC_VER or something instead of checking the OS?
See #57 for fix. Don't believe this is related.
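To illustrate the compiler-versus-OS distinction raised above, here is a small sketch. The function names are hypothetical and this is not the line in question or the fix in #57. It relies only on two predefined macros: _MSC_VER, which MSVC (and compilers emulating it, such as clang-cl) define, and _WIN32, which is defined when targeting Windows regardless of compiler.

```c
#include <string.h>

/* Hypothetical helper: reports which branch a compiler check selects. */
static const char *toolchain_branch(void)
{
#if defined(_MSC_VER)
    /* Taken for MSVC and MSVC-compatible compilers only. */
    return "msvc";
#else
    /* Taken for any other compiler, including gcc/MinGW targeting Windows. */
    return "other";
#endif
}

/* Hypothetical helper: reports which branch an OS check selects. */
static const char *os_branch(void)
{
#if defined(_WIN32)
    /* Taken for every Windows target, whichever compiler is used. */
    return "windows";
#else
    return "other";
#endif
}
```

With a MinGW gcc cross-compile, os_branch() would take the "windows" path while toolchain_branch() takes "other", which is why an MSVC-specific workaround guarded by an OS check can break that build.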