animetosho / par2cmdline-turbo

par2cmdline × ParPar: speed focused par2cmdline fork
http://parchive.sourceforge.net
GNU General Public License v2.0

Crash on repair #13

Closed Safihre closed 1 year ago

Safihre commented 1 year ago

This is one of the test NZBs that we use within SABnzbd; it has a Unicode filename and needs repair. https://sabnzbd.org/tests/unicode_rar_broken.nzb

When I try that on my aarch64 NAS system after I got it to compile earlier today, it crashes:

Loading "你好世界.vol006+08.par2".
Loaded 32 new packets including 8 recovery blocks
Loading "你好世界.vol002+04.par2".
Loaded 4 new packets including 4 recovery blocks
Loading "你好世界.vol126+73.par2".
Loaded 73 new packets including 73 recovery blocks
Loading "你好世界.vol000+02.par2".
Loaded 2 new packets including 2 recovery blocks
Loading "你好世界.vol014+16.par2".
Loaded 16 new packets including 16 recovery blocks
Loading "你好世界.vol062+64.par2".
Loaded 64 new packets including 64 recovery blocks
Loading "你好世界.vol030+32.par2".
Loaded 32 new packets including 32 recovery blocks
Loading "你好世界.par2".
No new packets found
There are 11 recoverable files and 0 other files.
The block size used was 5272 bytes.
There are a total of 1991 data blocks.
The total size of the data files is 10486893 bytes.
Verifying source files:
Opening: "你好世界.part01.rar"
Opening: "你好世界.part02.rar"
Target: "你好世界.part01.rar" - found.
Target: "你好世界.part02.rar" - found.
Opening: "你好世界.part03.rar"
Opening: "你好世界.part04.rar"
Target: "你好世界.part04.rar" - found.
Target: "你好世界.part03.rar" - found.
Opening: "你好世界.part06.rar"
Opening: "你好世界.part05.rar"
Target: "你好世界.part05.rar" - found.
Target: "你好世界.part06.rar" - found.
Opening: "你好世界.part07.rar"
Opening: "你好世界.part08.rar"
Target: "你好世界.part08.rar" - damaged. Found 136 of 199 data blocks.
Opening: "你好世界.part09.rar"
Target: "你好世界.part07.rar" - found.
Opening: "你好世界.part10.rar"
Target: "你好世界.part09.rar" - found.
Opening: "你好世界.part11.rar"
Target: "你好世界.part11.rar" - found.
Target: "你好世界.part10.rar" - found.
Scanning extra files:
Repair is required.
1 file(s) exist but are damaged.
10 file(s) are ok.
You have 1927 out of 1991 data blocks available.
You have 199 recovery blocks available.
Repair is possible.
You have an excess of 135 recovery blocks.
64 recovery blocks will be used to repair.
Computing Reed Solomon matrix.
terminate called after throwing an instance of 'std::system_error'
what():  Unknown error 605393792

Safihre commented 1 year ago

I tried to run it on Windows, but neither the Command Prompt nor PowerShell can handle the filename (it shows up as ????.par2), and it doesn't work even when I run it from a simple Python script that calls subprocess.run():

par2 file must not have a wildcard in it.
failed to set the main par file

animetosho commented 1 year ago

Can you check whether the issue is present in the original par2cmdline? If it is, please report the issue there.

The error on your NAS is odd, though I suspect it has nothing to do with the filename. Out of memory seems like a candidate, though there are some oddities with that too. If you can get a stack trace, it could reveal more info, though keep in mind that I don't intend to fix bugs in par2cmdline with this fork.

Safihre commented 1 year ago

It works normally on regular par2cmdline.

animetosho commented 1 year ago

I assume you meant for both issues?

I'm not sure exactly what the issue is on your NAS. If you can get a stack dump (run under GDB and use the bt command when it crashes), I can investigate more.
(the output suggests an exception was raised during matrix construction, but that code hasn't changed from the original, plus that code doesn't deal with I/O so you'd generally not expect errors there)

I'm unable to get the original par2cmdline to handle non-ASCII command-line parameters on Windows (using the official build). The code doesn't seem to have any provision for Windows Unicode support, so it makes sense to me that it wouldn't work. Also, command-line/file handling is the same as upstream par2cmdline, so it'd surprise me if par2cmdline-turbo were any different here.
Did you build par2cmdline by yourself, or are you using the x64 build supplied here?
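
One plausible reading of the "wildcard" message, sketched in C++. It assumes, without confirmation in this thread, that the Windows console substitutes `?` for characters it cannot represent and that par2cmdline rejects any par2 filename containing `*` or `?`. The helper name `has_wildcard` is hypothetical and is not taken from the par2cmdline source.

```cpp
#include <string>

// Hypothetical helper, not copied from par2cmdline: returns true when the
// name contains a glob character ('*' or '?').
bool has_wildcard(const std::string& name) {
    return name.find_first_of("*?") != std::string::npos;
}
```

Under those assumptions, a name that arrives as `????.par2` would trip the check, while the same name delivered intact as UTF-8 would not, because none of its bytes equal `*` or `?`.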

Safihre commented 1 year ago

I should have been more clear indeed that the filename problem is also present on regular par2cmdline for Windows. So unrelated!

I will see if I can get gdb working, it's quite complex due to the way Synology fences off all packages.

Safihre commented 1 year ago

I just confirmed that it crashes on any repair, which of course I should have tested right away. This time it stopped at a different spot, though. It's part01 that has the artificial damage. Just to double-check: on Linux systems I don't need a separate OpenMP library file, right?

Loading "sometestfile-100MB.vol031+032.par2".
Loaded 35 new packets including 12 recovery blocks
Loading "sometestfile-100MB.vol003+004.par2".
Loaded 5 new packets including 4 recovery blocks
Loading "sometestfile-100MB.vol001+002.par2".
Loaded 2 new packets including 2 recovery blocks
Loading "sometestfile-100MB.vol063+064.par2".
Loaded 12 new packets including 12 recovery blocks
Loading "sometestfile-100MB.vol007+008.par2".
Loaded 8 new packets including 8 recovery blocks
Loading "sometestfile-100MB.vol000+001.par2".
Loaded 1 new packets including 1 recovery blocks
Loading "sometestfile-100MB.vol015+016.par2".
Loaded 16 new packets including 16 recovery blocks
Loading "sometestfile-100MB.vol127+072.par2".
Loaded 12 new packets including 12 recovery blocks
Loading "sometestfile-100MB.par2".
No new packets found
There are 11 recoverable files and 0 other files.
The block size used was 52696 bytes.
There are a total of 1991 data blocks.
The total size of the data files is 104859345 bytes.
Verifying source files:
Opening: "sometestfile-100MB.part02.rar"
Opening: "sometestfile-100MB.part01.rar"
terminate called after throwing an instance of 'std::system_error'
what():  Unknown error -2015248464

Safihre commented 1 year ago

Apologies for the spam, I'm just not sure what is relevant.

What I tried:

Thread 1 "par2" received signal SIGILL, Illegal instruction.
0x0000000000590174 in ?? ()

- Using `bt` gives:

(gdb) bt
#0  0x0000000000590174 in ?? ()
#1  0x0000000000450814 in ?? ()
#2  0x0000000000450814 in ?? ()
#3  0x0000007ffffff810 in ?? ()
#4  0x0000007ffffff7b8 in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further

- On the non-ASCII files it crashes in the same way. 

Repair is possible.
You have an excess of 197 recovery blocks.
2 recovery blocks will be used to repair.

Thread 1 "par2" received signal SIGILL, Illegal instruction.
0x0000000000590174 in ?? ()

- I then realized that SABnzbd uses slightly different parameters, so I repeated it with the exact same parameters, which then gives the error I showed before:

Verifying source files:

[New LWP 29268]
Opening: "sometestfile-100MB.part01.rar"
Opening: "sometestfile-100MB.part02.rar"
[New LWP 29269]
[LWP 29269 exited]
terminate called after throwing an instance of 'std::system_error'
  what():  Unknown error -1216757840

Thread 2 "par2" received signal SIGABRT, Aborted.
[Switching to LWP 29268]
0x0000000000608da0 in ?? ()
(gdb) bt
#0  0x0000000000608da0 in ?? ()
#1  0x00000000004003d4 in ?? ()
#2  0x0000000000590510 in ?? ()
#3  0x00000000005581e4 in ?? ()
#4  0x0000000000590384 in ?? ()
#5  0x0000000000557b58 in ?? ()
#6  0x00000000005f9dfc in ?? ()
#7  0x00000000005fa474 in ?? ()
#8  0x00000000005583d0 in ?? ()
#9  0x0000000000561850 in ?? ()
#10 0x000000000058fbbc in ?? ()
#11 0x000000000054b2a8 in ?? ()
#12 0x000000000054b420 in ?? ()
#13 0x000000000054b384 in ?? ()
#14 0x000000000054b0f4 in ?? ()
#15 0x0000000000422964 in ?? ()
#16 0x00000000004208e0 in ?? ()
#17 0x0000000000420060 in ?? ()
#18 0x0000000000422cf0 in ?? ()
#19 0x0000000000420e34 in ?? ()
#20 0x00000000004201c4 in ?? ()
#21 0x0000000000548508 in ?? ()
#22 0x0000000000430904 in ?? ()
#23 0x000000000042fc08 in ?? ()
#24 0x0000000000433fec in ?? ()
#25 0x00000000005e5548 in ?? ()
#26 0x00000000005fd5fc in ?? ()
#27 0x000000000063d4fc in ?? ()



In any case, it seems this is something specific to the Synology-supplied toolchain, since your build works.
How do I build with debugging symbols enabled, if that's useful?

animetosho commented 1 year ago

The default build adds and retains symbols I think (I get the -g flag during compile, and there's no strip). At least the supplied Linux builds here seem to have debug symbols.
So perhaps your build environment is stripping out the symbols somewhere. Would this be the build?

It's curious, though, that you're getting SIGILL when an exception isn't thrown. You can try the x/i $pc command when you get a SIGILL to see if it disassembles to a valid instruction.

The Linux builds here are statically linked, so no external OpenMP. Your builds may be different - you can use ldd par2 to see what it's trying to link to.

Safihre commented 1 year ago

I got the unstripped version, this is the output:

Verifying source files:

[New LWP 12073]
Opening: "sometestfile-100MB.part02.rar"
Opening: "sometestfile-100MB.part01.rar"
[New LWP 12074]
[LWP 12074 exited]
terminate called after throwing an instance of 'std::system_error'
  what():  Unknown error -1216757840
Scanning: 12.3%
Thread 2 "par2" received signal SIGABRT, Aborted.
[Switching to LWP 12073]
raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00000000004003d4 in abort () at abort.c:90
#2  0x0000000000590510 in __gnu_cxx::__verbose_terminate_handler () at /home/ctng/.build/aarch64-unknown-linux-gnu/src/gcc/libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x00000000005581e4 in __cxxabiv1::__terminate (handler=<optimized out>) at /home/ctng/.build/aarch64-unknown-linux-gnu/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:47
#4  0x0000000000590384 in __cxa_call_terminate (ue_header=ue_header@entry=0x7fb002f030) at /home/ctng/.build/aarch64-unknown-linux-gnu/src/gcc/libstdc++-v3/libsupc++/eh_call.cc:54
#5  0x0000000000557b58 in __cxxabiv1::__gxx_personality_v0 (version=<optimized out>, actions=6, exception_class=<optimized out>, ue_header=0x7fb002f030, context=0x7fb7f9b320) at /home/ctng/.build/aarch64-unknown-linux-gnu/src/gcc/libstdc++-v3/libsupc++/eh_personality.cc:676
#6  0x00000000005f9dfc in _Unwind_RaiseException_Phase2 (exc=exc@entry=0x7fb002f030, context=context@entry=0x7fb7f9b320, frames_p=frames_p@entry=0x7fb7f9b6e0) at /home/ctng/.build/aarch64-unknown-linux-gnu/src/gcc/libgcc/unwind.inc:64
#7  0x00000000005fa474 in _Unwind_RaiseException (exc=exc@entry=0x7fb002f030) at /home/ctng/.build/aarch64-unknown-linux-gnu/src/gcc/libgcc/unwind.inc:136
#8  0x00000000005583d0 in __cxxabiv1::__cxa_throw (obj=obj@entry=0x7fb002f050, tinfo=0x70a8b8 <typeinfo for std::system_error>, dest=0x561548 <std::system_error::~system_error()>) at /home/ctng/.build/aarch64-unknown-linux-gnu/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:90
#9  0x0000000000561850 in std::__throw_system_error (__i=-1216757840) at /home/ctng/.build/aarch64-unknown-linux-gnu/src/gcc/libstdc++-v3/src/c++11/system_error.cc:337
#10 0x000000000058fbbc in std::thread::join (this=0x7fb0000d10) at /home/ctng/.build/aarch64-unknown-linux-gnu/src/gcc/libstdc++-v3/src/c++11/thread.cc:113
#11 0x000000000054b2a8 in std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<FileCheckSummer::Jump(unsigned long)::{lambda()#1}> >, void>::~_Async_state_impl() ()
#12 0x000000000054b420 in void __gnu_cxx::new_allocator<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<FileCheckSummer::Jump(unsigned long)::{lambda()#1}> >, void> >::destroy<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<FileCheckSummer::Jump(unsigned long)::{lambda()#1}> >, void> >(std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<FileCheckSummer::Jump(unsigned long)::{lambda()#1}> >, void>*) ()
#13 0x000000000054b384 in void std::allocator_traits<std::allocator<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<FileCheckSummer::Jump(unsigned long)::{lambda()#1}> >, void> > >::destroy<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<FileCheckSummer::Jump(unsigned long)::{lambda()#1}> >, void> >(std::allocator<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<FileCheckSummer::Jump(unsigned long)::{lambda()#1}> >, void> >&, std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<FileCheckSummer::Jump(unsigned long)::{lambda()#1}> >, void>*)
    ()
#14 0x000000000054b0f4 in std::_Sp_counted_ptr_inplace<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<FileCheckSummer::Jump(unsigned long)::{lambda()#1}> >, void>, std::allocator<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<FileCheckSummer::Jump(unsigned long)::{lambda()#1}> >, void> >, (__gnu_cxx::_Lock_policy)2>::_M_dispose() ()
#15 0x0000000000422964 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() ()
#16 0x00000000004208e0 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() ()
#17 0x0000000000420060 in std::__shared_ptr<std::__future_base::_State_baseV2, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() ()
#18 0x0000000000422cf0 in std::__shared_ptr<std::__future_base::_State_baseV2, (__gnu_cxx::_Lock_policy)2>::reset() ()
#19 0x0000000000420e34 in std::__basic_future<void>::_Reset::~_Reset() ()
#20 0x00000000004201c4 in std::future<void>::get() ()
#21 0x0000000000548508 in FileCheckSummer::Jump(unsigned long) ()
#22 0x0000000000430904 in Par2Repairer::ScanDataFile(DiskFile*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Par2RepairerSourceFile*&, MatchType&, MD5Hash&, MD5Hash&, unsigned int&) ()
#23 0x000000000042fc08 in Par2Repairer::VerifyDataFile(DiskFile*, Par2RepairerSourceFile*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#24 0x0000000000433fec in Par2Repairer::VerifySourceFiles(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) [clone ._omp_fn.0] ()
#25 0x00000000005e5548 in gomp_thread_start (xdata=<optimized out>) at /home/ctng/.build/aarch64-unknown-linux-gnu/src/gcc/libgomp/team.c:120
#26 0x00000000005fd5fc in start_thread (arg=0x7fffffeaa6) at pthread_create.c:465
#27 0x000000000063d4fc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

animetosho commented 1 year ago

Thanks for the output.

Unfortunately I can't figure out a likely cause. The crashes you're getting seem to be in various different places.
For this particular case, it's crashing here, likely due to pthread_join failing. The error code thrown (-1216757840) doesn't match any of the documented errors (and none of them really make sense in the context).

The same error code was thrown in one of your previous examples, though there have been different codes as well. There's also the SIGILL case, which could be something entirely different.

I have a feeling that something's off with your libstdc++ runtime. Do you know which versions of the compiler and C++ runtime are being used?
You could also try a static build, not that I have any clue whether it'd make a difference - you can do this by setting the LDFLAGS=-static environment variable during configure (e.g. rebuild via make clean; LDFLAGS=-static ./configure && make).
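
For reference, the backtrace above ends in `std::thread::join` throwing a `std::system_error` that nothing catches, which is why the process goes through `std::terminate` and aborts. A minimal C++ sketch of catching that exception and printing its raw code follows. The helper `join_and_describe` is hypothetical and is not part of par2cmdline-turbo; it only illustrates the mechanism and would not fix a broken runtime.

```cpp
#include <cassert>
#include <string>
#include <system_error>
#include <thread>

// Hypothetical helper (not par2cmdline-turbo code): joins a thread and
// reports a failure as text instead of letting the exception escape.
std::string join_and_describe(std::thread& t) {
    try {
        t.join();               // throws std::system_error on failure
        assert(!t.joinable());  // a successful join leaves the thread non-joinable
        return "joined";
    } catch (const std::system_error& e) {
        // code().value() is the raw error number that what() tries to describe.
        return "join failed: code=" + std::to_string(e.code().value()) +
               " (" + e.what() + ")";
    }
}
```

Calling it on a `std::thread` that was never started takes the `catch` path, since joining a non-joinable thread is specified to throw `std::system_error`. The error numbers POSIX documents for `pthread_join` (EDEADLK, EINVAL, ESRCH) are small positive integers, so a value such as -1216757840 looks like garbage rather than a genuine error code.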

Safihre commented 1 year ago

It's already a static build. The build log is here, starting at 2023-07-20T20:11:39.0726708Z. https://pipelines.actions.githubusercontent.com/serviceHosts/ade5920e-3153-49ed-82db-40b0380642a9/_apis/pipelines/1/runs/586/signedlogcontent/15?urlExpires=2023-07-26T14%3A38%3A18.5805955Z&urlSigningMethod=HMACV1&urlSignature=sa8mhJSmL5nO9zZ27jrvqHXvipoL8r33q88Vyk3Gs3U%3D

I can totally imagine it being a libstdc++ problem, but I'm not sure how this works at Synology. Maybe if @hgy59 knows where to look or what it could be?

animetosho commented 1 year ago

Unfortunately that's a dead link.
Do you know what versions of GCC/libc are being used for the build?

Safihre commented 1 year ago

Will close because it's probably some weird Synology thing with their ancient GCC versions.

Safihre commented 12 months ago

I found the solution: it was to not build it statically!

animetosho commented 12 months ago

Interesting. Maybe it's some libc that doesn't like static linking...