SynoCommunity / spksrc

Cross compilation framework to create native packages for Synology NAS
https://synocommunity.com

Synology unrar performance on atom - official binary vs spksrc compiled binary #847

Closed Hubfront closed 10 years ago

Hubfront commented 10 years ago

Maybe somebody with a Synology Atom NAS will find this helpful. Thanks to the kind help of Diaoul and Dr-Bean, I managed to compile the spksrc unrar on Debian 7 32-bit for the i686 evansport platform. I wanted to compare the performance of unrar compiled with spksrc against the Linux command-line version from http://www.rarlab.com/download.htm

In my test I found the Roshal build more than 2.5 times faster than the spksrc build on a password-encrypted rarset with compression level "store". I now use the Roshal Linux build for NZBGet and SABnzbd on my NAS.

Both builds were version 5.01, running on a DS214play with DSM 4.3. Both used only one core, which in my case is actually one thread (2 physical cores with Hyper-Threading, i.e. 4 threads), so the CPU usage of unrar was 25%. With unrar 5.0 it is theoretically possible for unrar to use more than one core, although I never saw that happen on my NAS. ;-) Owners of a Synology x64 NAS might find this post helpful: http://www.synology-forum.de/showthread.html?50070-NZBGET-Entpacken-schneller-machen

I unpacked a 10GB file from 71 rar files of 150MB each. To finish 47 of the rar files, unrar_roshall needed 27min 16sec; unrar_spksrc needed 72min 55sec.
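The "more than 2.5 times faster" figure can be checked by converting both timings to seconds. This is a minimal Python sketch; the helper `to_seconds` is hypothetical and only parses the `NNmin NNsec` notation used in this post.

```python
import re


def to_seconds(label: str) -> int:
    """Convert a duration written like '27min 16sec' into seconds."""
    match = re.fullmatch(r"\s*(\d+)min\s+(\d+)sec\s*", label)
    if match is None:
        raise ValueError(f"unrecognised duration: {label!r}")
    minutes, seconds = (int(g) for g in match.groups())
    return minutes * 60 + seconds


# Timings for the first 47 of the 71 parts, taken from the post above
roshal = to_seconds("27min 16sec")  # unrar_roshall
spksrc = to_seconds("72min 55sec")  # unrar_spksrc

print(f"unrar_roshall: {roshal} s, unrar_spksrc: {spksrc} s")
print(f"unrar_spksrc takes {spksrc / roshal:.2f}x as long")
```

With these numbers the spksrc build needs about 2.67 times as long, which matches the "more than 2.5 times" statement.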

Update:

For completeness I also tested the version of unrar that comes with Synology DSM 4.3 (version 3.80), and the unrar included in my SABnzbd installation from SynoCommunity / spksrc (version 5.0). The rar files are exactly the same as above, and so is the NAS hardware.

To finish the 47 files, unrar_380_syn needed 30min 45sec and unrar_5.0_sab_package needed 75min 41sec.

It looks like Synology's own version of unrar has been optimized nicely for the hardware, but the Roshal binary ran slightly faster in my test. The unrar included in the SABnzbd package (0.7.16-7) performed similarly to the binary I compiled myself.

After setting ADDITIONAL_CXXFLAGS = -O2 or -O3, respectively, in the Makefile (many thanks to Dr-Bean for this fix) :) :

This time running on DSM 5.0 final, to finish the 47 files: unrar_5.01_spksrc_o3 needed 22min 05sec; unrar_5.01_spksrc_o2 (recompiled with CXXFLAGS = -O2) needed 23min 29sec; unrar_5.01_syn_DSM5.0 (ships with Synology DSM 5.0) needed 30min 02sec; unrar_roshall (see above) needed 27min 36sec. I tested unrar_roshall again to check for a possible systematic influence of DSM 5.0 on unrar performance.

The recompiled unrar_5.01_spksrc_o3 performed best of all tested unrar builds, unpacking around 6 percent faster than unrar_5.01_spksrc_o2 and 24 percent faster than unrar_roshall on the above rarset. The measured difference for unrar_roshall between DSM 4.3 and DSM 5.0 was small (around 1 percent), so it is probably better explained by random variation than by a systematic influence of the DSM version.
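"Percent faster" can be read two ways: higher throughput (slow time / fast time - 1) or less wall-clock time (1 - fast time / slow time). The Python sketch below uses hypothetical helper names and computes both from the DSM 5.0 timings above. The roughly 6 percent advantage over -O2 holds under either reading. The gap to the DSM 5.0 unrar_roshall run comes out at about 25 percent more throughput, or about 20 percent less time.

```python
def percent_faster(t_slow: float, t_fast: float) -> float:
    """Throughput gain of the fast build over the slow one, in percent."""
    return (t_slow / t_fast - 1) * 100


def percent_time_saved(t_slow: float, t_fast: float) -> float:
    """Reduction in wall-clock time of the fast build, in percent."""
    return (1 - t_fast / t_slow) * 100


o3 = 22 * 60 + 5       # unrar_5.01_spksrc_o3, 22min 05sec
o2 = 23 * 60 + 29      # unrar_5.01_spksrc_o2, 23min 29sec
roshal = 27 * 60 + 36  # unrar_roshall on DSM 5.0, 27min 36sec

for name, slow in [("o2", o2), ("roshall", roshal)]:
    print(f"o3 vs {name}: {percent_faster(slow, o3):.1f}% faster, "
          f"{percent_time_saved(slow, o3):.1f}% less time")
```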

Update with unrar v5.1beta1 (28.03.2014):

I just downloaded and compiled the newest version, unrar v5.1beta1, with spksrc for my DS214play, after updating the digests file with the new hashes and the Makefile in cross/unrar/.

For the 47 files, unrar_5.1beta1_spksrc_o3 needed only 7min 30sec. That is about 2.94 times as fast (a speed improvement of roughly 194 percent) as the former number one, unrar_5.01_spksrc_o3 (unrar version 5.0.14), which needed 22min 04sec.

To rule out possible systematic influences, e.g. the new DSM update DSM5update1 and possible changes to spksrc, I recompiled unrar_5.01_spksrc_o3 and found no significant change in speed: it needed 22min 10sec. That difference is negligible, which confirms that the speed increase of the new 5.1beta1 comes from improvements in its code.
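Converting the 5.01 and 5.1beta1 timings to seconds shows how the ratio and the percentage relate. 22min 04sec / 7min 30sec is about 2.94, so the beta runs about 2.94 times as fast. That corresponds to a speed-up of roughly 194 percent, not 294. The 22min 10sec re-run also differs from the earlier 5.01 run by well under 1 percent. This is a small arithmetic sketch using only the numbers from the post.

```python
old = 22 * 60 + 4        # unrar_5.01_spksrc_o3: 22min 04sec
new = 7 * 60 + 30        # unrar_5.1beta1_spksrc_o3: 7min 30sec
recheck = 22 * 60 + 10   # unrar_5.01_spksrc_o3 rebuilt after DSM5update1

speed_ratio = old / new                # how many times as fast the beta is
improvement = (speed_ratio - 1) * 100  # percent faster
drift = (recheck - old) / old * 100    # change between the two 5.01 runs

print(f"5.1beta1 is {speed_ratio:.2f}x as fast ({improvement:.0f}% faster)")
print(f"5.01 re-run differs by {drift:.2f}%")
```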

Ranking for DS214play (evansport i686, 2 cores, 4 threads), unpacking 47 parts:

  1. 7min 30sec unrar_5.1beta1_spksrc_o3 (compiled with CXXFLAGS = -O3)
  2. 7min 30sec unrar_5.1.2_spksrc_o3 (compiled with CXXFLAGS = -O3) (Update 13.04.2014)
  3. 22min 05sec unrar_5.01_spksrc_o3 (compiled with CXXFLAGS = -O3)
  4. 23min 29sec unrar_5.01_spksrc_o2 (compiled with ADDITIONAL_CXXFLAGS = -O2)
  5. 27min 36sec unrar_roshall (Linux unrar command-line version 5.01 from rarlab.com)
  6. 30min 02sec unrar_5.01_syn_DSM5.0 (unrar that comes with Synology DSM 5.0 final)
  7. 30min 45sec unrar_380_syn (unrar version 3.80 that comes with Synology DSM 4.3 final)
  8. 72min 55sec unrar_spksrc (unrar version 5.01 compiled with spksrc on Debian 7)

Ranking for DS214 (armadaxp, 2 cores), update 14.04.2014:

  1. unrar 5.01 (O3 flag, spksrc): 18min 44sec for 47 parts, 27min 53sec for all 71 parts
  2. unrar 5.1.2 (O3 flag, spksrc): 19min 52sec for 47 parts, 29min 35sec for all 71 parts
  3. unrar 5.01 (O2 flag, spksrc): 22min 59sec for 47 parts, 34min 11sec for all 71 parts
  4. unrar 5.01 (original DSM5): 23min 16sec for 47 parts, 34min 42sec for all 71 parts
  5. unrar 5.1.2 (O2 flag, spksrc): 23min 48sec for 47 parts, 35min 49sec for all 71 parts

overrunner commented 10 years ago

On my DS413 with a PowerPC / qoriq CPU I have a similar issue. The binary that ships with the SABnzbd package is 4 times slower than the one from the rarlab page.

Jarosch commented 10 years ago

Hey Hubfront,

take a look here: https://github.com/SynoCommunity/spksrc/issues/809 I had already opened the same issue some weeks ago and it got closed, because it's not a spksrc specific problem.

Thanks for your advice. And yes, I can also confirm that the x64 unrar binary from Roshal is faster than the official Synology and SABnzbd binaries delivered with the OS and the package.

Nevertheless, I can also confirm that the Thecus N5550 extracts faster than my DS1513+ running the Roshal binary, with the same RAID configuration and HDD setup.

The problem is that I don't have the Thecus anymore, because I returned it to the reseller.

I also contacted the Synology Support because of this issue and I got this answer for my problem:

BEGIN:

Hi Jaroslaw,

Thank you for the reply. Here is the feedback from our senior engineer:

rar performance very depends on the content of the archive. Since user was not comparing all machines with the same archive file (5GB on DS and 12GB on Thecus), we cannot tell if this is normal.

Please first have the user test with the 12GB archive file on DS, or at least test machines all with the same archive file, and see how it performs. So we can compare apples to apples.

END:

Not very helpful. In my opinion they should try the extraction on a standard notebook with a single HDD. Even there, unrar is 3-4 times faster than on the DS with the Roshal unrar binary.

Strange thing.

Best regards, Jarosch

Dr-Bean commented 10 years ago

Most likely, rarlab's unrar is optimized for that specific CPU type/architecture (x86). The unrar SynoCommunity provides, and likely the version Synology provides, are not optimized for specific arches at the moment; they have to cater for the whole range of Synology architectures.

I would expect (though unconfirmed) that with the correct compiler flags, you should be able to compile an unrar version that has similar performance as the rarlabs version. Give it a try :)

Hubfront commented 10 years ago

I updated the first post with two performance tests for completeness: unrar v3.80 from Synology, and the unrar included in the current SABnzbd package (unrar 5.0).

overrunner: good to hear that rarlab also has a fast binary for PowerPC.

Jarosch: it's interesting that the Thecus was so much faster. For my part, I am quite happy with the Roshal binary. I was very disappointed before and had already thought about returning the Synology NAS.

Dr-Bean: you might be right that the huge performance differences are a matter of compiler optimization. I don't know how Synology compiles their version of unrar, but its performance was similar to the Roshal build.

Jarosch commented 10 years ago

Dr-Bean,

thanks for this advice. I have already tried to compile unrar within spksrc with optimized compiler flags. It worked: I was able to compile it and replaced the binary, but I didn't notice any performance boost with the newly compiled binary.

Where can I look up the optimized compiler flags for the Intel Atom Cedarview? Perhaps I used the wrong ones.

Regards, Jarosch

Dr-Bean commented 10 years ago

Take a look at env.patch. CXXFLAGS=-O2 has been commented out, presumably because one or more arches (e.g. arm or ppc, not sure) cannot use those.

Try and leave that part of the patch out and see if it compiles correctly/improves speed. Next step would be to change the Makefile to use that flag conditionally, based on $(ARCH).

overrunner commented 10 years ago

For my Synology DS413 I now use the binary that comes with the IPKG package. It's version 5 and a little faster than the 5.01 from the rarlab page. The fastest one is the one Synology installs by default, but that's 3.80. I would recommend that everybody try IPKG.

dneckel commented 10 years ago

Using the version from the rarlab website did not work for me, but the IPKG unrar works like a charm. Before, a 2GB password-protected rar took about 15-20min. Now it is down to ~5min on my 710+. Thanks for the advice :+1:

moneytoo commented 10 years ago

IPKG recipe for unrar is here: http://svn.nslu2-linux.org/svnroot/optware/trunk/make/unrar.mk but I don't see anything obviously different.

Dr-Bean commented 10 years ago

I was looking for that, thanks ;) The only thing I see is this: UNRAR_CFLAGS=$(TARGET_CFLAGS) in the unrar.mk ultimately translates to TARGET_CUSTOM_FLAGS= -O2 -pipe in the toolchain.mk. -O2 is the flag that optimizes for speed...I don't think we set that flag, right?

Dr-Bean commented 10 years ago

So it turns out the flag is not set. Fix: add this to the Makefile in cross/unrar. I think I got all the correct arches, but I might have missed a couple.

```makefile
ADDITIONAL_CXXFLAGS =
ifeq ($(findstring $(ARCH),evansport cedarview bromolow x86),$(ARCH))
ADDITIONAL_CXXFLAGS = -O2
endif
```
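The `ifeq`/`findstring` test in that fix relies on GNU make's `$(findstring find,in)`, which returns `find` when it occurs anywhere in `in` as a substring, not only as a whole word. The small Python model below uses hypothetical function names. It mimics that condition to show which `ARCH` values would pick up `-O2`. The value `port` is a made-up, non-existent arch included only to show the substring behaviour.

```python
TARGET_ARCHES = "evansport cedarview bromolow x86"


def findstring(find: str, text: str) -> str:
    """Model of GNU make's $(findstring find,in): find if it occurs in text, else ''."""
    return find if find in text else ""


def extra_cxxflags(arch: str) -> str:
    """Model of the ifeq block: -O2 only when findstring returns the arch itself."""
    return "-O2" if findstring(arch, TARGET_ARCHES) == arch else ""


for arch in ["evansport", "bromolow", "x86", "88f6281", "armadaxp", "qoriq", "port"]:
    print(f"{arch:10s} -> {extra_cxxflags(arch) or '(none)'}")
```

Because the match is by substring, a whole-word test such as `$(filter $(ARCH),evansport cedarview bromolow x86)` would likely be stricter. The thread later drops the condition entirely once `-O2` is confirmed to build on every arch.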

Results, with a few seconds more or less between runs:

STANDARD
Version: 3.80
Unpacking took:  0m 52.76s
NZBGET ORG
Version: 5.01
Unpacking took:  1m 17.32s
NZBGET PATCH REMOVED
Version: 5.01
Unpacking took:  0m 25.11s
NZBGET UPDATED MAKEFILE
Version: 5.01
Unpacking took:  0m 25.70s
IPK
Version: 5.00
Unpacking took:  0m 30.79s

The results are generated from a modified version of this, which comes from here
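To compare those runs directly, the `Xm YY.YYs` values can be turned into seconds and divided by the original NZBGet build's time. This Python sketch uses a hypothetical `parse_duration` helper and the labels from the results above. It sorts the builds from fastest to slowest and reports each one's speed-up over "NZBGET ORG".

```python
import re


def parse_duration(text: str) -> float:
    """Convert a value like '1m 17.32s' (as printed by the test script) to seconds."""
    match = re.fullmatch(r"\s*(\d+)m\s+(\d+(?:\.\d+)?)s\s*", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


results = {
    "STANDARD (3.80)": "0m 52.76s",
    "NZBGET ORG (5.01)": "1m 17.32s",
    "NZBGET PATCH REMOVED (5.01)": "0m 25.11s",
    "NZBGET UPDATED MAKEFILE (5.01)": "0m 25.70s",
    "IPK (5.00)": "0m 30.79s",
}

baseline = parse_duration(results["NZBGET ORG (5.01)"])
for name, took in sorted(results.items(), key=lambda kv: parse_duration(kv[1])):
    seconds = parse_duration(took)
    print(f"{name:32s} {seconds:6.2f}s  {baseline / seconds:4.2f}x vs original")
```

On these numbers, removing the patch makes the 5.01 build roughly three times as fast as the original one.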

overrunner commented 10 years ago

Synology's DSM 5.0 Final ships with a fresh 5.01. I have not benchmarked it yet on my DS413, but I am going to do so...

Diaoul commented 10 years ago

I prefer not to depend on Synology's binaries for our own packages. This can prove useful sometimes, and we ensure there are no issues with the binaries because we include everything. @Dr-Bean: Great work! Thank you for fixing this. It's strange, because I thought the toolchain flags set -O2 by default...

cytec commented 10 years ago

@Dr-Bean @Diaoul can confirm that the -O2 flag speeds things up a lot, thanks

Dr-Bean commented 10 years ago

I'm compiling all arches to make sure the flag is supported on all of them... I only added the x86 arches to the fix above to be on the safe side. If it is supported everywhere, it's a really simple patch: just add ADDITIONAL_CXXFLAGS = -O2 to unrar's Makefile (or, maybe even better, remove that part of env.patch).

If not, I'm wondering whether there are downsides to adding it to the toolchain's Makefile... apart from the (possible) need to override it in some cases.

Diaoul commented 10 years ago

I think I removed it from the toolchain's Makefile because on some archs, too many optimization flags cause compilation errors. Another reason is that it is sometimes already set upstream. Go ahead and remove it from env.patch; if it compiles for everyone, it should work properly too. We could even try -O3, but I have no idea whether that would improve performance. Can you give it a try?

Dr-Bean commented 10 years ago

Results including -O3. It's consistently a few hundred milliseconds faster than -O2, but increases compilation time a bit:

Standard Synology DSM4.3
    Version: 3.80
    Unpacking took:  0m 54.24s
Original unrar of NZBGET
    Version: 5.01
    Unpacking took:  1m 20.08s
Unrar of NZBGET with env.patch set to -O2
    Version: 5.01
    Unpacking took:  0m 26.75s
Unrar of NZBGET with env.patch set to -O3
    Version: 5.01
    Unpacking took:  0m 26.38s
Unrar of IPK
    Version: 5.00
    Unpacking took:  0m 29.75s

If compiling with -O2 completes successfully, I'll just remove that part of env.patch. Should be good enough.
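On this sample, the gain from -O3 over -O2 is small compared with the gain from enabling optimization at all. The short Python sketch below uses only the three NZBGet timings from the results above. It shows -O3 saving about 370 ms, roughly 1.4 percent, while -O2 alone makes the build about three times as fast as the original.

```python
o2 = 26.75        # "Unrar of NZBGET with env.patch set to -O2", seconds
o3 = 26.38        # "Unrar of NZBGET with env.patch set to -O3", seconds
original = 80.08  # "Original unrar of NZBGET", 1m 20.08s

gap_ms = (o2 - o3) * 1000
print(f"-O3 saves {gap_ms:.0f} ms ({(o2 - o3) / o2 * 100:.1f}% of the -O2 time)")
print(f"-O2 alone is {original / o2:.2f}x as fast as the original build")
```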

Dr-Bean commented 10 years ago

Pushed the fix, it works on all arches :) @Diaoul: squidguard, sabnzbd, nzbget and nzbget-testing are the packages using unrar. Those need an spk-rev and changelog update if you plan to release new packages. Let me know if you want that done.

Hubfront commented 10 years ago

Dr-Bean, great work indeed and many thanks for this fix! I recompiled with "ADDITIONAL_CXXFLAGS = -O2" in the Makefile. Now unrar_5.01_spksrc_o2 performed 18 percent faster in my test than the official Linux build from rarlab.com, which had been the fastest before. I updated my first post with the test results.

Dr-Bean commented 10 years ago

You might be interested in checking out -O3. If the percentages on my small rarset scale linearly, you could shave another minute off ;)

Hubfront commented 10 years ago

It's complicated for me. The build rejects my changes to the patch. ;-)

Dr-Bean commented 10 years ago

In env.patch, put a minus sign in front of CXXFLAGS=-O2, and add a second line with +CXXFLAGS=-O3, so the file looks like this:

```diff
<..>
+#CXX=g++
-CXXFLAGS=-O2
+CXXFLAGS=-O3
 LIBFLAGS=-fPIC
<..>
```

That should patch correctly
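In a unified diff, a line starting with `-` is removed, a line starting with `+` is added, and a line starting with a space is unchanged context that `patch` must find verbatim in the target file. If a context or `-` line does not match, the hunk is rejected, which is one common way a hand-edited patch fails to apply. The sketch below uses Python's standard `difflib.unified_diff` to generate such a hunk. The `before`/`after` lines are a hypothetical makefile excerpt, not the real contents of env.patch.

```python
import difflib

# Hypothetical excerpt of unrar's makefile before the change;
# only the CXXFLAGS line differs between "before" and "after".
before = ["CXXFLAGS=-O2\n", "LIBFLAGS=-fPIC\n"]
after = ["CXXFLAGS=-O3\n", "LIBFLAGS=-fPIC\n"]

# n=1 keeps one line of unchanged context around the edit
hunk = list(difflib.unified_diff(before, after, fromfile="makefile", tofile="makefile", n=1))
print("".join(hunk), end="")
```

The output contains the same `-CXXFLAGS=-O2` / `+CXXFLAGS=-O3` pair followed by the ` LIBFLAGS=-fPIC` context line that appears in the snippet above.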

Hubfront commented 10 years ago

I should have asked earlier. ;-)

Hubfront commented 10 years ago

This is the result: unrar_5.01_spksrc_o3 unpacked my 47 files in 22min 05sec. That is around 6 percent faster than unrar_5.01_spksrc_o2, which needed 23min 29sec, and 24 percent faster than unrar_roshall. Definitely good :)

Dr-Bean commented 10 years ago

A minute and a half quicker? We're breaking records here ;) I do need to recompile all the other arches to make sure we don't run into issues with the optimizations on anything other than evansport and bromolow (which is what I tested on). I'm compiling something else at the moment, so with some luck I'll start tomorrow. If it works, I'll update the patch again in the next couple of days.

Hubfront commented 10 years ago

Thank you for your effort!

Dr-Bean commented 10 years ago

-O3 compiles just fine so far, only x86 arches are left to compile, and that shouldn't be a problem. I'll merge the change one of these days.

cytec commented 10 years ago

@Dr-Bean -O3 works nice on x86 for me :)

Dr-Bean commented 10 years ago

I merged the -O3 flag with b3cef85da60bd81a48cf441062754ced78da52fc.

As I said before, squidguard, sabnzbd, nzbget and nzbget-testing use unrar. I know SABnzbd has just been updated to 0.7.17, so that's a good reason to update that package. I do think we need to sort out #827 before releasing a set of new packages, though.

Diaoul commented 10 years ago

There is no harm in pushing nzbget too while you're at it. With SABnzbd they'll both greatly benefit from the optimization.

Hubfront commented 10 years ago

Hi Dr-Bean, thanks for the update. Should this thread be marked as done and closed now?

Dr-Bean commented 10 years ago

Yep, while I'm at it, I'll close it ;) I'll look into updating some packages in the next couple of days or so.

Hubfront commented 10 years ago

I just downloaded and compiled the newest version, unrar v5.1beta1, with spksrc for my DS214play, after updating the digests file with the new hashes and the Makefile in cross/unrar/.

And found it to be blazing fast! For the 47 files, unrar_5.1beta1_spksrc_o3 needed only 7min 30sec! That is about 2.94 times as fast (a speed improvement of roughly 194 percent) as the former number one, unrar_5.01_spksrc_o3 (unrar version 5.0.14), which needed 22min 04sec.

To rule out possible systematic influences, e.g. the new DSM update DSM5update1 and possible changes to spksrc, I recompiled unrar_5.01_spksrc_o3 and found no significant change in speed: it needed 22min 10sec. That difference is negligible, which confirms that the speed increase of the new 5.1beta1 comes from improvements in its code. Btw, like the other binaries, the new version also uses only one thread on my DS, so the CPU usage of the process never exceeds 25%.

So stay tuned for the release of unrar 5.1! I'm already using the beta for daily use, of course. :-)

overrunner commented 10 years ago

Just a thought...: Are the par2 binaries fine? or are they not optimized with O2 like the unrar-binary?

Jarosch commented 10 years ago

Good morning,

regarding par2: I'm replacing the par2 binary in the SABnzbd package with this one: http://chuchusoft.com/par2_tbb/

The original par2 can't use multiple threads, which is why I use this one. It works a lot faster than the original.

Regards, Jarosch

Hubfront commented 10 years ago

Hi @Jarosch, thank you for the par2 hint. I now use that multicore par2 tbb with SABnzbd, and it's a lot faster than before.

I just retested the rarset with 5.1.2, and it unpacked exactly as fast as 5.1.1: 7min 30sec. As said before, that's about three times faster than 5.01 on the same rarset.

Dr-Bean commented 10 years ago

Downloading the set now. I'll have the script run the whole set, for each version of unrar I tested before, and we see what happens.

Sneak preview, only part01.rar (might be affected by a number of other processes running):

Standard Synology DSM4.3
Unpacking took:  0m 42.30s
Unrar 5.0.14
Unpacking took:  1m 39.53s
Unrar 5.0.14 with CXXFLAGS -O2
Unpacking took:  0m 26.44s
Unrar 5.0.14 with CXXFLAGS -O3
Unpacking took:  0m 21.71s
IPK Version: 5.00
Unpacking took:  0m 25.11s
Unrar 5.1.2 with CXXFLAGS -O2
Unpacking took:  0m 8.46s
Unrar 5.1.2 with CXXFLAGS -O3
Unpacking took:  0m 8.60s

Tested a couple times, and on all of these runs, unrar 5.1.2 with -O2 performs a bit better than -O3, although we'll see what happens when the full set is extracted.

Hubfront commented 10 years ago

@Dr-Bean, interesting, which processor platform do you use?

I just tested unrar 5.1.2 with the O2 flag, and the result is exactly the same as with the O3 flag, even for the complete set: 7min 30sec for the first 47 files. For the complete set of 71 parts, about 10GB altogether, unrar 5.1.2 needed 11min 10sec with both the O3 and the O2 flag.

Dr-Bean commented 10 years ago

All tests have been run on a DSM 4.3, Bromolow-based VM. Had to resize a volume halfway, so things are a bit delayed. I've skipped the Synology unrar, the original spksrc unrar, and the IPK package, can't be bothered with the slow stuff ;)

Seems you're right about O2 and O3 not making much of a difference. Here, O3 is usually faster, but the times vary too much for it to be a clear win: times vary between 5m17 and 5m35 for both.

Unrar 5.0.14 with CXXFLAGS -O2, v5.01
Unpacking took:  14m 30.04s
Unrar 5.0.14 with CXXFLAGS -O3, v5.01
Unpacking took:  12m 45.85s
Unrar 5.1.2 with CXXFLAGS -O2, v5.10
Unpacking took:  5m 34.35s
Unrar 5.1.2 with CXXFLAGS -O3, v5.10
Unpacking took:  5m 32.03s

All in all, the new version is quite an improvement, if you test with a proper set of files ;)
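Dr-Bean's caution about -O2 versus -O3 on 5.1.2 can be made concrete. The gap between the two builds is much smaller than the 5m17 to 5m35 spread he observed between runs. This Python sketch uses only the numbers from this comment. A more careful comparison would repeat each build several times and compare means and variances, which this sketch does not do.

```python
o2 = 5 * 60 + 34.35  # unrar 5.1.2 with CXXFLAGS -O2, seconds
o3 = 5 * 60 + 32.03  # unrar 5.1.2 with CXXFLAGS -O3, seconds
fastest, slowest = 5 * 60 + 17, 5 * 60 + 35  # observed run-to-run range, both flags

gap = o2 - o3
spread = slowest - fastest
print(f"O2/O3 gap: {gap:.2f}s, run-to-run spread: {spread}s")
print("inconclusive" if gap < spread else "O3 is consistently faster")
```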

Hubfront commented 10 years ago

Yes, it is indeed. I tested unrar also on my DS214 yesterday, which is a 2-core armadaxp architecture. On this machine unrar 5.0.14 compiled with O3-flag surprisingly performed a little faster than unrar 5.1.2 with O3-flag, at least on this rarset. ;-)

Ranking for DS214 (armadaxp, 2 cores):

  1. unrar 5.01 (O3 flag, spksrc): 18min 44sec for 47 parts, 27min 53sec for all 71 parts
  2. unrar 5.1.2 (O3 flag, spksrc): 19min 52sec for 47 parts, 29min 35sec for all 71 parts
  3. unrar 5.01 (O2 flag, spksrc): 22min 59sec for 47 parts, 34min 11sec for all 71 parts
  4. unrar 5.01 (original DSM5): 23min 16sec for 47 parts, 34min 42sec for all 71 parts
  5. unrar 5.1.2 (O2 flag, spksrc): 23min 48sec for 47 parts, 35min 49sec for all 71 parts

So it looks like the new unrar 5.1.x is quite a boost for Atom, although we can't say anything about the other ARM or PowerPC architectures yet. And the versions compiled with the O3 flag performed faster than the ones with the O2 flag on both Atom and armadaxp.

Dr-Bean commented 10 years ago

Ran your set on 88f6281 arch, so there's another ARM arch. In order:

Unrar 5.1.2 with CXXFLAGS -O3, v5.10
Unpacking took:  31m 40.01s
Unrar 5.0.14 with CXXFLAGS -O3, v5.01
Unpacking took:  32m 12.95s
Unrar 5.0.14 with CXXFLAGS -O2, v5.01
Unpacking took:  36m 50.82s
Unrar 5.1.2 with CXXFLAGS -O2, v5.10
Unpacking took:  38m 53.34s

So it looks like O3 wins, regardless of arch. Ran the script twice, 5.1.2 beat 5.0.14 twice, so I'm calling that good too ;)
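The Python sketch below puts the -O2 and -O3 timings posted so far side by side. It supports "O3 wins, regardless of arch", with the caveat that for 5.1.2 the difference on evansport and bromolow is nil or within noise. Each row compares the two flags on the same set of parts. Rows differ, though: evansport and armadaxp use the 47-part timings, while bromolow and 88f6281 are whole-set runs. So compare the ratios, not the absolute times.

```python
# (O2 seconds, O3 seconds) per (arch, unrar version), copied from the timings in this thread
timings = {
    ("evansport", "5.01"): (23 * 60 + 29, 22 * 60 + 5),
    ("evansport", "5.1.2"): (7 * 60 + 30, 7 * 60 + 30),
    ("bromolow", "5.01"): (14 * 60 + 30.04, 12 * 60 + 45.85),
    ("bromolow", "5.1.2"): (5 * 60 + 34.35, 5 * 60 + 32.03),
    ("armadaxp", "5.01"): (22 * 60 + 59, 18 * 60 + 44),
    ("armadaxp", "5.1.2"): (23 * 60 + 48, 19 * 60 + 52),
    ("88f6281", "5.01"): (36 * 60 + 50.82, 32 * 60 + 12.95),
    ("88f6281", "5.1.2"): (38 * 60 + 53.34, 31 * 60 + 40.01),
}

for (arch, version), (o2, o3) in timings.items():
    ratio = o2 / o3  # > 1 means the -O3 build finished sooner
    if ratio > 1:
        verdict = "O3 faster"
    elif ratio < 1:
        verdict = "O2 faster"
    else:
        verdict = "no difference"
    print(f"{arch:9s} {version:6s} O2/O3 = {ratio:.3f}  {verdict}")
```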

moneytoo commented 10 years ago

Sabnzbd is ready to be updated then :)

Dr-Bean commented 10 years ago

Yep, the code is all set, except for a Changelog entry on major unrar speedup ;) I can't upload packages myself though, waiting for Piwi to help out. If you have time for it, great :), otherwise, we'll wait.

Dr-Bean commented 10 years ago

SAB has been updated, and it even has a neat changelog entry for unrar. An updated NZBget package is not published yet (@moneytoo, I don't suppose you could take a look at that?)

Sidenote: If it's worth looking further into par2, let's open a new issue. It would be preferred to stick to one source, which we can compile for the various arches, not sure if that's possible though.

In the meantime, I think we've tested everything there is to test on unrar. Thanks to all who have brought it to our attention, and have been testing with the various settings and versions. So, closing this issue once again.

moneytoo commented 10 years ago

@Dr-Bean Published.

Dr-Bean commented 10 years ago

@moneytoo Great, thanks a lot :)

Hubfront commented 10 years ago

@Dr-Bean regarding par2 there is par2cmdline+tbb, but unfortunately, arm seems not to be supported with tbb 2.2. If somebody is aware of another par2 project running on linux and having multicore/multithreading-support, that would be cool :)