Safihre closed this 7 years ago
Thanks so much! I'll integrate the new version and report back.
In the meantime I've done more tests, in particular on a 2015 Dell notebook running Linux. The numbers are crazy high (MB/s):
Improvement | MacBook macOS i5-520 | Dell Windows i7‑5600U | Dell Linux i7‑5600U | PVR Linux ARMv7 | NEO2 Linux ARMv8 |
---|---|---|---|---|---|
improved decoder, scalar crc | 305 | 389 | 480 | 89 | 102 |
raw decoder, scalar crc | 369 | 414 | 636 | 93 | 107 |
simd decoder, scalar crc | 467 | 493 | 836 | 99 | 121 |
simd decoder, simd crc | 520 | 541 | 1011 | n/a | 136 |
For a description of the devices, the test conditions and more results (not related to SIMD), please see the original post.
Results for the one-pass SIMD decoder with end-of-stream detection (simd-end), in MB/s:
Improvement | MacBook macOS i5-520 | Dell Windows i7‑5600U | Dell Linux i7‑5600U | PVR Linux ARMv7 | NEO2 Linux ARMv8 |
---|---|---|---|---|---|
simd decoder | 520 | 541 | 1011 | 99 | 136 |
simd-end decoder | 520 | 570 | 1140 | 106 | 157 |
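To make "one-pass decoding with end-of-stream detection" concrete, here is a minimal scalar sketch in C. It is not the SIMD implementation benchmarked above. It follows the yEnc rules (CR/LF are skipped, a byte decodes as `c - 42` mod 256, and the byte after `=` decodes as `c - 42 - 64` mod 256), and it assumes decoding should stop at the first line starting with `=y` (such as the `=yend` trailer). The function name `yenc_decode_until_end` is hypothetical, and NNTP dot-unstuffing is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch of a one-pass yEnc decoder with end detection.
 * Assumption: decoding stops at the first line that starts with "=y"
 * (e.g. the "=yend" trailer). NNTP dot-unstuffing is not handled.
 * Returns the number of bytes written to out; *consumed is set to the number
 * of input bytes processed (the index of the '=' where decoding stopped, if any). */
size_t yenc_decode_until_end(const uint8_t *in, size_t len,
                             uint8_t *out, size_t *consumed)
{
    size_t i = 0, o = 0;
    int at_line_start = 1; /* assume the buffer begins at the start of a line */

    while (i < len) {
        uint8_t c = in[i];
        if (c == '\r' || c == '\n') {
            /* line breaks carry no data; only '\n' begins a new line */
            at_line_start = (c == '\n');
            i++;
            continue;
        }
        if (c == '=') {
            if (i + 1 >= len)
                break; /* escape split across buffers: leave it for the next call */
            if (at_line_start && in[i + 1] == 'y')
                break; /* "=y" at the start of a line: end of the yEnc data */
            out[o++] = (uint8_t)(in[i + 1] - 42 - 64); /* escaped byte */
            i += 2;
        } else {
            out[o++] = (uint8_t)(c - 42); /* ordinary byte */
            i++;
        }
        at_line_start = 0;
    }
    *consumed = i;
    return o;
}
```

Because the end check happens inside the same loop that decodes, the data is only walked once; a separate search for `=yend` before decoding would need a second pass.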
It's cool to see that within one month of creating this issue, NZBget already has a working implementation. So I'd say this issue has served its purpose; if I have specific implementation questions for SABnzbd, I will open another topic. Thanks, all!
It has been interesting - thanks for creating the topic!
Are you planning to migrate to Python 3 before using this decoder? I imagine SABYenc could be changed to use it as is, but if migrating to Python 3 is the goal, its C API would be different.
I was amazed to find this repo. I have been thinking about ways to do yEnc decoding (as a Python C extension) using SSE instructions, but my knowledge of C is just too rudimentary for now.
Do you think SSE can help compared to regular char-by-char decoding of the yEnc body? How would you go about removing the escapes when decoding? I can imagine finding the escape characters, but how would you remove them later on when building the output string? I tried to grasp your encoding code, but I think I'm missing the main idea because of all the edge cases and optimizations.
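One general way to approach the question above is sketched here in portable C, under stated assumptions; it is not the node-yencode code. The input is processed in 16-byte blocks, and for each block a bitmask of special bytes (`=`, `\r`, `\n`) is built, which is what an SSE2 `_mm_cmpeq_epi8` + `_mm_movemask_epi8` pair would give. If the mask is zero, all 16 bytes simply have 42 subtracted; otherwise the block is compacted by skipping the special bytes. A real SIMD version might replace the slow-path byte loop with a shuffle driven by a lookup table indexed by the mask. All function names are hypothetical, and dot-unstuffing and end detection are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar decode of n bytes, carrying an "escape pending" flag across calls
 * so an '=' at the end of one block escapes the first byte of the next.
 * The '=' itself and CR/LF produce no output. */
static void decode_scalar(const uint8_t *in, size_t n, uint8_t *out,
                          size_t *o, int *escape_pending)
{
    for (size_t j = 0; j < n; j++) {
        uint8_t c = in[j];
        if (*escape_pending) {
            out[(*o)++] = (uint8_t)(c - 42 - 64); /* byte after '=' */
            *escape_pending = 0;
        } else if (c == '=') {
            *escape_pending = 1;
        } else if (c != '\r' && c != '\n') {
            out[(*o)++] = (uint8_t)(c - 42);       /* ordinary byte */
        }
    }
}

/* Emulates what SSE2 _mm_cmpeq_epi8 + _mm_movemask_epi8 would produce:
 * bit j is set if byte j of the 16-byte block is '=', '\r' or '\n'. */
static uint16_t special_mask16(const uint8_t *p)
{
    uint16_t m = 0;
    for (int j = 0; j < 16; j++)
        if (p[j] == '=' || p[j] == '\r' || p[j] == '\n')
            m |= (uint16_t)(1u << j);
    return m;
}

/* Hypothetical block-wise decoder: returns the number of bytes written to out. */
size_t yenc_decode_blocks(const uint8_t *in, size_t len, uint8_t *out)
{
    size_t i = 0, o = 0;
    int escape_pending = 0;

    for (; i + 16 <= len; i += 16) {
        if (!escape_pending && special_mask16(in + i) == 0) {
            /* fast path: nothing to remove, subtract 42 from all 16 bytes */
            for (int j = 0; j < 16; j++)
                out[o + j] = (uint8_t)(in[i + j] - 42);
            o += 16;
        } else {
            /* slow path: compact the block by skipping special bytes */
            decode_scalar(in + i, 16, out, &o, &escape_pending);
        }
    }
    decode_scalar(in + i, len - i, out, &o, &escape_pending); /* leftover tail */
    return o;
}
```

Since escapes and line breaks are rare in yEnc data, most blocks should take the fast path, which is where the speedup over char-by-char decoding would come from.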
Thanks!
EDIT: I think I'm understanding more and more of the code, and how you handle escaping during encoding here: https://github.com/animetosho/node-yencode/blob/master/yencode.cc#L718-L752 I don't completely understand the shuffle operations yet and how they handle the extra characters. What are `shufMixLUT` and `shufLUT`?
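The exact layout of `shufMixLUT` and `shufLUT` in node-yencode isn't described in this thread, but the general idea of a shuffle lookup table for encoding can be sketched in portable C. For each 8-bit mask of bytes that need escaping, a precomputed table lists which input byte goes to each output position, leaving a zero slot before every escaped byte. A second table is then added, turning those slots into `=` and adding 64 to the escaped bytes. With SSSE3 this would correspond to `_mm_shuffle_epi8` (where an index with the high bit set yields 0) followed by `_mm_add_epi8`. All names here are hypothetical. Line wrapping and the yEnc line-start/line-end escaping rules are ignored.

```c
#include <stddef.h>
#include <stdint.h>

#define SLOT_ZERO 0x80 /* like pshufb: an index with the high bit set yields 0 */

/* Hypothetical tables indexed by an 8-bit "needs escaping" mask:
 * shuf_lut: source byte for each of 16 output positions (or SLOT_ZERO)
 * add_lut:  value added afterwards ('=' for inserted slots, 64 for escaped bytes)
 * len_lut:  number of valid output bytes (8 + popcount(mask)) */
static uint8_t shuf_lut[256][16];
static uint8_t add_lut[256][16];
static uint8_t len_lut[256];

static void build_luts(void)
{
    for (int mask = 0; mask < 256; mask++) {
        int k = 0;
        for (int j = 0; j < 8; j++) {
            if (mask & (1 << j)) {
                shuf_lut[mask][k] = SLOT_ZERO; add_lut[mask][k] = '='; k++;
                shuf_lut[mask][k] = (uint8_t)j; add_lut[mask][k] = 64;  k++;
            } else {
                shuf_lut[mask][k] = (uint8_t)j; add_lut[mask][k] = 0;   k++;
            }
        }
        for (int r = k; r < 16; r++) { /* unused tail positions */
            shuf_lut[mask][r] = SLOT_ZERO;
            add_lut[mask][r] = 0;
        }
        len_lut[mask] = (uint8_t)k;
    }
}

/* Emulates shuffle + add on one group of 8 already "+42"-shifted bytes.
 * out must have room for 16 bytes; returns how many of them are valid. */
static size_t expand8(const uint8_t *enc, uint8_t mask, uint8_t *out)
{
    for (int k = 0; k < 16; k++) {
        uint8_t idx = shuf_lut[mask][k];
        uint8_t v = (idx & 0x80) ? 0 : enc[idx];
        out[k] = (uint8_t)(v + add_lut[mask][k]);
    }
    return len_lut[mask];
}

/* Encodes 8 source bytes: shift by 42, mark NUL/LF/CR/'=' for escaping, expand. */
size_t yenc_encode8(const uint8_t *src, uint8_t *out)
{
    uint8_t enc[8];
    uint8_t mask = 0;
    for (int j = 0; j < 8; j++) {
        enc[j] = (uint8_t)(src[j] + 42);
        if (enc[j] == 0 || enc[j] == '\n' || enc[j] == '\r' || enc[j] == '=')
            mask |= (uint8_t)(1u << j);
    }
    return expand8(enc, mask, out);
}
```

`build_luts()` must be called once before encoding. The point of the tables is that the data-dependent part (where to insert `=`) is resolved by a single table lookup per group instead of a branch per byte.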