yEnc SSE decode ideas - Githubissues

animetosho / node-yencode

SIMD accelerated yEnc encoder/decoder and CRC32 calculator for node.js

yEnc SSE decode ideas #4

Closed Safihre closed 7 years ago

Safihre commented 7 years ago

I was amazed to find this repo. I have been thinking about a way to do yEnc decoding (as a Python C extension) using SSE instructions, but my knowledge of C is just too rudimentary for now.

Do you think SSE can help compared to regular char-by-char decoding of the yEnc body? How would you go about the decoding-escaping problem? I can imagine finding the escape chars, but how do you remove them later on when building the output string? I tried to grasp your encoding code, but I think I am probably missing the main idea because of the included edge cases and optimizations.
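
For reference, the "regular char-by-char decoding" mentioned above can be sketched roughly as follows. This is only a minimal illustration of the yEnc rules (subtract 42 from every byte, and subtract a further 64 from a byte that follows the `=` escape character), not code from this repo. The function name `yenc_decode_scalar` is made up for the example, and it assumes the `=ybegin`/`=yend` header lines and NNTP dot-stuffing have already been stripped.

```c
#include <stddef.h>

/* Hypothetical helper, not part of node-yencode.
 * Decodes `len` bytes of yEnc body data from `src` into `dst` and returns
 * the number of bytes written. `dst` must have room for `len` bytes, since
 * the decoded output is never longer than the input. */
size_t yenc_decode_scalar(const unsigned char *src, size_t len,
                          unsigned char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];
        if (c == '\r' || c == '\n')
            continue;               /* line breaks carry no data */
        if (c == '=') {
            if (++i >= len)
                break;              /* escape cut off at buffer end: not handled here */
            c = (unsigned char)(src[i] - 64);   /* undo the escape offset */
        }
        dst[out++] = (unsigned char)(c - 42);   /* undo the yEnc offset */
    }
    return out;
}
```

Because the escape character is simply never copied to `dst`, the scalar version "removes" it for free. The difficulty with SIMD is that a 16-byte register has no such per-byte skip, so the dropped bytes have to be squeezed out by some other means.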

Thanks!

EDIT: I think I am understanding more and more of the code and how you handle the encoding-escaping here: https://github.com/animetosho/node-yencode/blob/master/yencode.cc#L718-L752 I don't completely understand the shuffle operations just yet, or how they handle the extra chars. What are shufMixLUT and shufLUT?
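
One general technique for adding or dropping bytes inside a vector is a lookup table of shuffle patterns indexed by a bitmask of the "special" positions. The sketch below emulates that idea in portable C for the decoding direction (dropping marked bytes from an 8-byte group). It is an assumption-laden illustration only: the names `compact_lut`, `kept_count`, `build_compact_lut` and `compact8` are invented here, and nothing in it is taken from the repo's `shufLUT` or `shufMixLUT` tables. On x86, the inner copy loop would correspond to a single byte shuffle such as SSSE3's `_mm_shuffle_epi8`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tables: for every 8-bit mask of positions to drop,
 * the source indices of the bytes that are kept, in order. */
static uint8_t compact_lut[256][8];
static uint8_t kept_count[256];

static void build_compact_lut(void)
{
    for (int mask = 0; mask < 256; mask++) {
        int n = 0;
        for (int i = 0; i < 8; i++) {
            if (!(mask & (1 << i)))             /* bit clear: keep byte i */
                compact_lut[mask][n++] = (uint8_t)i;
        }
        kept_count[mask] = (uint8_t)n;
        while (n < 8)
            compact_lut[mask][n++] = 0;         /* padding, ignored by callers */
    }
}

/* Emulates one byte shuffle: out[i] = in[idx[i]].
 * Returns how many leading bytes of `out` are meaningful. */
static size_t compact8(const unsigned char in[8], unsigned mask,
                       unsigned char out[8])
{
    const uint8_t *idx = compact_lut[mask & 0xFF];
    for (int i = 0; i < 8; i++)
        out[i] = in[idx[i]];
    return kept_count[mask & 0xFF];
}
```

With the input `ab=cd=ef` and mask `0x24` (bits 2 and 5 set, marking the two `=` bytes), the first six output bytes are `abcdef`. A real decoder would still have to subtract 64 from the byte after each `=`, which this sketch leaves out.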

hugbug commented 7 years ago

Thanks so much! I'll integrate the new version and report back.

In the meantime I've done more tests, in particular on a 2015 Dell notebook running Linux. The numbers are crazy high (MB/s):

Improvement	MacBook macOS i5-520	Dell Windows i7‑5600U	Dell Linux i7‑5600U	PVR Linux ARMv7	NEO2 Linux ARMv8
improved decoder, scalar crc	305	389	480	89	102
raw decoder, scalar crc	369	414	636	93	107
simd decoder, scalar crc	467	493	836	99	121
simd decoder, simd crc	520	541	1011	n/a	136

For a description of the devices, test conditions and more results (not related to SIMD), please see the original post.

hugbug commented 7 years ago

Results for one-pass simd decoder with end-of-stream detection (simd-end):

Improvement	MacBook macOS i5-520	Dell Windows i7‑5600U	Dell Linux i7‑5600U	PVR Linux ARMv7	NEO2 Linux ARMv8
simd decoder	520	541	1011	99	136
simd-end decoder	520	570	1140	106	157

Scalar CRC was used for ARMv7, SIMD CRC for all other devices.

Safihre commented 7 years ago

Cool to see how, within one month of creating this issue, it now has a working implementation in NZBGet. So I would say this issue has served its purpose, and in case I have specific implementation questions for SABnzbd I will open another topic! Thanks all!

animetosho commented 7 years ago

It has been interesting - thanks for creating the topic!

Are you planning to migrate to Python 3 before using this decoder? I imagine that SABYenc could be changed to use it as is, but I'd expect Python 3's API to be different, if that's the goal.