benchmarking between AVX/SSE4.1

jg1uaa commented 10 months ago

SSE support has been renewed, so I ran a benchmark.

method:

$ cd LPCNet/build_dir/src
$ cat ../../wav/all.wav | ./lpcnet_enc -s > test.out
$ time cat test.out | ./lpcnet_dec -s > /dev/null

results:
CPU	build	time (real)	environment
Intel Core i7-7700	AVX	7.132s	*1
Intel Core i7-7700	SSE4.1	8.941s	*1
AMD A8-7600	AVX	15.146s	*1
AMD A8-7600	SSE4.1	16.453s	*1
Intel Core i3-13100	AVX	3.730s	*2
Intel Core i3-13100	SSE4.1	4.870s	*2
Intel Core i7-7700	AVX	N/A	*3
Intel Core i7-7700	SSE4.1	29.428s	*3
Intel Core i7-7700	SSE4.1	10.858s	*4

*1 Debian-12.1/x86_64, gcc-12.2.0
*2 Ubuntu-22.04.3 LTS/x86_64 on WSL2, gcc-11.4.0
*3 Slackware-15.0/i686 on QEMU-7.2.4/KVM, gcc-11.2.0
*4 Slackware-15.0/i686 on QEMU-7.2.4/KVM, clang-13.0.0

QEMU on Slackware did not support AVX instructions, so there is no AVX result for *3.

conclusion: on x86_64, the SSE4.1 build is slower than AVX (roughly 9% to 31% more decode time in these runs), but we can ignore this disadvantage.
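As a quick check of the x86_64 numbers in the results table, the following shell sketch computes how much longer each SSE4.1 run took than the matching AVX run, plus the SSE4.1 real-time factor (decode time divided by audio length). It assumes the 49-second length of all.wav given later in the thread; `compare` is a hypothetical helper name.

```shell
# Compare AVX and SSE4.1 decode times copied from the results table.
# AUDIO_SECS=49 is the length of all.wav stated later in the thread.
AUDIO_SECS=49

compare() {
  # $1 = CPU label, $2 = AVX time (s), $3 = SSE4.1 time (s)
  awk -v cpu="$1" -v avx="$2" -v sse="$3" -v audio="$AUDIO_SECS" 'BEGIN {
    slowdown = (sse / avx - 1) * 100   # extra time taken by SSE4.1, in percent
    rtf = sse / audio                  # SSE4.1 real-time factor (below 1 means faster than real time)
    printf "%s: SSE4.1 is %.0f%% slower than AVX, RTF %.2f\n", cpu, slowdown, rtf
  }'
}

compare "i7-7700"   7.132  8.941
compare "A8-7600"  15.146 16.453
compare "i3-13100"  3.730  4.870
```

For the three x86_64 machines this gives slowdowns of about 25%, 9% and 31%, with SSE4.1 real-time factors of 0.18, 0.34 and 0.10.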

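On i686, SSE4.1 performance depends on the compiler: the clang-13.0.0 build (10.858s) was about 2.7 times faster than the gcc-11.2.0 build (29.428s).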

suggestion: we can use SSE4.1 as the default on x86_64. With a better-optimizing compiler, we should be able to do the same for i686.

drowe67 commented 10 months ago

Thanks for your analysis @jg1uaa. Can you please tell me how you are using LPCNet? We have found that FreeDV 2020 is not very robust to HF channels, and is not used by many people.

So we are not actively developing LPCNet and FreeDV 2020 at this time.

tmiw commented 10 months ago

In freedv-gui, we test for AVX support and also time the decode of random audio to ensure reliable decoding (i.e. at least a bit faster than real time). One question I have is whether disabling AVX and compiling only for SSE would let us enable the 2020 modes on additional machines. In other words, outside of QEMU, are there machines fast enough to decode 2020 with SSE alone that aren't capable of using AVX?
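A minimal sketch of the kind of AVX check described here, assuming an x86 Linux host where CPU features are listed on the `flags` line of /proc/cpuinfo. This is not freedv-gui's actual implementation; `cpu_has_flag` is a hypothetical helper.

```shell
# Hypothetical helper, not taken from freedv-gui: succeed if a CPU feature
# flag (e.g. "avx", "sse4_1") appears on the first "flags" line of a Linux
# /proc/cpuinfo-style file. Assumes x86 Linux; other systems report nothing.
cpu_has_flag() {
  # $1 = flag name, $2 = file to read (defaults to /proc/cpuinfo)
  flag=$1
  cpuinfo=${2:-/proc/cpuinfo}
  # -w matches whole words only, so "avx" does not match "avx2" or "avx512f".
  grep '^flags' "$cpuinfo" 2>/dev/null | head -n 1 | grep -qw -- "$flag"
}

if cpu_has_flag avx; then
  echo "AVX available"
elif cpu_has_flag sse4_1; then
  echo "no AVX, but SSE4.1 available"
else
  echo "neither AVX nor SSE4.1 reported"
fi
```

A machine that reaches the middle branch is the case the question is about: it could only run an SSE-only build.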

OTOH given @drowe67's comment above this question may be moot.

jg1uaa commented 10 months ago

@drowe67 Sorry, I don't use FreeDV (in any mode) because I do not have a station license for it. In Japan, we have to submit an application form to use non-standard modes (for example, FT8, FreeDV, SSTV, FAX and so on) and obtain a station license.

@tmiw all.wav is 49 seconds long. I think decoding in under 50% of the audio duration might be enough for stable operation, but I have no evidence for this. I have a Pentium G4600 machine; it is 6th-gen with no AVX support, so SSE support is mandatory for it. However, current 12th-gen Celeron/Pentium parts do have AVX, so keeping SSE support disabled by default is also an option.
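The 50% rule of thumb can be written as a simple pass/fail check. This sketch assumes that threshold, which is an untested guess rather than a measured limit, and uses `decode_fast_enough` as a hypothetical helper; the example times are the SSE4.1 results from the table above.

```shell
# Hypothetical check: a machine passes if decoding takes at most MAX_RATIO
# of the audio duration. The 0.5 default is the untested guess from above.
decode_fast_enough() {
  # $1 = measured decode time (s), $2 = audio length (s), $3 = max ratio (default 0.5)
  # awk exits 0 (success) when the ratio is within the limit, 1 otherwise.
  awk -v t="$1" -v len="$2" -v max="${3:-0.5}" 'BEGIN { exit !(t / len <= max) }'
}

# SSE4.1 decode times from the table, checked against all.wav's 49 s length.
for t in 8.941 16.453 4.870 29.428 10.858; do
  if decode_fast_enough "$t" 49; then
    echo "$t s: OK"
  else
    echo "$t s: too slow"
  fi
done
```

With this threshold, only the gcc-built i686 run (29.428 s, about 60% of 49 s) fails.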

tmiw commented 10 months ago

Considering that we're probably going to deprecate this repo soon, I think AVX can be kept mandatory. Thanks for the testing, however!

@drowe67, good to close?

drowe67 / LPCNet

benchmarking between AVX/SSE4.1 #62