drowe67 / LPCNet

Experimental Neural Net speech coding for FreeDV

Integrate latest vec_*.h from xiph/LPCNet #51

Status: Closed (tmiw closed this 1 year ago)

tmiw commented 1 year ago

This PR integrates the latest vec_*.h files from the upstream repo (xiph/LPCNet). Encoding and decoding wia.wav three times per platform, before and after the change, shows the following performance improvements:

macOS x86_64 (2019 MacBook Pro):

master:

HT-TM05:src mooneer$ time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
...
real    0m1.395s
user    0m1.530s
sys 0m0.042s
HT-TM05:src mooneer$ time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
...
real    0m1.435s
user    0m1.477s
sys 0m0.031s
HT-TM05:src mooneer$ time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
...
real    0m1.408s
user    0m1.543s
sys 0m0.041s
HT-TM05:src mooneer$

=> 1.413s average

This PR:

HT-TM05:src mooneer$ time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
...

real    0m1.256s
user    0m1.397s
sys 0m0.032s
HT-TM05:src mooneer$ time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
...
real    0m1.305s
user    0m1.413s
sys 0m0.040s
HT-TM05:src mooneer$ time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
...
real    0m1.205s
user    0m1.353s
sys 0m0.028s
HT-TM05:src mooneer$

=> 1.255s average (~10% speedup)
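
The averages and the speedup figure can be reproduced from the three `real` times of each set of runs. The following is a minimal sketch, assuming "speedup" means the relative reduction in mean wall-clock time; the helper name `speedup_percent` is illustrative and not part of LPCNet.

```python
from statistics import mean

# Wall-clock ("real") times in seconds, copied from the macOS x86_64 runs.
master_runs = [1.395, 1.435, 1.408]
pr_runs = [1.256, 1.305, 1.205]

def speedup_percent(before, after):
    """Relative reduction in mean run time, as a percentage (assumed definition)."""
    return 100.0 * (mean(before) - mean(after)) / mean(before)

print(f"master:  {mean(master_runs):.3f}s")
print(f"this PR: {mean(pr_runs):.3f}s")
print(f"speedup: {speedup_percent(master_runs, pr_runs):.1f}%")
```

With only three runs per configuration, the result is best read as a rough indication rather than a precise measurement.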

macOS aarch64 (M1 Mac Mini):

master:

mooneer@ubuntu-server src % time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
...
( sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s)  0.93s user 0.03s system 110% cpu 0.869 total
mooneer@ubuntu-server src % time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
...
( sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s)  0.94s user 0.03s system 110% cpu 0.871 total
mooneer@ubuntu-server src % time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
...
( sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s)  0.94s user 0.03s system 110% cpu 0.872 total

=> 0.871s average

This PR:

mooneer@ubuntu-server src % time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
...
( sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s)  0.69s user 0.03s system 115% cpu 0.615 total
mooneer@ubuntu-server src % time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
...
( sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s)  0.67s user 0.02s system 115% cpu 0.607 total
mooneer@ubuntu-server src % time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
...
( sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s)  0.68s user 0.03s system 115% cpu 0.611 total

=> 0.611s average (~30% speedup)

Raspberry Pi 4 (aarch64):

master:

mooneer@raspberrypi:~/freedv-gui/LPCNet/build/src $ time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
...
real    0m6.648s
user    0m6.916s
sys 0m0.093s
mooneer@raspberrypi:~/freedv-gui/LPCNet/build/src $ time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
...
real    0m6.663s
user    0m6.987s
sys 0m0.092s
mooneer@raspberrypi:~/freedv-gui/LPCNet/build/src $ time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
...
real    0m6.590s
user    0m6.857s
sys 0m0.082s
mooneer@raspberrypi:~/freedv-gui/LPCNet/build/src $ 

=> 6.634s average

This PR:

mooneer@raspberrypi:~/freedv-gui/LPCNet/build/src $ time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
...
real    0m4.420s
user    0m4.672s
sys 0m0.098s
mooneer@raspberrypi:~/freedv-gui/LPCNet/build/src $ time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
...
real    0m4.364s
user    0m4.625s
sys 0m0.090s
mooneer@raspberrypi:~/freedv-gui/LPCNet/build/src $ time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
...
real    0m4.431s
user    0m4.752s
sys 0m0.078s
mooneer@raspberrypi:~/freedv-gui/LPCNet/build/src $ 

=> 4.405s average (~34% speedup)

tmiw commented 1 year ago

Apparently, with this PR LPCNet is basically real time on the Pi 4B now: lpcnet_enc seems to take only 0.3-0.4s, and wia.wav is 4.2s long (per Audacity).

Of course, it's only just real time, so it still might not perform well enough without bringing in additional changes from upstream.
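
One way to put a number on "real time" is the real-time factor (RTF): processing time divided by audio duration, where values below 1.0 mean faster than real time. The sketch below is illustrative only. It uses the 4.405s Pi 4 average reported for this PR and the 4.2s length of wia.wav, and it assumes the timed sox/encoder/decoder pipeline is a fair proxy for total processing time.

```python
def real_time_factor(processing_s, audio_s):
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return processing_s / audio_s

# Figures taken from this thread (assumed comparable):
pipeline_s = 4.405  # mean wall-clock time of the full pipeline on the Pi 4, this PR
audio_s = 4.2       # length of wia.wav as reported by Audacity

rtf = real_time_factor(pipeline_s, audio_s)
print(f"RTF = {rtf:.2f}")
```

A value close to 1.0 leaves essentially no headroom for other work on the same machine.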

drowe67 commented 1 year ago

Thanks @tmiw - did you listen to a few sample files to confirm it is working OK?

tmiw commented 1 year ago

> Thanks @tmiw - did you listen to a few sample files to confirm it is working OK?

Seems to work okay with wia.wav and all.wav; I don't detect much, if any, difference in the resulting audio (which is probably a good thing).

However, I did notice a compiler error on the Raspberry Pi 3, but that may be because I'm running an older OS on it. Will need to confirm with a newer SD card.

tmiw commented 1 year ago

Turns out that the compiler errors were due to the new vec_* files containing some aarch64-specific code. After removing that, the Pi 3B+ gives me the following results:

master:

pi@piaware:~/LPCNet/build/src $ time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
...
real    0m50.840s
user    0m53.434s
sys 0m0.135s
pi@piaware:~/LPCNet/build/src $ time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
...
real    0m51.135s
user    0m53.744s
sys 0m0.144s
pi@piaware:~/LPCNet/build/src $ time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
...
real    0m51.307s
user    0m53.776s
sys 0m0.250s
pi@piaware:~/LPCNet/build/src $ 

=> 51.094s average

vs. this PR:

pi@piaware:~/LPCNet/build/src $ time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
...
real    0m35.547s
user    0m38.187s
sys 0m0.204s
pi@piaware:~/LPCNet/build/src $ time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
...
real    0m35.562s
user    0m38.273s
sys 0m0.220s
pi@piaware:~/LPCNet/build/src $ time (sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null)
...
real    0m37.578s
user    0m39.662s
sys 0m0.560s
pi@piaware:~/LPCNet/build/src $

=> 36.229s average (~30% speedup)
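
For anyone repeating these measurements, the `real` lines can be pulled out of a captured log instead of being copied by hand. This is a rough sketch that assumes bash's default `time` format (`real    0m35.547s`); the zsh-style lines from the M1 runs (`... 0.615 total`) are not handled, and `real_seconds` is a hypothetical helper name.

```python
import re

# Matches bash's default format, e.g. "real    0m35.547s".
REAL_RE = re.compile(r"^real\s+(\d+)m([\d.]+)s$", re.MULTILINE)

def real_seconds(log):
    """Return every 'real' time found in the log text, in seconds."""
    return [int(minutes) * 60 + float(seconds)
            for minutes, seconds in REAL_RE.findall(log)]

# Excerpt of the Pi 3B+ output for this PR, copied from the runs above.
log = """\
real    0m35.547s
user    0m38.187s
sys 0m0.204s
real    0m35.562s
real    0m37.578s
"""

times = real_seconds(log)
print(f"{sum(times) / len(times):.3f}s average over {len(times)} runs")
```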