Open barracuda156 opened 11 months ago
Notice this happens despite -DDISABLE_CPU_OPTIMIZATION=ON
being passed.
Yeah, this is the bug: https://github.com/drowe67/LPCNet/blob/00890a63c7c2b223f35b087efbab3b92c8829d54/CMakeLists.txt#L140-L146
This gonna break the build for any OS on PowerPC, MIPS etc.
Notice this happens despite
-DDISABLE_CPU_OPTIMIZATION=ON
being passed.
This is intentional to support the cross-compiling workflow for the FreeDV application. From the top-level CMakeLists.txt
file:
# Detection of available CPU optimizations
if(NOT DISABLE_CPU_OPTIMIZATION)
...
else()
# Presume all optimizations are available as the user is likely setting them themselves
# (e.g. cross-compiling)
set(AVX2_PRESENT TRUE)
set(AVX_PRESENT TRUE)
set(SSE_PRESENT TRUE)
set(NEON_PRESENT TRUE)
endif()
You should be able to explicitly specify -DAVX2=OFF -DAVX=OFF -DSSE=OFF -DNEON=OFF
at the command line to prevent usage of the Intel/ARM flags.
BTW I'm not sure if there are any PPC machines that could satisfactorily run this version of LPCNet or FreeDV 2020 modes in general. (I only have access to Intel and ARM based Macs here for development.) Definitely let us know if your testing does prove that they can.
@tmiw Thank you for responding.
Since existing code breaks the build for several archs, I believe it is wrong, regardless of the fact that it facilitates something else. It can be written in a way cross-compiling works, and PPC (or RISC-V, MIPS etc.) are not broken. Requiring the user to manually set flags (which is also undocumented, apparently) is a suboptimal solution. Especially given that people will expect -DDISABLE_CPU_OPTIMIZATION=ON
to turn off optimizations – while in fact it does not.
I'm not sure if there are any PPC machines that could satisfactorily run this version of LPCNet or FreeDV 2020 modes in general
So far I only confirmed it builds fine (after CMakeLists are fixed), but I can do testing, if there are proper tests. Macports turns off testing for this port even on Intel, with a note that tests are Linux-only (I have no idea if that is accurate). Regardless of the PPC builds on macOS, the same issue will be relevant for FreeBSD and Linux, which actively support PPC in current systems – there is, presumably, no reason why something should not work there, though I cannot test it personally.
P. S. If it will be determined for sure that PPC (or any other arch for that matter) cannot work in principle, it should be a) documented and b) the build should err out at configure with a clear message why it fails. Until proven otherwise, however, I would assume it either works (at least on Linux PPC) or can work with some fixes.
@tmiw UPD. Okay, Macports is wrong, tests run. This is on PowerPC: expectedly, SIMD fail, since there is no Altivec implementation, other tests pass:
---> Testing lpcnetfreedv
Executing: cd "/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_audio_lpcnetfreedv/lpcnetfreedv/work/build" && /usr/bin/make test
Running tests...
/opt/local/bin/ctest --force-new-ctest-process
Test project /opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_audio_lpcnetfreedv/lpcnetfreedv/work/build
Start 1: feature_extraction
1/4 Test #1: feature_extraction ............... Passed 0.23 sec
Start 2: nnet2f32
2/4 Test #2: nnet2f32 ......................... Passed 0.62 sec
Start 3: SIMD_functions
3/4 Test #3: SIMD_functions ...................***Failed 0.01 sec
Start 4: lpcnet_enc_dec
4/4 Test #4: lpcnet_enc_dec ................... Passed 0.11 sec
75% tests passed, 1 tests failed out of 4
Total Test time (real) = 0.99 sec
The following tests FAILED:
3 - SIMD_functions (Failed)
@tmiw I looked into the source of test_vec
, it apparently compares results from non-optimized and vectorized (fast) versions of sgemv_accum16
and sparse_sgemv_accum16
. Obviously, there are no vectorized ones for PPC at the moment. The test_vec
binary can be linked once fast functions are removed, but then there is nothing to compare to, and test fails, unsurprisingly.
If you suggest another meaningful way to run that test with non-optimized functions only, I can try that.
At the moment, this is the only test which fails.
For the reference, test log: lpcnetfreedv_tests.log (Notice that SIMD test output is not meaningful here – I merely removed fast functions to fix linking, so comparisons do not work. Other tests pass.)
Hi @barracuda156 - using your target machine, can you pls post the output of
~/LPCNet/build_linux/src $ time sox ../../wav/all.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s > /dev/null
<snip>
real 0m31.281s
So 31 seconds for a 49 second wave file on my machine. So the key question is will the code run in real time on PPC etc.
This is a experimental fork of LPCNet that we have been using to try out Neural speech coding with Ham radio. We're not pitching it at widespread use outside of the FreeDV community at this time. It may be deprecated in the next few years as our project and neural speech coding evolves.
What is your use case for this repo?
@drowe67 Thank you for responding! Here is the output:
36-25% time /opt/local/bin/sox /opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_audio_lpcnetfreedv/lpcnetfreedv/work/LPCNet-0.5/wav/all.wav -t raw -r 16000 - | /opt/local/bin/lpcnet_enc -s | /opt/local/bin/lpcnet_dec -s > /dev/null
direct split VQ
dec: 3 pred: 0.00 num_stages: 4 mbest: 5 bits_per_frame: 52 frame: 30 ms bit_rate: 1733.33 bits/s
direct split VQ
dec: 3 pred: 0.00 num_stages: 4 mbest: 5 bits_per_frame: 52 frame: 30 ms bit_rate: 1733.33 bits/s
64 1 1 16 128 1152 160 160 160 160
ftest cols = 2002
bits_written 86216
/opt/local/bin/sox -t raw -r 16000 - 0.02s user 0.02s system 0% cpu 2:58.76 total
/opt/local/bin/lpcnet_enc -s 5.71s user 0.06s system 3% cpu 2:58.83 total
/opt/local/bin/lpcnet_dec -s > /dev/null 224.28s user 1.69s system 98% cpu 3:49.03 total
Not fast, I understand :) But we use non-vectorized code, and this is 2005 machine.
Re use case: we got two ports which depend on LPCNet: https://ports.macports.org/port/lpcnetfreedv/details In turn, quite a number of ports depend on those two.
So unfortunately the code in this repo can't run in real time so will be of no use to your end users - FreeDV decodes off air signal in real time. Can I suggest you build codec2 without the LPCNet option (see codec2/README.md)?
@tmiw - does freedv-gui still build without LPCNet?
@tmiw - does freedv-gui still build without LPCNet?
IIRC it required LPCNet a while back. I haven't tried recently, though. Regardless, there is logic in the freedv-gui code to prevent use of 2020/2020B if AVX isn't available (x86 only) or if the 2020 decode is slower than twice the speed of real time (i.e. it needs to take ~0.5s* or less to decode a 1s audio sample).
* I don't think it's exactly 0.5s anymore; I remember adding a buffer a while back per user feedback but I don't remember what it got set to.
Since existing code breaks the build for several archs, I believe it is wrong, regardless of the fact that it facilitates something else. It can be written in a way cross-compiling works, and PPC (or RISC-V, MIPS etc.) are not broken. Requiring the user to manually set flags (which is also undocumented, apparently) is a suboptimal solution. Especially given that people will expect
-DDISABLE_CPU_OPTIMIZATION=ON
to turn off optimizations – while in fact it does not.
I think the lack of documentation for this is a fair point. This could perhaps be renamed to DISABLE_CPU_OPTIMIZATION_FLAG_DETECTION
or something instead? (suggestions appreciated)
I do have a few concerns about PR #57 that was opened for this issue:
-DBUILD_OSX_UNIVERSAL
behavior is being changed so that we either build for PPC/PPC64 only or x86/ARM only. If we're already going to support PPC, we should ask whether this should be broadened to also support the machines that were built around the time of the PPC->Intel transition.AVX
, etc. CMake flags are being changed to OFF
if -DDISABLE_CPU_OPTIMIZATION=ON
. I'd prefer that they remain ON
if we're building on an x86 or ARM environment (or cross-compiling for one). However, this is mainly because freedv-gui has broken for some users before due to improper use of certain compiler flags (e.g. for AVX2), so I'm very wary of making changes unless they've been tested.That said, I took a look at how LPCNet is being built for Windows and macOS and cmake/BuildLPCNet.cmake
doesn't seem to specify compiler flags:
if(CMAKE_CROSSCOMPILING)
set(LPCNET_CMAKE_ARGS ${LPCNET_CMAKE_ARGS} -DCMAKE_TOOLCHAIN_FILE=${CMAKE_TOOLCHAIN_FILE})
endif()
if(BUILD_OSX_UNIVERSAL)
set(LPCNET_CMAKE_ARGS ${LPCNET_CMAKE_ARGS} -DBUILD_OSX_UNIVERSAL=1)
endif(BUILD_OSX_UNIVERSAL)
include(ExternalProject)
ExternalProject_Add(build_lpcnetfreedv
SOURCE_DIR LPCNet_src
BINARY_DIR LPCNet_build
GIT_REPOSITORY https://github.com/drowe67/LPCNet.git
GIT_TAG v0.5
CMAKE_ARGS ${LPCNET_CMAKE_ARGS}
CMAKE_CACHE_ARGS -DCMAKE_OSX_DEPLOYMENT_TARGET:STRING=${CMAKE_OSX_DEPLOYMENT_TARGET}
INSTALL_COMMAND ""
)
So I could be worrying too much about the DISABLE_CPU_OPTIMIZATION
change too. (Note: Since the last LPCNet PR that touched this, freedv-gui switched to LLVM MinGW for Windows and a lot of cleanup was done to the freedv-gui CMake scripts in general to reduce the need for the various build_*.sh
scripts.)
@tmiw - does freedv-gui still build without LPCNet?
IIRC it required LPCNet a while back. I haven't tried recently, though. Regardless, there is logic in the freedv-gui code to prevent use of 2020/2020B if AVX isn't available (x86 only) or if the 2020 decode is slower than twice the speed of real time (i.e. it needs to take ~0.5s* or less to decode a 1s audio sample).
Actually I feel this is the key to this Issue. Rather than relying on a run time test, and committing our team to support and maintenance on platforms where it can't possibly work - I'd prefer to not build LPCNet for platforms where it can't possibly run, and have the higher layer build systems deal with that at build time (like codec2 does).
@drowe67 I will address other points when I am not on a mobile app, but couple of things:
It should be clear that supporting platforms other than Intel/ARM is not the same as supporting PPC on macOS. I believe, existing code breaks the build on Linux and BSD too, and certainly there are modern Power systems on par with Intel/ARM speed-wise. That is, an arguably fair point that current non-vectorized implementation may be of purely academic use on MacOS PPC does not prevent from supporting PowerPC as such. You do not ban i386, arm32 and some specific archaic x86 CPUs which may also be too slow for real-time decoding, after all. It makes a better sense to disable PPC by default (and that is done anyway) and not prohibit a user to try the code which actually does work (which is tested now). Someone may come up with AltiVec/VSX backend – and it is more likely to happen if the port works to begin with.
I do not think this gonna add any maintenance burden for you, given there is no vectorized implementation. In fact, there is nothing in the code to require PPC-specific changes, it builds and works as is. We only need to fix CMake build system, which at the moment sabotages otherwise working config. Fix it once, in a way is acceptable for you, and leave it then, nothing for PPC gonna change on MacOS, and unless AltiVec/VSX are added, nothing specific for PowerPC needed generally.
Re deployment target: 10.9 was there already, not my choice or change; for PPC on MacOS it makes no sense to set it above 10.6 though, and ppc64 needs 10.5. Since there is nothing here, AFAICS, to require 10.6 over 10.5, just use the latter. This change affects only macOS PPC systems, so is irrelevant for anything else.
Actually I feel this is the key to this Issue. Rather than relying on a run time test, and committing our team to support and maintenance on platforms where it can't possibly work - I'd prefer to not build LPCNet for platforms where it can't possibly run, and have the higher layer build systems deal with that at build time (like codec2 does).
FWIW, I was able to get freedv-gui to compile without LPCNet, but it definitely needed code and CMakeLists.txt changes to do so. The good news is that it needed fewer changes than I thought it would, at least for the proof of concept (a neater solution would of course need more work). One issue is that FREEDV_MODE_2020
still seems to be defined by Codec2 even if compiled without LPCNet, which caused freedv-gui to run its first-time 2020 benchmark test (and hung trying to do so).
Speaking of the runtime test, I still think one is necessary even if we conditionally compile LPCNet. For example, there are ARM systems that have NEON but aren't necessarily powerful enough to satisfactorily decode 2020 (e.g. Raspberry Pi).
For example, there are ARM systems that have NEON but aren't necessarily powerful enough to satisfactorily decode 2020
I would assume some CoreDuo would not work either, since they are presumably slower that the last G5s. Or if they do, then G5 may also work, once AltiVec backend is added.
@tmiw Regarding universal builds on macOS. The problem here is that GNU GCC does not support universal builds at all – at least until upstream fixes its driver-driver
which existed with Apple GCC (and Apple GCC will not compile LPCNet
, being too outdated). (It is in plans, kind of, but Iain has no time, and no one else is capable of doing that.)
This problem is arch-agnostic, AFAIK: you will not be able to build i386+x86_64 with GCC too.
Now, Macports has its own implementation for universal builds which does work with modern GCC, however it has issues with combining Intel with PowerPC. Long story short, current ports for GCC do not build as genuinely universal and won’t support x86_86+ppc(64) targets. They do support i386+x86_64 and ppc+ppc64 though.
Clang is broken for PowerPC, so if anyone is building on macOS PowerPC, they will be using GCC.
Having all this in mind, I think it is a sane choice to make it either Intel+ARM (which would require Clang) or ppc+ppc64 (gonna work with Macports environment only – so do not force-enable universal in this case).
@barracuda156, thanks for that feedback. We'll need to discuss further as a PLT how this will be resolved (if at all). Once we have any further updates, we'll update this issue accordingly.
@tmiw Thank you!
P. S. Just to be clear, universal builds issue is of very low priority, and I am fine with leaving it as is (i.e. keeping Intel+ARM only). This we can fix in Macports locally, and that kind of patch will be easy to carry, be it needed. I am also not particularly concerned about deployment target setting, as Macports overrides it with correct setting. (However it makes no sense to set it to 10.9 for PowerPC, since any SDK after 10.6 lacks PPC slices.)
I reimplemented the previously proposed PR in what should hopefully be a lower impact manner: https://github.com/drowe67/LPCNet/pull/59
@tmiw Thank you very much!