Closed athoik closed 6 years ago
It does not compile on Debian Stretch on an RPi 2.
2018-02-24 13:40 GMT+01:00 Athanasios Oikonomou notifications@github.com:
It seems that SSE2NEON can successfully convert all SSE2 instructions to NEON.
This commit improves Viterbi decoding speed by more than 25%.
It was tested using verify_viterbi (infrastructure files from spiral.net).
Using scalar C, the decoder speed was 2719.79 kbits/s. Using SSE2NEON with SSE 4-way, the decoder speed is 3447.81 kbits/s.
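As a quick check of the "more than 25%" figure, the relative gain can be computed from the two throughput numbers quoted above; it comes out at roughly 26.8%. The helper name `speedup_percent` is made up for this illustration and is not part of the pull request.

```c
/* Hypothetical helper: relative throughput gain of `fast_kbits`
 * over `base_kbits`, expressed in percent. */
static double speedup_percent(double base_kbits, double fast_kbits) {
    return (fast_kbits - base_kbits) / base_kbits * 100.0;
}
```

Calling `speedup_percent(2719.79, 3447.81)` uses the scalar and SSE2NEON figures reported in this thread.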
To use it, add spiral-neon.h to CMakeLists.txt, and add the following definitions to CMakeLists.txt as well:
if(DEFINED NEON_AVAILABLE)
  add_definitions(-DNEON_AVAILABLE)
endif()
Finally, compile with the -DNEON_AVAILABLE flag.
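For illustration, here is a minimal sketch of how C code could react to the `NEON_AVAILABLE` definition. This is an assumption about the general pattern, not the actual change made in viterbi-768.cpp; the function name `viterbi_backend_name` is hypothetical.

```c
/* Hypothetical sketch: choose a decoder backend at compile time
 * depending on whether -DNEON_AVAILABLE was passed to the compiler.
 * The real selection logic in viterbi-768.cpp may look different. */
static const char *viterbi_backend_name(void) {
#ifdef NEON_AVAILABLE
    return "neon";    /* SSE2NEON-based path */
#else
    return "scalar";  /* plain C fallback */
#endif
}
```

Built without the definition, the function reports the scalar fallback; built with `-DNEON_AVAILABLE`, it reports the NEON path.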
You can view, comment on, or merge this pull request online at:
https://github.com/JvanKatwijk/dab-cmdline/pull/41
Commit Summary
- viterbi: Add NEON SIMD using SSE2NEON
File Changes
- A library/src/backend/viterbi_768/SSE2NEON.h https://github.com/JvanKatwijk/dab-cmdline/pull/41/files#diff-0 (1688)
- A library/src/backend/viterbi_768/spiral-neon.c https://github.com/JvanKatwijk/dab-cmdline/pull/41/files#diff-1 (701)
- A library/src/backend/viterbi_768/spiral-neon.h https://github.com/JvanKatwijk/dab-cmdline/pull/41/files#diff-2 (36)
- M library/src/backend/viterbi_768/spiral-no-sse.c https://github.com/JvanKatwijk/dab-cmdline/pull/41/files#diff-3 (2)
- M library/src/backend/viterbi_768/viterbi-768.cpp https://github.com/JvanKatwijk/dab-cmdline/pull/41/files#diff-4 (16)
Patch Links:
- https://github.com/JvanKatwijk/dab-cmdline/pull/41.patch
- https://github.com/JvanKatwijk/dab-cmdline/pull/41.diff
What parameters are you using?
I am using OpenEmbedded (cross-compiling to ARM) with the following GCC parameters:
arm-oe-linux-gnueabi-gcc -march=armv7-a -mfpu=neon -mfloat-abi=hard --sysroot=...
What error do you get?
An issue with a macro expansion in SSE2NEON.h, apparently some inconsistency with a definition in arm_neon.h.
In file included from SSE2NEON.h:123:0,
from spiral-neon.c:27:
SSE2NEON.h: In function ‘_mm_setzero_si128’:
/usr/lib/gcc/arm-linux-gnueabihf/6/include/arm_neon.h:5792:1: error: inlining failed in call to always_inline ‘vdupq_n_s32’: target specific option mismatch
vdupq_n_s32 (int32_t __a)
^~~
In file included from spiral-neon.c:27:0:
SSE2NEON.h:312:33: note: called from here
SSE2NEON.h:230:2:
(x)
SSE2NEON.h:312:33:
return vreinterpretq_m128i_s32(vdupq_n_s32(0));
SSE2NEON.h:230:3: note: in definition of macro ‘vreinterpretq_m128i_s32’
(x)
That is the error. I really do not have a clue how to handle it.
Hi,
Did you enable the NEON GCC flags?
https://community.arm.com/tools/b/blog/posts/arm-cortex-a-processors-and-gcc-command-lines
Oops, that was the trick.
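The exchange above suggests the "target specific option mismatch" error appeared because NEON was not enabled for the compiler, for example through flags such as `-mfpu=neon` shown earlier in this thread. As a hedged sketch, code can check the predefined macros `__ARM_NEON` / `__ARM_NEON__` before touching NEON intrinsics and otherwise fall back to scalar code. The names `HAVE_NEON`, `neon_enabled` and `zero_vector_lane_sum` are made up for this example and do not come from SSE2NEON.h.

```c
#include <stdint.h>

/* Compilers targeting ARM define __ARM_NEON (and, with GCC, also
 * __ARM_NEON__) only when NEON code generation is enabled. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Reports whether this translation unit was built with NEON enabled. */
static int neon_enabled(void) {
    return HAVE_NEON;
}

/* Builds an all-zero 4 x int32 vector (the same vdupq_n_s32(0) call
 * that appears in the error log) and sums its lanes. Without NEON,
 * a plain array stands in for the vector. */
static int32_t zero_vector_lane_sum(void) {
#if HAVE_NEON
    int32x4_t z = vdupq_n_s32(0);
    return vgetq_lane_s32(z, 0) + vgetq_lane_s32(z, 1)
         + vgetq_lane_s32(z, 2) + vgetq_lane_s32(z, 3);
#else
    int32_t z[4] = {0, 0, 0, 0};
    return z[0] + z[1] + z[2] + z[3];
#endif
}
```

A guard like this turns a missing NEON flag into a predictable scalar fallback (or, if preferred, an explicit `#error`) instead of an inlining failure deep inside arm_neon.h.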