dthuerck / mapmap_cpu

A high-performance general-purpose MRF MAP solver, heavily exploiting SIMD instructions.
BSD 3-Clause "New" or "Revised" License

SSE2 #20

Closed: alke closed this issue 4 years ago.

alke commented 4 years ago

Maybe this is more of a question. I get this error when compiling OpenDroneMap 0.7.0:

    SuperBuild/src/mvstexturing/elibs/mapmap/mapmap/source/vector_math.impl.h:1057:33: error: ‘_mm_extract_epi64’ was not declared in this scope
        a1.i = _mm_extract_epi64(aa, 0);

My system: Ubuntu 16.04 LTS (32-bit), 7.8 GiB of memory, processor Intel(R) Core(TM) i7-2670QM CPU @ 2.20GHz, graphics Intel® Sandybridge Mobile x86/MMX/SSE2.

Maybe mapmap doesn't support this configuration?

dthuerck commented 4 years ago

Hi,

_mm_extract_epi64 is an SSE4.1 intrinsic; your CPU supports AVX (and therefore SSE4.1), so you should be good there.

However, I'm a little worried about your OS being 32-bit. Could you please paste the output of cat /proc/cpuinfo and attach your computer's smmintrin.h (usually found in /usr/lib/gcc/x86_64-linux-gnu/???/include, depending on your compiler version)?

Thanks!
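A minimal C sketch of the distinction drawn above, assuming GCC or Clang on x86 (all function names are made up for illustration): whether the CPU supports SSE4.1 is a run-time property, while whether _mm_extract_epi64 is declared at all is decided at compile time by the target. __x86_64__ and __SSE4_1__ are compiler-predefined macros, and __builtin_cpu_supports is a GCC/Clang builtin.

```c
#include <stdio.h>

/* Hypothetical helpers for illustration only. */

/* 1 if the compiler targets x86-64. GCC's smmintrin.h declares
   _mm_extract_epi64 only inside #ifdef __x86_64__, so an i686 (32-bit)
   toolchain never sees the name, whatever the CPU supports. */
int target_is_x86_64(void)
{
#if defined(__x86_64__)
    return 1;
#else
    return 0;
#endif
}

/* 1 if SSE4.1 code generation is enabled for this translation unit,
   e.g. via -msse4.1 or -march=native. */
int sse41_codegen_enabled(void)
{
#if defined(__SSE4_1__)
    return 1;
#else
    return 0;
#endif
}

/* 1 if the CPU running this program reports SSE4.1 (GCC/Clang builtin,
   only usable when compiling for x86). */
int cpu_has_sse41(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.1") ? 1 : 0;
#else
    return 0;
#endif
}

/* Print all three checks on one line. */
void print_simd_report(void)
{
    printf("x86-64 target: %d, SSE4.1 codegen: %d, CPU has SSE4.1: %d\n",
           target_is_x86_64(), sse41_codegen_enabled(), cpu_has_sse41());
}
```

Called from main() on a setup like the reporter's (i686 GCC, i7-2670QM), the x86-64 check should come out 0 while the CPU check comes out 1, which is exactly the mismatch behind the "not declared" error.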

alke commented 4 years ago

Hello, here it comes:

cat /proc/cpuinfo:

    processor       : 0
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 42
    model name      : Intel(R) Core(TM) i7-2670QM CPU @ 2.20GHz
    stepping        : 7
    microcode       : 0x2f
    cpu MHz         : 885.156
    cache size      : 6144 KB
    physical id     : 0
    siblings        : 8
    core id         : 0
    cpu cores       : 4
    apicid          : 0
    initial apicid  : 0
    fdiv_bug        : no
    f00f_bug        : no
    coma_bug        : no
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 13
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
    bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
    bogomips        : 4389.92
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 36 bits physical, 48 bits virtual
    power management:

Processors 1 to 7 print the same fields with the same values, except for these:

    processor  core id  apicid  initial apicid  cpu MHz
    0          0        0       0               885.156
    1          1        2       2               1011.828
    2          2        4       4               825.945
    3          3        6       6               867.796
    4          0        1       1               896.757
    5          1        3       3               895.210
    6          2        5       5               991.117
    7          3        7       7               907.070
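The flags line above lists both sse4_1 (SSE4.1) and lm ("long mode", i.e. the CPU can run a 64-bit OS), so the hardware is not the limitation. The fdiv_bug, f00f_bug and coma_bug lines are, to my knowledge, only printed by 32-bit x86 kernels, which fits the 32-bit Ubuntu install. A small sketch for checking such flags programmatically; has_cpu_flag is a hypothetical helper, and the point it illustrates is that flags must be matched as whole words, since "lm" also occurs inside "lahf_lm".

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` appears as a whole,
   whitespace-separated word in a /proc/cpuinfo "flags" value, else 0. */
int has_cpu_flag(const char *flags, const char *name)
{
    size_t n = strlen(name);
    if (n == 0)
        return 0;
    /* Walk every substring match and accept only whole-word ones. */
    for (const char *p = strstr(flags, name); p != NULL; p = strstr(p + 1, name)) {
        int starts_word = (p == flags) || isspace((unsigned char)p[-1]);
        int ends_word = (p[n] == '\0') || isspace((unsigned char)p[n]);
        if (starts_word && ends_word)
            return 1;
    }
    return 0;
}
```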


And the file /usr/lib/gcc/i686-linux-gnu/5.4.0/include/smmintrin.h /* Copyright (C) 2007-2015 Free Software Foundation, Inc.

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3, or (at your option) any later version.

GCC is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

Under Section 7 of GPL version 3, you are granted additional permissions described in the GCC Runtime Library Exception, version 3.1, as published by the Free Software Foundation.

You should have received a copy of the GNU General Public License and a copy of the GCC Runtime Library Exception along with this program; see the files COPYING3 and COPYING.RUNTIME respectively. If not, see http://www.gnu.org/licenses/. */

/ Implemented from the specification included in the Intel C++ Compiler User Guide and Reference, version 10.0. /

ifndef _SMMINTRIN_H_INCLUDED

define _SMMINTRIN_H_INCLUDED

/ We need definitions from the SSSE3, SSE3, SSE2 and SSE header files. /

include

ifndef __SSE4_1__

pragma GCC push_options

pragma GCC target("sse4.1")

define DISABLE_SSE4_1

endif / __SSE4_1__ /

/ Rounding mode macros. /

define _MM_FROUND_TO_NEAREST_INT 0x00

define _MM_FROUND_TO_NEG_INF 0x01

define _MM_FROUND_TO_POS_INF 0x02

define _MM_FROUND_TO_ZERO 0x03

define _MM_FROUND_CUR_DIRECTION 0x04

define _MM_FROUND_RAISE_EXC 0x00

define _MM_FROUND_NO_EXC 0x08

define _MM_FROUND_NINT \

(_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)

define _MM_FROUND_FLOOR \

(_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)

define _MM_FROUND_CEIL \

(_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)

define _MM_FROUND_TRUNC \

(_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)

define _MM_FROUND_RINT \

(_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)

define _MM_FROUND_NEARBYINT \

(_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC)

/ Test Instruction / / Packed integer 128-bit bitwise comparison. Return 1 if (V & M) == 0. / extern inline int attribute((gnu_inline, always_inline, artificial__)) _mm_testz_si128 (m128i M, m128i V) { return builtin_ia32_ptestz128 ((v2di)M, (v2di)__V); }

/ Packed integer 128-bit bitwise comparison. Return 1 if (V & ~M) == 0. / extern inline int attribute((gnu_inline, always_inline, artificial__)) _mm_testc_si128 (m128i M, m128i V) { return builtin_ia32_ptestc128 ((v2di)M, (v2di)__V); }

/ Packed integer 128-bit bitwise comparison. Return 1 if (V & M) != 0 && (V & ~M) != 0. / extern inline int attribute((gnu_inline, always_inline, artificial__)) _mm_testnzc_si128 (m128i M, m128i V) { return builtin_ia32_ptestnzc128 ((v2di)M, (v2di)__V); }

/ Macros for packed integer 128-bit comparison intrinsics. /

define _mm_test_all_zeros(M, V) _mm_testz_si128 ((M), (V))

define _mm_test_all_ones(V) \

_mm_testc_si128 ((V), _mm_cmpeq_epi32 ((V), (V)))

define _mm_test_mix_ones_zeros(M, V) _mm_testnzc_si128 ((M), (V))

/ Packed/scalar double precision floating point rounding. /

ifdef OPTIMIZE

extern inline m128d attribute((gnu_inline, always_inline, artificial)) _mm_round_pd (m128d V, const int M) { return (m128d) builtin_ia32_roundpd ((v2df)V, M); }

extern inline m128d attribute((gnu_inline, always_inline, artificial)) _mm_round_sd(m128d D, m128d V, const int M) { return (m128d) builtin_ia32_roundsd ((v2df)D, (v2df)V, M); }

else

define _mm_round_pd(V, M) \

((m128d) builtin_ia32_roundpd ((v2df)(m128d)(V), (int)(M)))

define _mm_round_sd(D, V, M) \

((m128d) builtin_ia32_roundsd ((v2df)(m128d)(D), \ (v2df)(m128d)(V), (int)(M)))

endif

/ Packed/scalar single precision floating point rounding. /

ifdef OPTIMIZE

extern inline m128 attribute((gnu_inline, always_inline, artificial)) _mm_round_ps (m128 V, const int M) { return (m128) builtin_ia32_roundps ((v4sf)V, M); }

extern inline m128 attribute((gnu_inline, always_inline, artificial)) _mm_round_ss (m128 D, m128 V, const int M) { return (m128) builtin_ia32_roundss ((v4sf)D, (v4sf)V, M); }

else

define _mm_round_ps(V, M) \

((m128) builtin_ia32_roundps ((v4sf)(m128)(V), (int)(M)))

define _mm_round_ss(D, V, M) \

((m128) builtin_ia32_roundss ((v4sf)(m128)(D), \ (v4sf)(m128)(V), (int)(M)))

endif

/ Macros for ceil/floor intrinsics. /

define _mm_ceil_pd(V) _mm_round_pd ((V), _MM_FROUND_CEIL)

define _mm_ceil_sd(D, V) _mm_round_sd ((D), (V), _MM_FROUND_CEIL)

define _mm_floor_pd(V) _mm_round_pd((V), _MM_FROUND_FLOOR)

define _mm_floor_sd(D, V) _mm_round_sd ((D), (V), _MM_FROUND_FLOOR)

define _mm_ceil_ps(V) _mm_round_ps ((V), _MM_FROUND_CEIL)

define _mm_ceil_ss(D, V) _mm_round_ss ((D), (V), _MM_FROUND_CEIL)

define _mm_floor_ps(V) _mm_round_ps ((V), _MM_FROUND_FLOOR)

define _mm_floor_ss(D, V) _mm_round_ss ((D), (V), _MM_FROUND_FLOOR)

/ SSE4.1 /

/ Integer blend instructions - select data from 2 sources using constant/variable mask. /

ifdef OPTIMIZE

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_blend_epi16 (m128i X, m128i Y, const int M) { return (m128i) builtin_ia32_pblendw128 ((v8hi)X, (v8hi)Y, M); }

else

define _mm_blend_epi16(X, Y, M) \

((m128i) builtin_ia32_pblendw128 ((v8hi)(m128i)(X), \ (v8hi)(m128i)(Y), (int)(M)))

endif

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_blendv_epi8 (m128i X, m128i Y, m128i M) { return (m128i) builtin_ia32_pblendvb128 ((v16qi)X, (v16qi)Y, (v16qi)M); }

/ Single precision floating point blend instructions - select data from 2 sources using constant/variable mask. /

ifdef OPTIMIZE

extern inline m128 attribute((gnu_inline, always_inline, artificial)) _mm_blend_ps (m128 X, m128 Y, const int M) { return (m128) builtin_ia32_blendps ((v4sf)X, (v4sf)Y, M); }

else

define _mm_blend_ps(X, Y, M) \

((m128) builtin_ia32_blendps ((v4sf)(m128)(X), \ (v4sf)(m128)(Y), (int)(M)))

endif

extern inline m128 attribute((gnu_inline, always_inline, artificial)) _mm_blendv_ps (m128 X, m128 Y, m128 M) { return (m128) builtin_ia32_blendvps ((v4sf)X, (v4sf)Y, (v4sf)M); }

/ Double precision floating point blend instructions - select data from 2 sources using constant/variable mask. /

ifdef OPTIMIZE

extern inline m128d attribute((gnu_inline, always_inline, artificial)) _mm_blend_pd (m128d X, m128d Y, const int M) { return (m128d) builtin_ia32_blendpd ((v2df)X, (v2df)Y, M); }

else

define _mm_blend_pd(X, Y, M) \

((m128d) builtin_ia32_blendpd ((v2df)(m128d)(X), \ (v2df)(m128d)(Y), (int)(M)))

endif

extern inline m128d attribute((gnu_inline, always_inline, artificial)) _mm_blendv_pd (m128d X, m128d Y, m128d M) { return (m128d) builtin_ia32_blendvpd ((v2df)X, (v2df)Y, (v2df)M); }

/ Dot product instructions with mask-defined summing and zeroing parts of result. /

ifdef OPTIMIZE

extern inline m128 attribute((gnu_inline, always_inline, artificial)) _mm_dp_ps (m128 X, m128 Y, const int M) { return (m128) builtin_ia32_dpps ((v4sf)X, (v4sf)Y, M); }

extern inline m128d attribute((gnu_inline, always_inline, artificial)) _mm_dp_pd (m128d X, m128d Y, const int M) { return (m128d) builtin_ia32_dppd ((v2df)X, (v2df)Y, M); }

else

define _mm_dp_ps(X, Y, M) \

((m128) builtin_ia32_dpps ((v4sf)(m128)(X), \ (v4sf)(m128)(Y), (int)(M)))

define _mm_dp_pd(X, Y, M) \

((m128d) builtin_ia32_dppd ((v2df)(m128d)(X), \ (v2df)(m128d)(Y), (int)(M)))

endif

/ Packed integer 64-bit comparison, zeroing or filling with ones corresponding parts of result. / extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_cmpeq_epi64 (m128i X, m128i Y) { return (m128i) ((v2di)X == (v2di)__Y); }

/ Min/max packed integer instructions. /

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_min_epi8 (m128i X, m128i Y) { return (m128i) builtin_ia32_pminsb128 ((v16qi)X, (v16qi)Y); }

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_max_epi8 (m128i X, m128i Y) { return (m128i) builtin_ia32_pmaxsb128 ((v16qi)X, (v16qi)Y); }

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_min_epu16 (m128i X, m128i Y) { return (m128i) builtin_ia32_pminuw128 ((v8hi)X, (v8hi)Y); }

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_max_epu16 (m128i X, m128i Y) { return (m128i) builtin_ia32_pmaxuw128 ((v8hi)X, (v8hi)Y); }

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_min_epi32 (m128i X, m128i Y) { return (m128i) builtin_ia32_pminsd128 ((v4si)X, (v4si)Y); }

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_max_epi32 (m128i X, m128i Y) { return (m128i) builtin_ia32_pmaxsd128 ((v4si)X, (v4si)Y); }

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_min_epu32 (m128i X, m128i Y) { return (m128i) builtin_ia32_pminud128 ((v4si)X, (v4si)Y); }

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_max_epu32 (m128i X, m128i Y) { return (m128i) builtin_ia32_pmaxud128 ((v4si)X, (v4si)Y); }

/ Packed integer 32-bit multiplication with truncation of upper halves of results. / extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_mullo_epi32 (m128i X, m128i Y) { return (m128i) ((v4su)X * (v4su)__Y); }

/ Packed integer 32-bit multiplication of 2 pairs of operands with two 64-bit results. / extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_mul_epi32 (m128i X, m128i Y) { return (m128i) builtin_ia32_pmuldq128 ((v4si)X, (v4si)Y); }

/ Insert single precision float into packed single precision array element selected by index N. The bits [7-6] of N define S index, the bits [5-4] define D index, and bits [3-0] define zeroing mask for D. /

ifdef OPTIMIZE

extern inline m128 attribute((gnu_inline, always_inline, artificial)) _mm_insert_ps (m128 D, m128 S, const int N) { return (m128) builtin_ia32_insertps128 ((v4sf)D, (v4sf)S, N); }

else

define _mm_insert_ps(D, S, N) \

((m128) builtin_ia32_insertps128 ((v4sf)(m128)(D), \ (v4sf)(m128)(S), (int)(N)))

endif

/ Helper macro to create the N value for _mm_insert_ps. /

define _MM_MK_INSERTPS_NDX(S, D, M) (((S) << 6) | ((D) << 4) | (M))

/ Extract binary representation of single precision float from packed single precision array element of X selected by index N. /

ifdef OPTIMIZE

extern inline int attribute((gnu_inline, always_inline, artificial__)) _mm_extract_ps (m128 X, const int N) { union { int i; float f; } tmp; tmp.f = builtin_ia32_vec_ext_v4sf ((v4sf)X, N); return tmp.i; }

else

define _mm_extract_ps(X, N) \

(extension \ ({ \ union { int i; float f; } tmp; \ tmp.f = builtin_ia32_vec_ext_v4sf ((v4sf)(m128)(X), (int)(N)); \ tmp.i; \ }))

endif

/ Extract binary representation of single precision float into D from packed single precision array element of S selected by index N. /

define _MM_EXTRACT_FLOAT(D, S, N) \

{ (D) = builtin_ia32_vec_ext_v4sf ((v4sf)(S), (N)); }

/ Extract specified single precision float element into the lower part of __m128. /

define _MM_PICK_OUT_PS(X, N) \

_mm_insert_ps (_mm_setzero_ps (), (X), \ _MM_MK_INSERTPS_NDX ((N), 0, 0x0e))

/ Insert integer, S, into packed integer array element of D selected by index N. /

ifdef OPTIMIZE

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_insert_epi8 (m128i D, int S, const int N) { return (m128i) builtin_ia32_vec_set_v16qi ((v16qi)D, S, N); }

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_insert_epi32 (m128i D, int S, const int N) { return (m128i) builtin_ia32_vec_set_v4si ((v4si)D, S, N); }

ifdef __x86_64__

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_insert_epi64 (m128i D, long long S, const int N) { return (m128i) builtin_ia32_vec_set_v2di ((v2di)D, S, N); }

endif

else

define _mm_insert_epi8(D, S, N) \

((m128i) builtin_ia32_vec_set_v16qi ((v16qi)(m128i)(D), \ (int)(S), (int)(N)))

define _mm_insert_epi32(D, S, N) \

((m128i) builtin_ia32_vec_set_v4si ((v4si)(m128i)(D), \ (int)(S), (int)(N)))

ifdef __x86_64__

define _mm_insert_epi64(D, S, N) \

((m128i) builtin_ia32_vec_set_v2di ((v2di)(m128i)(D), \ (long long)(S), (int)(N)))

endif

endif

/ Extract integer from packed integer array element of X selected by index N. /

ifdef OPTIMIZE

extern inline int attribute((gnu_inline, always_inline, artificial__)) _mm_extract_epi8 (m128i X, const int N) { return (unsigned char) builtin_ia32_vec_ext_v16qi ((v16qi)X, __N); }

extern inline int attribute((gnu_inline, always_inline, artificial__)) _mm_extract_epi32 (m128i X, const int N) { return builtin_ia32_vec_ext_v4si ((v4si)X, __N); }

ifdef __x86_64__

extern inline long long attribute((gnu_inline, always_inline, artificial__)) _mm_extract_epi64 (m128i X, const int N) { return builtin_ia32_vec_ext_v2di ((v2di)X, __N); }

endif

else

define _mm_extract_epi8(X, N) \

((int) (unsigned char) builtin_ia32_vec_ext_v16qi ((v16qi)(__m128i)(X), (int)(N)))

define _mm_extract_epi32(X, N) \

((int) builtin_ia32_vec_ext_v4si ((v4si)(__m128i)(X), (int)(N)))

ifdef __x86_64__

define _mm_extract_epi64(X, N) \

((long long) builtin_ia32_vec_ext_v2di ((v2di)(__m128i)(X), (int)(N)))

endif

endif

/ Return horizontal packed word minimum and its index in bits [15:0] and bits [18:16] respectively. / extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_minpos_epu16 (m128i X) { return (m128i) builtin_ia32_phminposuw128 ((v8hi)X); }

/ Packed integer sign-extension. /

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_cvtepi8_epi32 (m128i X) { return (m128i) builtin_ia32_pmovsxbd128 ((v16qi)X); }

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_cvtepi16_epi32 (m128i X) { return (m128i) builtin_ia32_pmovsxwd128 ((v8hi)X); }

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_cvtepi8_epi64 (m128i X) { return (m128i) builtin_ia32_pmovsxbq128 ((v16qi)X); }

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_cvtepi32_epi64 (m128i X) { return (m128i) builtin_ia32_pmovsxdq128 ((v4si)X); }

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_cvtepi16_epi64 (m128i X) { return (m128i) builtin_ia32_pmovsxwq128 ((v8hi)X); }

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_cvtepi8_epi16 (m128i X) { return (m128i) builtin_ia32_pmovsxbw128 ((v16qi)X); }

/ Packed integer zero-extension. /

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_cvtepu8_epi32 (m128i X) { return (m128i) builtin_ia32_pmovzxbd128 ((v16qi)X); }

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_cvtepu16_epi32 (m128i X) { return (m128i) builtin_ia32_pmovzxwd128 ((v8hi)X); }

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_cvtepu8_epi64 (m128i X) { return (m128i) builtin_ia32_pmovzxbq128 ((v16qi)X); }

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_cvtepu32_epi64 (m128i X) { return (m128i) builtin_ia32_pmovzxdq128 ((v4si)X); }

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_cvtepu16_epi64 (m128i X) { return (m128i) builtin_ia32_pmovzxwq128 ((v8hi)X); }

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_cvtepu8_epi16 (m128i X) { return (m128i) builtin_ia32_pmovzxbw128 ((v16qi)X); }

/ Pack 8 double words from 2 operands into 8 words of result with unsigned saturation. / extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_packus_epi32 (m128i X, m128i Y) { return (m128i) builtin_ia32_packusdw128 ((v4si)X, (v4si)Y); }

/ Sum absolute 8-bit integer difference of adjacent groups of 4 byte integers in the first 2 operands. Starting offsets within operands are determined by the 3rd mask operand. /

ifdef OPTIMIZE

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_mpsadbw_epu8 (m128i X, m128i Y, const int M) { return (m128i) builtin_ia32_mpsadbw128 ((v16qi)X, (v16qi)Y, M); }

else

define _mm_mpsadbw_epu8(X, Y, M) \

((m128i) builtin_ia32_mpsadbw128 ((v16qi)(m128i)(X), \ (v16qi)(m128i)(Y), (int)(M)))

endif

/ Load double quadword using non-temporal aligned hint. / extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_stream_load_si128 (m128i *X) { return (m128i) builtin_ia32_movntdqa ((v2di *) X); }

ifndef __SSE4_2__

pragma GCC push_options

pragma GCC target("sse4.2")

define DISABLE_SSE4_2

endif / __SSE4_2__ /

/ These macros specify the source data format. /

define _SIDD_UBYTE_OPS 0x00

define _SIDD_UWORD_OPS 0x01

define _SIDD_SBYTE_OPS 0x02

define _SIDD_SWORD_OPS 0x03

/ These macros specify the comparison operation. /

define _SIDD_CMP_EQUAL_ANY 0x00

define _SIDD_CMP_RANGES 0x04

define _SIDD_CMP_EQUAL_EACH 0x08

define _SIDD_CMP_EQUAL_ORDERED 0x0c

/ These macros specify the polarity. /

define _SIDD_POSITIVE_POLARITY 0x00

define _SIDD_NEGATIVE_POLARITY 0x10

define _SIDD_MASKED_POSITIVE_POLARITY 0x20

define _SIDD_MASKED_NEGATIVE_POLARITY 0x30

/ These macros specify the output selection in _mm_cmpXstri (). /

define _SIDD_LEAST_SIGNIFICANT 0x00

define _SIDD_MOST_SIGNIFICANT 0x40

/ These macros specify the output selection in _mm_cmpXstrm (). /

define _SIDD_BIT_MASK 0x00

define _SIDD_UNIT_MASK 0x40

/ Intrinsics for text/string processing. /

ifdef OPTIMIZE

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_cmpistrm (m128i X, m128i Y, const int M) { return (m128i) builtin_ia32_pcmpistrm128 ((v16qi)X, (v16qi)Y, M); }

extern inline int attribute((gnu_inline, always_inline, artificial)) _mm_cmpistri (m128i X, m128i Y, const int M) { return builtin_ia32_pcmpistri128 ((v16qi)X, (v16qi)Y, M); }

extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_cmpestrm (m128i X, int LX, m128i Y, int LY, const int M) { return (m128i) builtin_ia32_pcmpestrm128 ((v16qi)X, LX, (v16qi)Y, LY, M); }

extern inline int attribute((gnu_inline, always_inline, artificial)) _mm_cmpestri (m128i X, int LX, m128i Y, int LY, const int M) { return builtin_ia32_pcmpestri128 ((v16qi)X, LX, (v16qi)Y, LY, M); }

else

define _mm_cmpistrm(X, Y, M) \

((m128i) builtin_ia32_pcmpistrm128 ((v16qi)(m128i)(X), \ (v16qi)(m128i)(Y), (int)(M)))

define _mm_cmpistri(X, Y, M) \

((int) builtin_ia32_pcmpistri128 ((v16qi)(m128i)(X), \ (v16qi)(__m128i)(Y), (int)(M)))

define _mm_cmpestrm(X, LX, Y, LY, M) \

((m128i) builtin_ia32_pcmpestrm128 ((v16qi)(m128i)(X), \ (int)(LX), (v16qi)(m128i)(Y), \ (int)(LY), (int)(M)))

define _mm_cmpestri(X, LX, Y, LY, M) \

((int) builtin_ia32_pcmpestri128 ((v16qi)(m128i)(X), (int)(LX), \ (v16qi)(__m128i)(Y), (int)(LY), \ (int)(M)))

endif

/ Intrinsics for text/string processing and reading values of EFlags. /

ifdef OPTIMIZE

extern inline int attribute((gnu_inline, always_inline, artificial)) _mm_cmpistra (m128i X, m128i Y, const int M) { return builtin_ia32_pcmpistria128 ((v16qi)X, (v16qi)Y, M); }

extern inline int attribute((gnu_inline, always_inline, artificial)) _mm_cmpistrc (m128i X, m128i Y, const int M) { return builtin_ia32_pcmpistric128 ((v16qi)X, (v16qi)Y, M); }

extern inline int attribute((gnu_inline, always_inline, artificial)) _mm_cmpistro (m128i X, m128i Y, const int M) { return builtin_ia32_pcmpistrio128 ((v16qi)X, (v16qi)Y, M); }

extern inline int attribute((gnu_inline, always_inline, artificial)) _mm_cmpistrs (m128i X, m128i Y, const int M) { return builtin_ia32_pcmpistris128 ((v16qi)X, (v16qi)Y, M); }

extern inline int attribute((gnu_inline, always_inline, artificial)) _mm_cmpistrz (m128i X, m128i Y, const int M) { return builtin_ia32_pcmpistriz128 ((v16qi)X, (v16qi)Y, M); }

extern inline int attribute((gnu_inline, always_inline, artificial)) _mm_cmpestra (m128i X, int LX, m128i Y, int LY, const int M) { return builtin_ia32_pcmpestria128 ((v16qi)X, LX, (v16qi)Y, LY, M); }

extern inline int attribute((gnu_inline, always_inline, artificial)) _mm_cmpestrc (m128i X, int LX, m128i Y, int LY, const int M) { return builtin_ia32_pcmpestric128 ((v16qi)X, LX, (v16qi)Y, LY, M); }

extern inline int attribute((gnu_inline, always_inline, artificial)) _mm_cmpestro (m128i X, int LX, m128i Y, int LY, const int M) { return builtin_ia32_pcmpestrio128 ((v16qi)X, LX, (v16qi)Y, LY, M); }

extern inline int attribute((gnu_inline, always_inline, artificial)) _mm_cmpestrs (m128i X, int LX, m128i Y, int LY, const int M) { return builtin_ia32_pcmpestris128 ((v16qi)X, LX, (v16qi)Y, LY, M); }

extern inline int attribute((gnu_inline, always_inline, artificial)) _mm_cmpestrz (m128i X, int LX, m128i Y, int LY, const int M) { return builtin_ia32_pcmpestriz128 ((v16qi)X, LX, (v16qi)Y, LY, M); }

else

define _mm_cmpistra(X, Y, M) \

((int) builtin_ia32_pcmpistria128 ((v16qi)(m128i)(X), \ (v16qi)(__m128i)(Y), (int)(M)))

define _mm_cmpistrc(X, Y, M) \

((int) builtin_ia32_pcmpistric128 ((v16qi)(m128i)(X), \ (v16qi)(__m128i)(Y), (int)(M)))

define _mm_cmpistro(X, Y, M) \

((int) builtin_ia32_pcmpistrio128 ((v16qi)(m128i)(X), \ (v16qi)(__m128i)(Y), (int)(M)))

define _mm_cmpistrs(X, Y, M) \

((int) builtin_ia32_pcmpistris128 ((v16qi)(m128i)(X), \ (v16qi)(__m128i)(Y), (int)(M)))

define _mm_cmpistrz(X, Y, M) \

((int) builtin_ia32_pcmpistriz128 ((v16qi)(m128i)(X), \ (v16qi)(__m128i)(Y), (int)(M)))

define _mm_cmpestra(X, LX, Y, LY, M) \

((int) builtin_ia32_pcmpestria128 ((v16qi)(m128i)(X), (int)(LX), \ (v16qi)(__m128i)(Y), (int)(LY), \ (int)(M)))

define _mm_cmpestrc(X, LX, Y, LY, M) \

((int) builtin_ia32_pcmpestric128 ((v16qi)(m128i)(X), (int)(LX), \ (v16qi)(__m128i)(Y), (int)(LY), \ (int)(M)))

define _mm_cmpestro(X, LX, Y, LY, M) \

((int) builtin_ia32_pcmpestrio128 ((v16qi)(m128i)(X), (int)(LX), \ (v16qi)(__m128i)(Y), (int)(LY), \ (int)(M)))

define _mm_cmpestrs(X, LX, Y, LY, M) \

((int) builtin_ia32_pcmpestris128 ((v16qi)(m128i)(X), (int)(LX), \ (v16qi)(__m128i)(Y), (int)(LY), \ (int)(M)))

define _mm_cmpestrz(X, LX, Y, LY, M) \

((int) builtin_ia32_pcmpestriz128 ((v16qi)(m128i)(X), (int)(LX), \ (v16qi)(__m128i)(Y), (int)(LY), \ (int)(M)))

endif

/ Packed integer 64-bit comparison, zeroing or filling with ones corresponding parts of result. / extern inline m128i attribute((gnu_inline, always_inline, artificial)) _mm_cmpgt_epi64 (m128i X, m128i Y) { return (m128i) ((v2di)X > (v2di)__Y); }

#ifdef __DISABLE_SSE4_2__
#undef __DISABLE_SSE4_2__
#pragma GCC pop_options
#endif /* __DISABLE_SSE4_2__ */

#ifdef __DISABLE_SSE4_1__
#undef __DISABLE_SSE4_1__
#pragma GCC pop_options
#endif /* __DISABLE_SSE4_1__ */

#include

#ifndef __SSE4_1__
#pragma GCC push_options
#pragma GCC target("sse4.1")
#define __DISABLE_SSE4_1__
#endif /* __SSE4_1__ */

#ifndef __SSE4_2__
#pragma GCC push_options
#pragma GCC target("sse4.2")
#define __DISABLE_SSE4_2__
#endif /* __SSE4_1__ */

/* Accumulate CRC32 (polynomial 0x11EDC6F41) value.  */
extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_crc32_u8 (unsigned int __C, unsigned char __V)
{
  return __builtin_ia32_crc32qi (__C, __V);
}

extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_crc32_u16 (unsigned int __C, unsigned short __V)
{
  return __builtin_ia32_crc32hi (__C, __V);
}

extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_crc32_u32 (unsigned int __C, unsigned int __V)
{
  return __builtin_ia32_crc32si (__C, __V);
}

#ifdef __x86_64__
extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_crc32_u64 (unsigned long long __C, unsigned long long __V)
{
  return __builtin_ia32_crc32di (__C, __V);
}
#endif

#ifdef __DISABLE_SSE4_2__
#undef __DISABLE_SSE4_2__
#pragma GCC pop_options
#endif /* __DISABLE_SSE4_2__ */

#ifdef __DISABLE_SSE4_1__
#undef __DISABLE_SSE4_1__
#pragma GCC pop_options
#endif /* __DISABLE_SSE4_1__ */

#endif /* _SMMINTRIN_H_INCLUDED */
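The `#pragma GCC push_options` / `#pragma GCC target("sse4.2")` pairs in this header are how GCC declares the SSE4.x intrinsics even when a file is not compiled with `-msse4.2`; only 64-bit variants such as `_mm_crc32_u64` are additionally hidden behind `#ifdef __x86_64__`. User code can apply the same mechanism per function with GCC's `target` attribute. The sketch below is illustrative only and not part of mapmap: it assumes GCC or Clang on x86, and the names `crc32c`, `crc32c_hw` and `crc32c_sw` are hypothetical. It computes CRC-32C with `_mm_crc32_u8` inside a function compiled for SSE4.2 and falls back to a bitwise loop when `__builtin_cpu_supports("sse4.2")` reports that the CPU lacks the instruction.

```c
#include <stddef.h>
#include <stdint.h>
#include <nmmintrin.h>  /* SSE4.2: _mm_crc32_u8 */

/* Portable bitwise CRC-32C (reflected polynomial 0x82F63B78). */
static uint32_t crc32c_sw(uint32_t crc, const unsigned char *p, size_t n)
{
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Compiled for SSE4.2 regardless of the command-line -march flags:
 * the function-level counterpart of the header's #pragma GCC target. */
__attribute__((target("sse4.2")))
static uint32_t crc32c_hw(uint32_t crc, const unsigned char *p, size_t n)
{
    while (n--)
        crc = _mm_crc32_u8(crc, *p++);
    return crc;
}

/* Hypothetical helper: CRC-32C of a buffer, choosing the path at run time. */
uint32_t crc32c(const void *data, size_t n)
{
    const unsigned char *p = (const unsigned char *) data;
    uint32_t crc = 0xFFFFFFFFu;
    if (__builtin_cpu_supports("sse4.2"))
        crc = crc32c_hw(crc, p, n);
    else
        crc = crc32c_sw(crc, p, n);
    return ~crc;
}
```

With either path, `crc32c("123456789", 9)` yields the standard CRC-32C check value 0xE3069283.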


I made some changes in ODM-0.7.0/SuperBuild/src/mvstexturing/elibs/mapmap/mapmap/source/vector_math.impl.h: I changed _mm_extract_epi64(aa, 0) to _mm_extract_epi32(aa, 0) in rows 1048 and 2870, from:

union { int64_t i; __m64 v; } a1, m1, a2, m2;

if(!((unsigned long) ptr & v_get_mask<float, 4>()))
{
    /* split and store both parts */
    const __m128i aa = v_reinterpret_iv<float, 4>(a);
    a1.i = _mm_extract_epi64(aa, 0);    /* <-- error here */
    a2.i = _mm_extract_epi64(aa, 1);    /* <-- error here */
    m1.i = _mm_extract_epi64(mask, 0);  /* <-- error here */
    m2.i = _mm_extract_epi64(mask, 1);  /* <-- error here */
    _mm_maskmove_si64(a1.v, m1.v, (char *) ptr);
    _mm_maskmove_si64(a2.v, m2.v, (char *) ptr + 8);

to:

union { int32_t i; __m64 v; } a1, m1, a2, m2, a3, m3, a4, m4;

if(!((unsigned long) ptr & v_get_mask<float, 4>()))
{
    /* split and store both parts */
    const __m128i aa = v_reinterpret_iv<float, 4>(a);
    a1.i = _mm_extract_epi32(aa, 0);
    a2.i = _mm_extract_epi32(aa, 1);
    a3.i = _mm_extract_epi32(aa, 2);
    a4.i = _mm_extract_epi32(aa, 3);
    m1.i = _mm_extract_epi32(mask, 0);
    m2.i = _mm_extract_epi32(mask, 1);
    m3.i = _mm_extract_epi32(mask, 2);
    m4.i = _mm_extract_epi32(mask, 3);
    _mm_maskmove_si64(a1.v, m1.v, (char *) ptr);
    _mm_maskmove_si64(a2.v, m2.v, (char *) ptr + 4);
    _mm_maskmove_si64(a3.v, m3.v, (char *) ptr + 8);
    _mm_maskmove_si64(a4.v, m4.v, (char *) ptr + 12);
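The `int32_t` version above is not equivalent to the original. `_mm_maskmove_si64` always examines eight mask bytes and may write eight bytes, but an `int32_t` member initializes only four bytes of each eight-byte union, so the upper half of every `m*.v` mask is indeterminate. Each store can therefore overwrite the next lane with indeterminate data, and the store at `ptr + 12` can touch bytes up to `ptr + 19`, past the end of the 16-byte vector. It also still uses `_mm_extract_epi32`, an SSE4.1 intrinsic that GCC happens to declare on 32-bit targets too. A minimal sketch that keeps the original two-store structure but obtains the 64-bit halves with SSE2 operations available on 32-bit x86 follows; it assumes the inputs are plain `__m128`/`__m128i` values (mapmap's `v_reinterpret_iv<float, 4>` is replaced by `_mm_castps_si128`), and `masked_store_ps` is a hypothetical name.

```c
#include <emmintrin.h>  /* SSE2: _mm_movepi64_pi64, _mm_unpackhi_epi64, _mm_castps_si128 */
#include <xmmintrin.h>  /* SSE:  _mm_maskmove_si64 */
#include <mmintrin.h>   /* MMX:  _mm_empty */

/* Store the bytes of `a` whose mask bytes have their top bit set,
 * as two 8-byte MMX masked stores, without _mm_extract_epi64. */
static void masked_store_ps(float *ptr, __m128 a, __m128i mask)
{
    const __m128i aa = _mm_castps_si128(a);

    /* Low 64 bits convert straight to an __m64. */
    __m64 a1 = _mm_movepi64_pi64(aa);
    __m64 m1 = _mm_movepi64_pi64(mask);

    /* Move the high 64 bits down first, then convert the same way. */
    __m64 a2 = _mm_movepi64_pi64(_mm_unpackhi_epi64(aa, aa));
    __m64 m2 = _mm_movepi64_pi64(_mm_unpackhi_epi64(mask, mask));

    _mm_maskmove_si64(a1, m1, (char *) ptr);
    _mm_maskmove_si64(a2, m2, (char *) ptr + 8);

    /* Clear MMX state before any x87 floating-point code runs. */
    _mm_empty();
}
```

If splitting into two 8-byte stores is not essential, SSE2's `_mm_maskmoveu_si128(aa, mask, (char *) ptr)` performs the whole 16-byte masked store in one instruction without any `__m64` values; whether that is as fast in mapmap's inner loops has not been measured here.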

I also made some changes in rows 3525 and 3553, changing _mm256_extract_epi64(tmp, 0) to _mm256_extract_epi32(tmp, 0), from:

switch(imm)
{
    case 0:
        b = _mm256_extract_epi64(tmp, 0);  /* <-- error here */
        return iv_reinterpret_v<double, 1>(b);
    case 1:
        b = _mm256_extract_epi64(tmp, 1);  /* <-- error here */
        return iv_reinterpret_v<double, 1>(b);
    case 2:
        b = _mm256_extract_epi64(tmp, 2);  /* <-- error here */
        return iv_reinterpret_v<double, 1>(b);
    case 3:
        b = _mm256_extract_epi64(tmp, 3);  /* <-- error here */
        return iv_reinterpret_v<double, 1>(b);
    default:
        return ((_iv_st<double, 4>) 0);
}

to:

switch(imm)
{
    case 0:
        b = _mm256_extract_epi32(tmp, 0);
        return iv_reinterpret_v<double, 1>(b);
    case 1:
        b = _mm256_extract_epi32(tmp, 1);
        return iv_reinterpret_v<double, 1>(b);
    case 2:
        b = _mm256_extract_epi32(tmp, 2);
        return iv_reinterpret_v<double, 1>(b);
    case 3:
        b = _mm256_extract_epi32(tmp, 3);
        return iv_reinterpret_v<double, 1>(b);
    default:
        return ((_iv_st<double, 4>) 0);
}
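The AVX change is not equivalent either: `_mm256_extract_epi32(tmp, i)` returns 32-bit lane `i` (bytes `4*i` to `4*i+3`), whereas `_mm256_extract_epi64(tmp, i)` returns 64-bit lane `i` (bytes `8*i` to `8*i+7`). For a vector of four doubles, 32-bit index 1 is the upper half of the first double, not the second double. The plain-C model below uses no intrinsics, so it runs anywhere; it assumes x86's little-endian lane order, and the names `reg256`, `model_extract_epi64`, `model_extract_epi32` and `bits_to_double` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* A 256-bit register modelled as 32 bytes; lane 0 starts at byte 0. */
typedef struct { unsigned char bytes[32]; } reg256;

/* Model of _mm256_extract_epi64(v, i): bytes 8*i .. 8*i+7. */
static int64_t model_extract_epi64(const reg256 *v, int i)
{
    int64_t out;
    memcpy(&out, v->bytes + 8 * i, sizeof out);
    return out;
}

/* Model of _mm256_extract_epi32(v, i): bytes 4*i .. 4*i+3. */
static int32_t model_extract_epi32(const reg256 *v, int i)
{
    int32_t out;
    memcpy(&out, v->bytes + 4 * i, sizeof out);
    return out;
}

/* Reinterpret 64 bits as the double they encode (what the
 * reinterpret step after the extract is meant to do). */
static double bits_to_double(int64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

With `{1.0, 2.0, 3.0, 4.0}` copied into `v.bytes`, `bits_to_double(model_extract_epi64(&v, 1))` gives 2.0, while the 32-bit index 1 picks up half of the first double and does not.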

I don't know if it is correct, but it gets through compiling on my system. I hope this is the system data you wanted. Regards, Kent

dthuerck commented 4 years ago

Thanks! As expected, that's where the problem is:

#ifdef __x86_64__
extern __inline long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_extract_epi64 (__m128i __X, const int __N)
{
  return __builtin_ia32_vec_ext_v2di ((__v2di)__X, __N);
}
#endif
#else
#define _mm_extract_epi8(X, N) \
  ((int) (unsigned char) __builtin_ia32_vec_ext_v16qi ((__v16qi)(__m128i)(X), (int)(N)))
#define _mm_extract_epi32(X, N) \
  ((int) __builtin_ia32_vec_ext_v4si ((__v4si)(__m128i)(X), (int)(N)))
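The excerpt shows why the compiler reports "not declared" rather than an instruction-set error: `_mm_extract_epi64` sits behind `#ifdef __x86_64__`, because its instruction (`pextrq`) needs a 64-bit general-purpose register that 32-bit mode does not have. A minimal sketch of guarding such a call in user code, falling back to an SSE2 store through memory on 32-bit targets; `extract_lo64` and `extract_hi64` are hypothetical names, not mapmap functions.

```c
#include <stdint.h>
#include <emmintrin.h>      /* SSE2 */
#if defined(__SSE4_1__)
#include <smmintrin.h>      /* SSE4.1 */
#endif

/* Low 64-bit lane of v. */
static int64_t extract_lo64(__m128i v)
{
#if defined(__x86_64__) && defined(__SSE4_1__)
    return _mm_extract_epi64(v, 0);         /* pextrq: x86_64 only */
#else
    int64_t lanes[2];
    _mm_storeu_si128((__m128i *) lanes, v); /* SSE2, also on i386 */
    return lanes[0];
#endif
}

/* High 64-bit lane of v. */
static int64_t extract_hi64(__m128i v)
{
#if defined(__x86_64__) && defined(__SSE4_1__)
    return _mm_extract_epi64(v, 1);
#else
    int64_t lanes[2];
    _mm_storeu_si128((__m128i *) lanes, v);
    return lanes[1];
#endif
}
```

The fallback costs a round trip through memory, which is why a 64-bit build with SSE4.1 would normally prefer the direct extract.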

Turns out your 32-bit OS prevents us from using the 64-bit intrinsics: GCC only declares _mm_extract_epi64 when __x86_64__ is defined. A possible fix is to change the compilation target manually: in mapmap's CMakeLists.txt, change -march=native to -march=core2 to exclude SSE4.1.
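`-march` decides which instruction-set macros the compiler predefines, so on this i7-2670QM switching from `-march=native` to `-march=core2` drops `__SSE4_1__` and `__AVX__` while keeping `__SSSE3__`. Whether mapmap selects its code paths on exactly these macros is an assumption here; the hypothetical helper below only reports what the current compile flags enable, which makes it easy to check the effect of each `-march` value.

```c
/* Report the highest x86 SIMD level enabled for this translation unit
 * by the -march / -m flags (compile-time macros, not a CPU query). */
static const char *simd_level(void)
{
#if defined(__AVX2__)
    return "avx2";
#elif defined(__AVX__)
    return "avx";
#elif defined(__SSE4_2__)
    return "sse4.2";
#elif defined(__SSE4_1__)
    return "sse4.1";
#elif defined(__SSSE3__)
    return "ssse3";
#elif defined(__SSE2__)
    return "sse2";
#else
    return "none";
#endif
}
```

Built with `-march=core2` this returns "ssse3"; built with `-march=native` on a Sandy Bridge CPU such as this one, it returns "avx".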

alke commented 4 years ago

Yes, it works too. Thanks!