alke closed this issue 4 years ago.
Hi,
`_mm_extract_epi64` is an SSE4.1 intrinsic. Your CPU supports AVX, so it also supports SSE4.1, and you should be good there.
However, I'm a little worried about your OS being 32-bit. Could you please paste the output of

```
cat /proc/cpuinfo
```

and attach your computer's smmintrin.h? It is usually found in /usr/lib/gcc/x86_64-linux-gnu/???/include, depending on your compiler version.
Thanks!
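The question above separates two things:

- what the CPU can execute at run time;
- what the compiler is targeting at compile time.

The sketch below reports both, assuming GCC or Clang on x86. The function names are illustrative and are not part of mapmap or GCC.

```cpp
#include <cstdio>

// Compile-time view: is this translation unit built for x86_64?
constexpr bool target_is_x86_64()
{
#if defined(__x86_64__)
    return true;
#else
    return false;
#endif
}

// Compile-time view: is SSE4.1 enabled for this translation unit
// (for example via -msse4.1, or -march=native on an SSE4.1-capable CPU)?
constexpr bool target_has_sse41()
{
#if defined(__SSE4_1__)
    return true;
#else
    return false;
#endif
}

// In the GCC header discussed in this thread, _mm_extract_epi64 is only
// declared inside #ifdef __x86_64__, so ordinary code can call it only
// when both compile-time conditions hold.
constexpr bool can_call_extract_epi64()
{
    return target_is_x86_64() && target_has_sse41();
}

// Run-time view: does the CPU running this program support SSE4.1?
// __builtin_cpu_supports is a GCC/Clang builtin that exists on x86 only.
bool cpu_has_sse41()
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    return __builtin_cpu_supports("sse4.1") != 0;
#else
    return false;
#endif
}

// Print both views side by side (bool promotes to int for %d).
void print_report()
{
    std::printf("x86_64 target:        %d\n", target_is_x86_64());
    std::printf("SSE4.1 enabled:       %d\n", target_has_sse41());
    std::printf("CPU supports SSE4.1:  %d\n", cpu_has_sse41());
    std::printf("_mm_extract_epi64 ok: %d\n", can_call_extract_epi64());
}
```

Suppose this is built natively with `-march=native` on the 32-bit Ubuntu install described in this thread. One would expect SSE4.1 to be both enabled and supported, while the x86_64 check, and therefore `can_call_extract_epi64()`, is false.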
Hello, here it comes.

Output of `cat /proc/cpuinfo`:

```
processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 42 model name : Intel(R) Core(TM) i7-2670QM CPU @ 2.20GHz stepping : 7 microcode : 0x2f cpu MHz : 885.156 cache size : 6144 KB physical id : 0 siblings : 8 core id : 0 cpu cores : 4 apicid : 0 initial apicid : 0 fdiv_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs bogomips : 4389.92 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management:
processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 42 model name : Intel(R) Core(TM) i7-2670QM CPU @ 2.20GHz stepping : 7 microcode : 0x2f cpu MHz : 1011.828 cache size : 6144 KB physical id : 0 siblings : 8 core id : 1 cpu cores : 4 apicid : 2 initial apicid : 2 fdiv_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs bogomips : 4389.92 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management:
processor : 2 vendor_id : GenuineIntel cpu family : 6 model : 42 model name : Intel(R) Core(TM) i7-2670QM CPU @ 2.20GHz stepping : 7 microcode : 0x2f cpu MHz : 825.945 cache size : 6144 KB physical id : 0 siblings : 8 core id : 2 cpu cores : 4 apicid : 4 initial apicid : 4 fdiv_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs bogomips : 4389.92 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management:
processor : 3 vendor_id : GenuineIntel cpu family : 6 model : 42 model name : Intel(R) Core(TM) i7-2670QM CPU @ 2.20GHz stepping : 7 microcode : 0x2f cpu MHz : 867.796 cache size : 6144 KB physical id : 0 siblings : 8 core id : 3 cpu cores : 4 apicid : 6 initial apicid : 6 fdiv_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs bogomips : 4389.92 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management:
processor : 4 vendor_id : GenuineIntel cpu family : 6 model : 42 model name : Intel(R) Core(TM) i7-2670QM CPU @ 2.20GHz stepping : 7 microcode : 0x2f cpu MHz : 896.757 cache size : 6144 KB physical id : 0 siblings : 8 core id : 0 cpu cores : 4 apicid : 1 initial apicid : 1 fdiv_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs bogomips : 4389.92 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management:
processor : 5 vendor_id : GenuineIntel cpu family : 6 model : 42 model name : Intel(R) Core(TM) i7-2670QM CPU @ 2.20GHz stepping : 7 microcode : 0x2f cpu MHz : 895.210 cache size : 6144 KB physical id : 0 siblings : 8 core id : 1 cpu cores : 4 apicid : 3 initial apicid : 3 fdiv_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs bogomips : 4389.92 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management:
processor : 6 vendor_id : GenuineIntel cpu family : 6 model : 42 model name : Intel(R) Core(TM) i7-2670QM CPU @ 2.20GHz stepping : 7 microcode : 0x2f cpu MHz : 991.117 cache size : 6144 KB physical id : 0 siblings : 8 core id : 2 cpu cores : 4 apicid : 5 initial apicid : 5 fdiv_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs bogomips : 4389.92 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management:
processor : 7 vendor_id : GenuineIntel cpu family : 6 model : 42 model name : Intel(R) Core(TM) i7-2670QM CPU @ 2.20GHz stepping : 7 microcode : 0x2f cpu MHz : 907.070 cache size : 6144 KB physical id : 0 siblings : 8 core id : 3 cpu cores : 4 apicid : 7 initial apicid : 7 fdiv_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs bogomips : 4389.92 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management:
```
And the file /usr/lib/gcc/i686-linux-gnu/5.4.0/include/smmintrin.h:

```c
/* Copyright (C) 2007-2015 Free Software Foundation, Inc.
This file is part of GCC.
GCC is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3, or (at your option) any later version.
GCC is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
Under Section 7 of GPL version 3, you are granted additional permissions described in the GCC Runtime Library Exception, version 3.1, as published by the Free Software Foundation.
You should have received a copy of the GNU General Public License and a copy of the GCC Runtime Library Exception along with this program; see the files COPYING3 and COPYING.RUNTIME respectively. If not, see http://www.gnu.org/licenses/. */
/* Implemented from the specification included in the Intel C++ Compiler User Guide and Reference, version 10.0. */
/* We need definitions from the SSSE3, SSE3, SSE2 and SSE header files. */
/* Rounding mode macros. */
(_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
(_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
(_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)
(_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)
(_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)
(_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC)
/* Test Instruction */
/* Packed integer 128-bit bitwise comparison. Return 1 if (V & M) == 0. */
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_testz_si128 (__m128i __M, __m128i __V) { return __builtin_ia32_ptestz128 ((__v2di)__M, (__v2di)__V); }
/* Packed integer 128-bit bitwise comparison. Return 1 if (V & ~M) == 0. */
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_testc_si128 (__m128i __M, __m128i __V) { return __builtin_ia32_ptestc128 ((__v2di)__M, (__v2di)__V); }
/* Packed integer 128-bit bitwise comparison. Return 1 if (V & M) != 0 && (V & ~M) != 0. */
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_testnzc_si128 (__m128i __M, __m128i __V) { return __builtin_ia32_ptestnzc128 ((__v2di)__M, (__v2di)__V); }
/* Macros for packed integer 128-bit comparison intrinsics. */
_mm_testc_si128 ((V), _mm_cmpeq_epi32 ((V), (V)))
/* Packed/scalar double precision floating point rounding. */
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_round_pd (__m128d __V, const int __M) { return (__m128d) __builtin_ia32_roundpd ((__v2df)__V, __M); }
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_round_sd(__m128d __D, __m128d __V, const int __M) { return (__m128d) __builtin_ia32_roundsd ((__v2df)__D, (__v2df)__V, __M); }
((__m128d) __builtin_ia32_roundpd ((__v2df)(__m128d)(V), (int)(M)))
((__m128d) __builtin_ia32_roundsd ((__v2df)(__m128d)(D), \
  (__v2df)(__m128d)(V), (int)(M)))
/* Packed/scalar single precision floating point rounding. */
extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_round_ps (__m128 __V, const int __M) { return (__m128) __builtin_ia32_roundps ((__v4sf)__V, __M); }
extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_round_ss (__m128 __D, __m128 __V, const int __M) { return (__m128) __builtin_ia32_roundss ((__v4sf)__D, (__v4sf)__V, __M); }
((__m128) __builtin_ia32_roundps ((__v4sf)(__m128)(V), (int)(M)))
((__m128) __builtin_ia32_roundss ((__v4sf)(__m128)(D), \
  (__v4sf)(__m128)(V), (int)(M)))
/* Macros for ceil/floor intrinsics. */
/* SSE4.1 */
/* Integer blend instructions - select data from 2 sources using constant/variable mask. */
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_blend_epi16 (__m128i __X, __m128i __Y, const int __M) { return (__m128i) __builtin_ia32_pblendw128 ((__v8hi)__X, (__v8hi)__Y, __M); }
((__m128i) __builtin_ia32_pblendw128 ((__v8hi)(__m128i)(X), \
  (__v8hi)(__m128i)(Y), (int)(M)))
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_blendv_epi8 (__m128i __X, __m128i __Y, __m128i __M) { return (__m128i) __builtin_ia32_pblendvb128 ((__v16qi)__X, (__v16qi)__Y, (__v16qi)__M); }
/* Single precision floating point blend instructions - select data from 2 sources using constant/variable mask. */
extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_blend_ps (__m128 __X, __m128 __Y, const int __M) { return (__m128) __builtin_ia32_blendps ((__v4sf)__X, (__v4sf)__Y, __M); }
((__m128) __builtin_ia32_blendps ((__v4sf)(__m128)(X), \
  (__v4sf)(__m128)(Y), (int)(M)))
extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_blendv_ps (__m128 __X, __m128 __Y, __m128 __M) { return (__m128) __builtin_ia32_blendvps ((__v4sf)__X, (__v4sf)__Y, (__v4sf)__M); }
/* Double precision floating point blend instructions - select data from 2 sources using constant/variable mask. */
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_blend_pd (__m128d __X, __m128d __Y, const int __M) { return (__m128d) __builtin_ia32_blendpd ((__v2df)__X, (__v2df)__Y, __M); }
((__m128d) __builtin_ia32_blendpd ((__v2df)(__m128d)(X), \
  (__v2df)(__m128d)(Y), (int)(M)))
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_blendv_pd (__m128d __X, __m128d __Y, __m128d __M) { return (__m128d) __builtin_ia32_blendvpd ((__v2df)__X, (__v2df)__Y, (__v2df)__M); }
/* Dot product instructions with mask-defined summing and zeroing parts of result. */
extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_dp_ps (__m128 __X, __m128 __Y, const int __M) { return (__m128) __builtin_ia32_dpps ((__v4sf)__X, (__v4sf)__Y, __M); }
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_dp_pd (__m128d __X, __m128d __Y, const int __M) { return (__m128d) __builtin_ia32_dppd ((__v2df)__X, (__v2df)__Y, __M); }
((__m128) __builtin_ia32_dpps ((__v4sf)(__m128)(X), \
  (__v4sf)(__m128)(Y), (int)(M)))
((__m128d) __builtin_ia32_dppd ((__v2df)(__m128d)(X), \
  (__v2df)(__m128d)(Y), (int)(M)))
/* Packed integer 64-bit comparison, zeroing or filling with ones corresponding parts of result. */
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cmpeq_epi64 (__m128i __X, __m128i __Y) { return (__m128i) ((__v2di)__X == (__v2di)__Y); }
/* Min/max packed integer instructions. */
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_min_epi8 (__m128i __X, __m128i __Y) { return (__m128i) __builtin_ia32_pminsb128 ((__v16qi)__X, (__v16qi)__Y); }
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_max_epi8 (__m128i __X, __m128i __Y) { return (__m128i) __builtin_ia32_pmaxsb128 ((__v16qi)__X, (__v16qi)__Y); }
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_min_epu16 (__m128i __X, __m128i __Y) { return (__m128i) __builtin_ia32_pminuw128 ((__v8hi)__X, (__v8hi)__Y); }
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_max_epu16 (__m128i __X, __m128i __Y) { return (__m128i) __builtin_ia32_pmaxuw128 ((__v8hi)__X, (__v8hi)__Y); }
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_min_epi32 (__m128i __X, __m128i __Y) { return (__m128i) __builtin_ia32_pminsd128 ((__v4si)__X, (__v4si)__Y); }
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_max_epi32 (__m128i __X, __m128i __Y) { return (__m128i) __builtin_ia32_pmaxsd128 ((__v4si)__X, (__v4si)__Y); }
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_min_epu32 (__m128i __X, __m128i __Y) { return (__m128i) __builtin_ia32_pminud128 ((__v4si)__X, (__v4si)__Y); }
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_max_epu32 (__m128i __X, __m128i __Y) { return (__m128i) __builtin_ia32_pmaxud128 ((__v4si)__X, (__v4si)__Y); }
/* Packed integer 32-bit multiplication with truncation of upper halves of results. */
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_mullo_epi32 (__m128i __X, __m128i __Y) { return (__m128i) ((__v4su)__X * (__v4su)__Y); }
/* Packed integer 32-bit multiplication of 2 pairs of operands with two 64-bit results. */
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_mul_epi32 (__m128i __X, __m128i __Y) { return (__m128i) __builtin_ia32_pmuldq128 ((__v4si)__X, (__v4si)__Y); }
/* Insert single precision float into packed single precision array element selected by index N. The bits [7-6] of N define S index, the bits [5-4] define D index, and bits [3-0] define zeroing mask for D. */
extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_insert_ps (__m128 __D, __m128 __S, const int __N) { return (__m128) __builtin_ia32_insertps128 ((__v4sf)__D, (__v4sf)__S, __N); }
((__m128) __builtin_ia32_insertps128 ((__v4sf)(__m128)(D), \
  (__v4sf)(__m128)(S), (int)(N)))
/* Helper macro to create the N value for _mm_insert_ps. */
/* Extract binary representation of single precision float from packed single precision array element of X selected by index N. */
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_extract_ps (__m128 __X, const int __N) { union { int i; float f; } __tmp; __tmp.f = __builtin_ia32_vec_ext_v4sf ((__v4sf)__X, __N); return __tmp.i; }
(__extension__ \
  ({ \
    union { int i; float f; } __tmp; \
    __tmp.f = __builtin_ia32_vec_ext_v4sf ((__v4sf)(__m128)(X), (int)(N)); \
    __tmp.i; \
  }))
/* Extract binary representation of single precision float into D from packed single precision array element of S selected by index N. */
{ (D) = __builtin_ia32_vec_ext_v4sf ((__v4sf)(S), (N)); }
/* Extract specified single precision float element into the lower part of __m128. */
_mm_insert_ps (_mm_setzero_ps (), (X), \
  _MM_MK_INSERTPS_NDX ((N), 0, 0x0e))
/* Insert integer, S, into packed integer array element of D selected by index N. */
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_insert_epi8 (__m128i __D, int __S, const int __N) { return (__m128i) __builtin_ia32_vec_set_v16qi ((__v16qi)__D, __S, __N); }
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_insert_epi32 (__m128i __D, int __S, const int __N) { return (__m128i) __builtin_ia32_vec_set_v4si ((__v4si)__D, __S, __N); }
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_insert_epi64 (__m128i __D, long long __S, const int __N) { return (__m128i) __builtin_ia32_vec_set_v2di ((__v2di)__D, __S, __N); }
((__m128i) __builtin_ia32_vec_set_v16qi ((__v16qi)(__m128i)(D), \
  (int)(S), (int)(N)))
((__m128i) __builtin_ia32_vec_set_v4si ((__v4si)(__m128i)(D), \
  (int)(S), (int)(N)))
((__m128i) __builtin_ia32_vec_set_v2di ((__v2di)(__m128i)(D), \
  (long long)(S), (int)(N)))
/* Extract integer from packed integer array element of X selected by index N. */
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_extract_epi8 (__m128i __X, const int __N) { return (unsigned char) __builtin_ia32_vec_ext_v16qi ((__v16qi)__X, __N); }
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_extract_epi32 (__m128i __X, const int __N) { return __builtin_ia32_vec_ext_v4si ((__v4si)__X, __N); }
extern __inline long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_extract_epi64 (__m128i __X, const int __N) { return __builtin_ia32_vec_ext_v2di ((__v2di)__X, __N); }
((int) (unsigned char) __builtin_ia32_vec_ext_v16qi ((__v16qi)(__m128i)(X), (int)(N)))
((int) __builtin_ia32_vec_ext_v4si ((__v4si)(__m128i)(X), (int)(N)))
((long long) __builtin_ia32_vec_ext_v2di ((__v2di)(__m128i)(X), (int)(N)))
/* Return horizontal packed word minimum and its index in bits [15:0] and bits [18:16] respectively. */
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_minpos_epu16 (__m128i __X) { return (__m128i) __builtin_ia32_phminposuw128 ((__v8hi)__X); }
/* Packed integer sign-extension. */
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cvtepi8_epi32 (__m128i __X) { return (__m128i) __builtin_ia32_pmovsxbd128 ((__v16qi)__X); }
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cvtepi16_epi32 (__m128i __X) { return (__m128i) __builtin_ia32_pmovsxwd128 ((__v8hi)__X); }
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cvtepi8_epi64 (__m128i __X) { return (__m128i) __builtin_ia32_pmovsxbq128 ((__v16qi)__X); }
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cvtepi32_epi64 (__m128i __X) { return (__m128i) __builtin_ia32_pmovsxdq128 ((__v4si)__X); }
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cvtepi16_epi64 (__m128i __X) { return (__m128i) __builtin_ia32_pmovsxwq128 ((__v8hi)__X); }
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cvtepi8_epi16 (__m128i __X) { return (__m128i) __builtin_ia32_pmovsxbw128 ((__v16qi)__X); }
/* Packed integer zero-extension. */
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cvtepu8_epi32 (__m128i __X) { return (__m128i) __builtin_ia32_pmovzxbd128 ((__v16qi)__X); }
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cvtepu16_epi32 (__m128i __X) { return (__m128i) __builtin_ia32_pmovzxwd128 ((__v8hi)__X); }
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cvtepu8_epi64 (__m128i __X) { return (__m128i) __builtin_ia32_pmovzxbq128 ((__v16qi)__X); }
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cvtepu32_epi64 (__m128i __X) { return (__m128i) __builtin_ia32_pmovzxdq128 ((__v4si)__X); }
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cvtepu16_epi64 (__m128i __X) { return (__m128i) __builtin_ia32_pmovzxwq128 ((__v8hi)__X); }
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cvtepu8_epi16 (__m128i __X) { return (__m128i) __builtin_ia32_pmovzxbw128 ((__v16qi)__X); }
/* Pack 8 double words from 2 operands into 8 words of result with unsigned saturation. */
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_packus_epi32 (__m128i __X, __m128i __Y) { return (__m128i) __builtin_ia32_packusdw128 ((__v4si)__X, (__v4si)__Y); }
/* Sum absolute 8-bit integer difference of adjacent groups of 4 byte integers in the first 2 operands. Starting offsets within operands are determined by the 3rd mask operand. */
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_mpsadbw_epu8 (__m128i __X, __m128i __Y, const int __M) { return (__m128i) __builtin_ia32_mpsadbw128 ((__v16qi)__X, (__v16qi)__Y, __M); }
((__m128i) __builtin_ia32_mpsadbw128 ((__v16qi)(__m128i)(X), \
  (__v16qi)(__m128i)(Y), (int)(M)))
/* Load double quadword using non-temporal aligned hint. */
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_stream_load_si128 (__m128i *__X) { return (__m128i) __builtin_ia32_movntdqa ((__v2di *) __X); }
/* These macros specify the source data format. */
/* These macros specify the comparison operation. */
/* These macros specify the polarity. */
/* These macros specify the output selection in _mm_cmpXstri (). */
/* These macros specify the output selection in _mm_cmpXstrm (). */
/* Intrinsics for text/string processing. */
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cmpistrm (__m128i __X, __m128i __Y, const int __M) { return (__m128i) __builtin_ia32_pcmpistrm128 ((__v16qi)__X, (__v16qi)__Y, __M); }
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cmpistri (__m128i __X, __m128i __Y, const int __M) { return __builtin_ia32_pcmpistri128 ((__v16qi)__X, (__v16qi)__Y, __M); }
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cmpestrm (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M) { return (__m128i) __builtin_ia32_pcmpestrm128 ((__v16qi)__X, __LX, (__v16qi)__Y, __LY, __M); }
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cmpestri (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M) { return __builtin_ia32_pcmpestri128 ((__v16qi)__X, __LX, (__v16qi)__Y, __LY, __M); }
((__m128i) __builtin_ia32_pcmpistrm128 ((__v16qi)(__m128i)(X), \
  (__v16qi)(__m128i)(Y), (int)(M)))
((int) __builtin_ia32_pcmpistri128 ((__v16qi)(__m128i)(X), \
  (__v16qi)(__m128i)(Y), (int)(M)))
((__m128i) __builtin_ia32_pcmpestrm128 ((__v16qi)(__m128i)(X), \
  (int)(LX), (__v16qi)(__m128i)(Y), \
  (int)(LY), (int)(M)))
((int) __builtin_ia32_pcmpestri128 ((__v16qi)(__m128i)(X), (int)(LX), \
  (__v16qi)(__m128i)(Y), (int)(LY), \
  (int)(M)))
/* Intrinsics for text/string processing and reading values of EFlags. */
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cmpistra (__m128i __X, __m128i __Y, const int __M) { return __builtin_ia32_pcmpistria128 ((__v16qi)__X, (__v16qi)__Y, __M); }
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cmpistrc (__m128i __X, __m128i __Y, const int __M) { return __builtin_ia32_pcmpistric128 ((__v16qi)__X, (__v16qi)__Y, __M); }
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cmpistro (__m128i __X, __m128i __Y, const int __M) { return __builtin_ia32_pcmpistrio128 ((__v16qi)__X, (__v16qi)__Y, __M); }
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cmpistrs (__m128i __X, __m128i __Y, const int __M) { return __builtin_ia32_pcmpistris128 ((__v16qi)__X, (__v16qi)__Y, __M); }
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cmpistrz (__m128i __X, __m128i __Y, const int __M) { return __builtin_ia32_pcmpistriz128 ((__v16qi)__X, (__v16qi)__Y, __M); }
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cmpestra (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M) { return __builtin_ia32_pcmpestria128 ((__v16qi)__X, __LX, (__v16qi)__Y, __LY, __M); }
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cmpestrc (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M) { return __builtin_ia32_pcmpestric128 ((__v16qi)__X, __LX, (__v16qi)__Y, __LY, __M); }
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cmpestro (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M) { return __builtin_ia32_pcmpestrio128 ((__v16qi)__X, __LX, (__v16qi)__Y, __LY, __M); }
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cmpestrs (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M) { return __builtin_ia32_pcmpestris128 ((__v16qi)__X, __LX, (__v16qi)__Y, __LY, __M); }
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cmpestrz (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M) { return __builtin_ia32_pcmpestriz128 ((__v16qi)__X, __LX, (__v16qi)__Y, __LY, __M); }
((int) __builtin_ia32_pcmpistria128 ((__v16qi)(__m128i)(X), \
  (__v16qi)(__m128i)(Y), (int)(M)))
((int) __builtin_ia32_pcmpistric128 ((__v16qi)(__m128i)(X), \
  (__v16qi)(__m128i)(Y), (int)(M)))
((int) __builtin_ia32_pcmpistrio128 ((__v16qi)(__m128i)(X), \
  (__v16qi)(__m128i)(Y), (int)(M)))
((int) __builtin_ia32_pcmpistris128 ((__v16qi)(__m128i)(X), \
  (__v16qi)(__m128i)(Y), (int)(M)))
((int) __builtin_ia32_pcmpistriz128 ((__v16qi)(__m128i)(X), \
  (__v16qi)(__m128i)(Y), (int)(M)))
((int) __builtin_ia32_pcmpestria128 ((__v16qi)(__m128i)(X), (int)(LX), \
  (__v16qi)(__m128i)(Y), (int)(LY), \
  (int)(M)))
((int) __builtin_ia32_pcmpestric128 ((__v16qi)(__m128i)(X), (int)(LX), \
  (__v16qi)(__m128i)(Y), (int)(LY), \
  (int)(M)))
((int) __builtin_ia32_pcmpestrio128 ((__v16qi)(__m128i)(X), (int)(LX), \
  (__v16qi)(__m128i)(Y), (int)(LY), \
  (int)(M)))
((int) __builtin_ia32_pcmpestris128 ((__v16qi)(__m128i)(X), (int)(LX), \
  (__v16qi)(__m128i)(Y), (int)(LY), \
  (int)(M)))
((int) __builtin_ia32_pcmpestriz128 ((__v16qi)(__m128i)(X), (int)(LX), \
  (__v16qi)(__m128i)(Y), (int)(LY), \
  (int)(M)))
/* Packed integer 64-bit comparison, zeroing or filling with ones corresponding parts of result. */
extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cmpgt_epi64 (__m128i __X, __m128i __Y) { return (__m128i) ((__v2di)__X > (__v2di)__Y); }
/* Accumulate CRC32 (polynomial 0x11EDC6F41) value. */
extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_crc32_u8 (unsigned int __C, unsigned char __V) { return __builtin_ia32_crc32qi (__C, __V); }
extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_crc32_u16 (unsigned int __C, unsigned short __V) { return __builtin_ia32_crc32hi (__C, __V); }
extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_crc32_u32 (unsigned int __C, unsigned int __V) { return __builtin_ia32_crc32si (__C, __V); }
extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_crc32_u64 (unsigned long long __C, unsigned long long __V) { return __builtin_ia32_crc32di (__C, __V); }
```
I made some changes in ODM-0.7.0/SuperBuild/src/mvstexturing/elibs/mapmap/mapmap/source/vector_math.impl.h. At lines 1048 and 2870, I changed `_mm_extract_epi64(aa, 0)` to `_mm_extract_epi32(aa, 0)`, from:

```cpp
union { int64_t i; __m64 v; } a1, m1, a2, m2;

if(!((unsigned long) ptr & v_get_mask<float, 4>()))
{
    /* split and store both parts */
    const __m128i aa = v_reinterpret_iv<float, 4>(a);
    a1.i = _mm_extract_epi64(aa, 0);   // <- I get the error from this line
    a2.i = _mm_extract_epi64(aa, 1);   // <- and from this one
    m1.i = _mm_extract_epi64(mask, 0); // <- and from this one
    m2.i = _mm_extract_epi64(mask, 1); // <- and from this one
    _mm_maskmove_si64(a1.v, m1.v, (char *) ptr);
    _mm_maskmove_si64(a2.v, m2.v, (char *) ptr + 8);
}
```

to:

```cpp
union { int32_t i; __m64 v; } a1, m1, a2, m2, a3, m3, a4, m4;

if(!((unsigned long) ptr & v_get_mask<float, 4>()))
{
    /* split and store both parts */
    const __m128i aa = v_reinterpret_iv<float, 4>(a);
    // Caution: each union is 8 bytes, but only its low 4 bytes are written
    // below, so the upper half of every a*.v and m*.v is uninitialized.
    // _mm_maskmove_si64 stores every byte whose mask byte has its top bit
    // set, so garbage in m1..m4 can write stray bytes (m4 reaches ptr + 19).
    a1.i = _mm_extract_epi32(aa, 0);
    a2.i = _mm_extract_epi32(aa, 1);
    a3.i = _mm_extract_epi32(aa, 2);
    a4.i = _mm_extract_epi32(aa, 3);
    m1.i = _mm_extract_epi32(mask, 0);
    m2.i = _mm_extract_epi32(mask, 1);
    m3.i = _mm_extract_epi32(mask, 2);
    m4.i = _mm_extract_epi32(mask, 3);
    _mm_maskmove_si64(a1.v, m1.v, (char *) ptr);
    _mm_maskmove_si64(a2.v, m2.v, (char *) ptr + 4);
    _mm_maskmove_si64(a3.v, m3.v, (char *) ptr + 8);
    _mm_maskmove_si64(a4.v, m4.v, (char *) ptr + 12);
}
```
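Here is a sketch of a 32-bit-safe way to do the same masked store without any 64-bit extract. It assumes the snippet above is meant to store the float lanes of `a` whose `mask` bytes have their top bit set. The two MMX `_mm_maskmove_si64` calls become the single SSE2 intrinsic `_mm_maskmoveu_si128`. `masked_store_ps` is a hypothetical name, not mapmap code, and the sketch assumes SSE2 is enabled, which holds for `-march=core2` and any x86_64 build.

```cpp
#include <emmintrin.h>  // SSE2: _mm_maskmoveu_si128, _mm_castps_si128

// Hypothetical helper (not part of mapmap). Byte i of `a` is written to
// byte i at `ptr` only if bit 7 of byte i of `mask` is set; the other bytes
// at `ptr` are left untouched. MASKMOVDQU needs only SSE2, so this also
// builds for an i686 target, and `ptr` does not have to be 16-byte aligned.
inline void masked_store_ps(float* ptr, __m128 a, __m128i mask)
{
    _mm_maskmoveu_si128(_mm_castps_si128(a), mask, reinterpret_cast<char*>(ptr));
    // Like MASKMOVQ, this is a non-temporal store. If another thread will
    // read the data, issue _mm_sfence() before publishing it.
}
```

This assumes `a` is an `__m128` and `mask` an `__m128i`, as the extract calls above suggest. Under that assumption, the body of the `if` could reduce to `masked_store_ps(ptr, a, mask);`. No `__m64` temporaries are left half-initialized.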
I also made changes at lines 3525 and 3553, replacing `_mm256_extract_epi64(tmp, k)` with `_mm256_extract_epi32(tmp, k)`, from:

```cpp
switch(imm)
{
    case 0:
        b = _mm256_extract_epi64(tmp, 0); // <- I get the error from this line
        return iv_reinterpret_v<double, 1>(b);
    case 1:
        b = _mm256_extract_epi64(tmp, 1); // <- and from this one
        return iv_reinterpret_v<double, 1>(b);
    case 2:
        b = _mm256_extract_epi64(tmp, 2); // <- and from this one
        return iv_reinterpret_v<double, 1>(b);
    case 3:
        b = _mm256_extract_epi64(tmp, 3); // <- and from this one
        return iv_reinterpret_v<double, 1>(b);
    default:
        return ((_iv_st<double, 4>) 0);
}
```

to:

```cpp
// Caution: _mm256_extract_epi32(tmp, k) returns 32-bit lane k, not 64-bit
// element k. Element k occupies lanes 2k (low) and 2k+1 (high), so these
// cases return the wrong bits (case 1, for example, yields the upper half
// of element 0).
switch(imm)
{
    case 0:
        b = _mm256_extract_epi32(tmp, 0);
        return iv_reinterpret_v<double, 1>(b);
    case 1:
        b = _mm256_extract_epi32(tmp, 1);
        return iv_reinterpret_v<double, 1>(b);
    case 2:
        b = _mm256_extract_epi32(tmp, 2);
        return iv_reinterpret_v<double, 1>(b);
    case 3:
        b = _mm256_extract_epi32(tmp, 3);
        return iv_reinterpret_v<double, 1>(b);
    default:
        return ((_iv_st<double, 4>) 0);
}
```
I don't know if it is correct, but it gets through compiling on my system. I hope this is the system data you wanted.
Regards, Kent
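About the `_mm256_extract_epi32` change: on little-endian x86, 64-bit element k of a vector occupies 32-bit lanes 2k (low half) and 2k+1 (high half). A drop-in replacement for `_mm256_extract_epi64` therefore has to read both lanes and join them.

The sketch below makes some assumptions:

- The AVX branch is compiled only with `-mavx`, or with `-march=native` on this CPU.
- `join_lanes`, `extract_u64` and `double_from_lanes` are hypothetical names.
- `double_from_lanes` is a scalar check of the same index arithmetic, so it runs without AVX.

```cpp
#include <cstdint>
#include <cstring>
#ifdef __AVX__
#include <immintrin.h>  // _mm256_extract_epi32
#endif

// Join two 32-bit halves into one 64-bit value (little-endian lane order).
inline uint64_t join_lanes(uint32_t lo, uint32_t hi)
{
    return (static_cast<uint64_t>(hi) << 32) | lo;
}

#ifdef __AVX__
// Hypothetical stand-in for _mm256_extract_epi64(v, K) on a 32-bit target:
// 64-bit element K is 32-bit lanes 2K (low) and 2K+1 (high).
template <int K>
inline uint64_t extract_u64(__m256i v)
{
    static_assert(K >= 0 && K < 4, "K must be 0..3");
    return join_lanes(static_cast<uint32_t>(_mm256_extract_epi32(v, 2 * K)),
                      static_cast<uint32_t>(_mm256_extract_epi32(v, 2 * K + 1)));
}
#endif

// Scalar check of the same lane arithmetic: view four doubles as eight
// 32-bit lanes, then rebuild double k from lanes 2k and 2k+1.
inline double double_from_lanes(const double (&src)[4], int k)
{
    uint32_t lanes[8];
    std::memcpy(lanes, src, sizeof lanes);
    const uint64_t bits = join_lanes(lanes[2 * k], lanes[2 * k + 1]);
    double out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}
```

In the switch, `case 1:` would then read `b = extract_u64<1>(tmp);`, converted to whatever type `b` has in mapmap; the excerpt does not show that type.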
Thanks! As expected, that's where the problem is:
```c
#ifdef __x86_64__
extern __inline long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_extract_epi64 (__m128i __X, const int __N)
{
  return __builtin_ia32_vec_ext_v2di ((__v2di)__X, __N);
}
#endif
#else
#define _mm_extract_epi8(X, N) \
  ((int) (unsigned char) __builtin_ia32_vec_ext_v16qi ((__v16qi)(__m128i)(X), (int)(N)))
#define _mm_extract_epi32(X, N) \
  ((int) __builtin_ia32_vec_ext_v4si ((__v4si)(__m128i)(X), (int)(N)))
```
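It turns out that your 32-bit OS prevents us from using 64-bit intrinsics. GCC only declares `_mm_extract_epi64` when targeting x86_64 (`#ifdef __x86_64__`), so it does not exist on your i686 toolchain, even though your CPU supports the instruction.
A possible fix is to change the compilation target manually: in mapmap's CMakeLists.txt, change `-march=native` to `-march=core2`, which excludes SSE4.1.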
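If changing `-march` is not an option, the missing intrinsic could instead be wrapped at source level. The sketch below assumes GCC-style predefined macros and at least SSE2. `extract_epi64_compat` is a hypothetical name, not a mapmap or GCC function. It picks one of three paths:

- PEXTRQ, where the header declares `_mm_extract_epi64`;
- two PEXTRD extracts, on a 32-bit SSE4.1 build;
- a plain SSE2 spill to memory otherwise, which is the `-march=core2` case.

```cpp
#include <cstdint>
#include <emmintrin.h>   // SSE2
#ifdef __SSE4_1__
#include <smmintrin.h>   // SSE4.1: _mm_extract_epi32 / _mm_extract_epi64
#endif

// Hypothetical portable replacement for _mm_extract_epi64(v, N).
template <int N>
inline int64_t extract_epi64_compat(__m128i v)
{
    static_assert(N == 0 || N == 1, "N must be 0 or 1");
#if defined(__SSE4_1__) && defined(__x86_64__)
    // 64-bit target: the intrinsic is declared, use it directly.
    return _mm_extract_epi64(v, N);
#elif defined(__SSE4_1__)
    // 32-bit target with SSE4.1: join 32-bit lanes 2N (low) and 2N+1 (high).
    const uint64_t lo = static_cast<uint32_t>(_mm_extract_epi32(v, 2 * N));
    const uint64_t hi = static_cast<uint32_t>(_mm_extract_epi32(v, 2 * N + 1));
    return static_cast<int64_t>((hi << 32) | lo);
#else
    // SSE2 only (e.g. -march=core2): store the register and read lane N.
    int64_t lanes[2];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), v);
    return lanes[N];
#endif
}
```

With `-march=core2` the `#else` branch is taken. The other two branches show what `-march=native` would select on 32-bit and 64-bit builds for this CPU.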
Yes, that works too. Thanks!
Maybe this is more of a question. I get this error when compiling OpenDroneMap 0.7.0:

```
SuperBuild/src/mvstexturing/elibs/mapmap/mapmap/source/vector_math.impl.h:1057:33: error: ‘_mm_extract_epi64’ was not declared in this scope
a1.i = _mm_extract_epi64(aa, 0);
```

System: Ubuntu 16.04 LTS 32-bit; memory 7.8 GiB; processor Intel(R) Core(TM) i7-2670QM CPU @ 2.20GHz; graphics Intel® Sandybridge Mobile x86/MMX/SSE2.

Maybe mapmap does not support this configuration?