VectorCamp / vectorscan

A portable fork of the high-performance regular expression matching library
https://www.vectorcamp.gr/project/vectorscan/

first cut at fixing a crc32 issue reported by a user #308

Open isildur-g opened 3 months ago

isildur-g commented 3 months ago

This might fix a problem a user reported, where the build uses the system library's _mm_crc32_u64 instead of the SIMDe version. Making a PR to test it in CI.
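
For readers unfamiliar with the intrinsic: _mm_crc32_u64 maps to the SSE4.2 CRC32 instruction, which accumulates a CRC-32C (Castagnoli) checksum. On a CPU without SSE4.2 the same result has to be computed in software. The sketch below is a minimal bitwise version of that idea; it is not the vectorscan or SIMDe implementation, and the function names (crc32c_update_u8, crc32c_update_u64, crc32c) are hypothetical names chosen for illustration. It assumes the 64-bit operand is consumed least-significant byte first.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected CRC-32C (Castagnoli) polynomial. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Fold one byte into the running CRC, one bit at a time.
 * No initial or final XOR is applied here; the caller owns that. */
static uint32_t crc32c_update_u8(uint32_t crc, uint8_t byte) {
    crc ^= byte;
    for (int bit = 0; bit < 8; bit++) {
        /* If the low bit is set, shift and XOR in the polynomial. */
        crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & (0u - (crc & 1u)));
    }
    return crc;
}

/* Hypothetical software stand-in for a 64-bit CRC step:
 * folds the eight bytes of v, least-significant byte first. */
static uint32_t crc32c_update_u64(uint32_t crc, uint64_t v) {
    for (int i = 0; i < 8; i++) {
        crc = crc32c_update_u8(crc, (uint8_t)(v >> (8 * i)));
    }
    return crc;
}

/* Conventional CRC-32C of a buffer: init 0xFFFFFFFF, final XOR 0xFFFFFFFF. */
static uint32_t crc32c(const void *buf, size_t len) {
    const uint8_t *p = (const uint8_t *)buf;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc = crc32c_update_u8(crc, *p++);
    }
    return crc ^ 0xFFFFFFFFu;
}
```

A table-driven variant would be much faster; the bitwise form is used here only because it is short and easy to check against the standard CRC-32C test vector for the string "123456789".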

Unit193 commented 3 months ago

When testing with this PR, I seem to get

Program terminated with signal SIGILL, Illegal instruction.
#0  0x00007f6fce2a9f78 in hs_alloc_scratch (db=<optimized out>, scratch=0x7f6fce3e6bc8) at ./src/scratch.c:367
367     ./src/scratch.c: No such file or directory.
(gdb) bt full
#0  0x00007f6fce2a9f78 in hs_alloc_scratch (db=<optimized out>, scratch=0x7f6fce3e6bc8) at ./src/scratch.c:367
        rv = <optimized out>
        rose = 0x560141f522c0
        resize = 1
        proto = 0x560141ecdbc0
        proto_tmp = 0x560141ecdb90
        proto_ret = 0
        som_store_count = 0
        queueCount = 7
        bStateSize = <optimized out>
        fullStateSize = 108

isildur-g commented 3 months ago

Hello, I can't reproduce this problem, either on a VM with extensions enabled only up to SSE2 or on fairly old real hardware (an Intel Core 5, the oldest x86-64 machine I have running here). Are you sure your machine actually supports SSE2? What CPU model is it? Could you also paste the exact CMake options, environment variables, etc. that you used to build it?

markos commented 3 months ago

@Unit193 ^

Unit193 commented 3 months ago

Well, it claims to, but it's pretty dang old too.

Handle 0x0400, DMI type 4, 40 bytes
Processor Information
        Socket Designation: CPU1
        Type: Central Processor
        Family: Xeon
        Manufacturer: Intel
        ID: 64 0F 00 00 FF FB EB BF
        Signature: Type 0, Family 15, Model 6, Stepping 4
        Flags:
                FPU (Floating-point unit on-chip)
                VME (Virtual mode extension)
                DE (Debugging extension)
                PSE (Page size extension)
                TSC (Time stamp counter)
                MSR (Model specific registers)
                PAE (Physical address extension)
                MCE (Machine check exception)
                CX8 (CMPXCHG8 instruction supported)
                APIC (On-chip APIC hardware supported)
                SEP (Fast system call)
                MTRR (Memory type range registers)
                PGE (Page global enable)
                MCA (Machine check architecture)
                CMOV (Conditional move instruction supported)
                PAT (Page attribute table)
                PSE-36 (36-bit page size extension)
                CLFSH (CLFLUSH instruction supported)
                DS (Debug store)
                ACPI (ACPI supported)
                MMX (MMX technology supported)
                FXSR (FXSAVE and FXSTOR instructions supported)
                SSE (Streaming SIMD extensions)
                SSE2 (Streaming SIMD extensions 2)
                SS (Self-snoop)
                HTT (Multi-threading)
                TM (Thermal monitor supported)
                PBE (Pending break enabled)
        Version:                   Intel(R) Xeon(TM) CPU 3.00GHz
        Voltage: 1.4 V
        External Clock: 667 MHz
        Max Speed: 3600 MHz
        Current Speed: 2333 MHz
        Status: Populated, Enabled
        Upgrade: Socket LGA771
        L1 Cache Handle: 0x0700
        L2 Cache Handle: 0x0701
        L3 Cache Handle: 0x0702
        L2 Cache Handle: 0x0701
        L3 Cache Handle: 0x0702
        Serial Number: Not Specified
        Asset Tag: Not Specified
        Part Number: Not Specified
        Core Count: 2
        Core Enabled: 2
        Thread Count: 4
        Characteristics:
                64-bit capable

Generally speaking, something like -DBUILD_AVX2=off -DBUILD_AVX512=off -DBUILD_AVX512VBMI=off -DFAT_RUNTIME=off -DBUILD_SSE2_SIMDE=on is used.
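
A quick way to see what the running CPU actually reports, independent of the DMI table, is to query it at runtime. The sketch below assumes GCC or Clang on x86 and uses the compiler builtins __builtin_cpu_init and __builtin_cpu_supports; the helper names (cpu_has_sse2, cpu_has_sse42, report_cpu_features) are hypothetical and not part of vectorscan. On other compilers or architectures the helpers simply return 0.

```c
#include <stdio.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_CPU_BUILTINS 1
#else
#define HAVE_X86_CPU_BUILTINS 0
#endif

/* Returns 1 if the running CPU reports SSE2, 0 otherwise. */
static int cpu_has_sse2(void) {
#if HAVE_X86_CPU_BUILTINS
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") ? 1 : 0;
#else
    return 0; /* assumption: treat unknown platforms as "not reported" */
#endif
}

/* Returns 1 if the running CPU reports SSE4.2 (which includes the
 * CRC32 instruction behind _mm_crc32_u64), 0 otherwise. */
static int cpu_has_sse42(void) {
#if HAVE_X86_CPU_BUILTINS
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.2") ? 1 : 0;
#else
    return 0;
#endif
}

/* Print a short feature summary to the given stream. */
static void report_cpu_features(FILE *out) {
    fprintf(out, "sse2:   %s\n", cpu_has_sse2() ? "yes" : "no");
    fprintf(out, "sse4.2: %s\n", cpu_has_sse42() ? "yes" : "no");
}
```

If such a check printed "sse4.2: no" on the affected machine, any code path that still emits the hardware CRC32 instruction would be expected to fault with SIGILL, which would be consistent with the backtrace reported earlier in this thread.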