ashvardanian / SimSIMD

Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for both AVX2, AVX-512, NEON, SVE, & SVE2 📐
https://ashvardanian.com/posts/simsimd-faster-scipy/
Apache License 2.0

Implement Mahalanobis distance #149

Closed guyrosin closed 2 months ago

guyrosin commented 4 months ago

Hey, is there any chance of having Mahalanobis distance in SimSIMD? https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.mahalanobis.html It's useful for measuring the distance between a point and a distribution.

Thanks!
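
For reference, the metric requested above follows the definition in the linked SciPy page: the square root of `(u - v)^T * VI * (u - v)`, where `VI` is the inverse of the covariance matrix. Below is a minimal pure-Python sketch of that formula. The function name `mahalanobis` mirrors SciPy's, but the argument name `inv_cov` and the toy inputs are illustrative assumptions, and this is not how SimSIMD implements or would implement it.

```python
import math

def mahalanobis(u, v, inv_cov):
    """Reference formula: sqrt((u - v)^T * inv_cov * (u - v))."""
    delta = [a - b for a, b in zip(u, v)]
    # Matrix-vector product inv_cov @ delta, computed row by row.
    projected = [sum(w * d for w, d in zip(row, delta)) for row in inv_cov]
    return math.sqrt(sum(d * p for d, p in zip(delta, projected)))

# With an identity inverse covariance, the result is the Euclidean distance.
identity = [[1.0, 0.0], [0.0, 1.0]]
euclidean_case = mahalanobis([3.0, 4.0], [0.0, 0.0], identity)

# A variance of 4 along the first axis (inverse 0.25) shrinks distances along it.
scaled = [[0.25, 0.0], [0.0, 1.0]]
scaled_case = mahalanobis([2.0, 0.0], [0.0, 0.0], scaled)

print(euclidean_case, scaled_case)
```

The second case shows why the metric suits point-to-distribution comparisons: a displacement of 2 along an axis whose standard deviation is 2 counts as a distance of 1.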

ashvardanian commented 2 months ago

On it! Very curious to learn more about your use case, @guyrosin 🤗

guyrosin commented 2 months ago

Yay, thank you @ashvardanian! This is a new project I've recently started. I'm trying to use this metric to measure similarity between single data points and a target distribution, and Mahalanobis distance seems to fit this use case much better than other, more popular metrics. Scale will hopefully increase over time, and that's why I opened this issue :) If you're interested, I'll gladly share more info as the project progresses!

ashvardanian commented 2 months ago

Hey, @guyrosin! The most important initial questions are:

  1. What kind of scalars are you generally using?
  2. How relevant is the accuracy? (not unrelated to 1. 😄)
  3. What kind of hardware are you running on?

For the last one, just share the output of:

pip install simsimd
python -c "import simsimd; print(simsimd.get_capabilities())"

guyrosin commented 2 months ago

  1. I'm currently using float32 (with plans to use bf16 in the future). Vector length is usually 300-500.
  2. I don't have any hard requirements for accuracy :)
  3. Hardware specs:
❯ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  8
  On-line CPU(s) list:   0-7
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
    CPU family:          6
    Model:               85
    Thread(s) per core:  2
    Core(s) per socket:  4
    Socket(s):           1
    Stepping:            7
    BogoMIPS:            4999.99
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni
Virtualization features: 
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):     
  L1d:                   128 KiB (4 instances)
  L1i:                   128 KiB (4 instances)
  L2:                    4 MiB (4 instances)
  L3:                    35.8 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-7
Vulnerabilities:         
  Gather data sampling:  Unknown: Dependent on hypervisor status
  Itlb multihit:         KVM: Mitigation: VMX unsupported
  L1tf:                  Mitigation; PTE Inversion
  Mds:                   Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
  Meltdown:              Mitigation; PTI
  Mmio stale data:       Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
  Retbleed:              Vulnerable
  Spec rstack overflow:  Not affected
  Spec store bypass:     Vulnerable
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Retpoline
  Srbds:                 Not affected
  Tsx async abort:       Not affected

{'serial': True, 'neon': False, 'sve': False, 'neon_f16': False, 'sve_f16': False, 'neon_bf16': False, 'sve_bf16': False, 'neon_i8': False, 'sve_i8': False, 'haswell': True, 'skylake': False, 'ice': False, 'genoa': False, 'sapphire': False}
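
To read the dictionary above at a glance, one option is to filter it down to the entries that are switched on. This is a small sketch that copies the reported output into a literal; the variable names `capabilities` and `enabled` are illustrative, and in practice the dictionary would come from `simsimd.get_capabilities()` as in the command quoted earlier.

```python
# Capabilities as reported above, copied verbatim into a literal.
capabilities = {
    'serial': True, 'neon': False, 'sve': False, 'neon_f16': False,
    'sve_f16': False, 'neon_bf16': False, 'sve_bf16': False,
    'neon_i8': False, 'sve_i8': False, 'haswell': True, 'skylake': False,
    'ice': False, 'genoa': False, 'sapphire': False,
}

# Keep only the entries reported as True, in their original order.
enabled = [name for name, is_on in capabilities.items() if is_on]
print(enabled)  # ['serial', 'haswell']
```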

Thanks for your help!