Daniel-Liu-c0deb0t / triple_accel

Rust edit distance routines accelerated using SIMD. Supports fast Hamming, Levenshtein, restricted Damerau-Levenshtein, etc. distance calculations and string search.
MIT License

Custom comparison function #7

Closed: mbhall88 closed this issue 2 years ago

mbhall88 commented 2 years ago

Forgive me if this is a silly question, but do you think it would be possible to provide an alternate Hamming distance function that takes a custom comparison function?

For instance, in bioinformatics, I might want the Hamming distance between two sequences while ignoring Ns.

For example:

fn hamming(a: &[u8], b: &[u8], f: &dyn Fn(u8, u8) -> u64) -> u64 {
    a.iter().zip(b).fold(0, |acc, (x, y)| acc + f(*x, *y))
}

fn main() {
    let s1 = b"ACGT";
    let s2 = b"ANCT";
    fn dist(a: u8, b: u8) -> u64 { (a != b'N' && b != b'N' && a != b) as u64 }

    assert_eq!(dist(s1[2], s2[2]), 1);
    assert_eq!(dist(s1[0], s2[0]), 0);
    assert_eq!(dist(s1[1], s2[1]), 0);

    assert_eq!(hamming(s1, s2, &dist), 1);
}

Daniel-Liu-c0deb0t commented 2 years ago

The Hamming distance function is vectorized, so any custom comparison function would need its own SIMD implementation. I guess the scalar version could be made to support generic comparison functions, but that wouldn't make much sense given that this library is mainly focused on vectorized distance algorithms.
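
For the specific "ignore Ns" case, one possible workaround outside the library is a plain scalar loop written without branches. The sketch below is only an illustration: `hamming_ignoring` and its `wildcard` parameter are hypothetical names, not part of triple_accel, and it uses only the standard library. Whether the compiler actually auto-vectorizes the loop depends on the target and optimization level, so it should not be assumed to match a hand-written SIMD kernel.

```rust
/// Hypothetical helper (not part of triple_accel): Hamming distance that
/// skips every position where either sequence holds the `wildcard` byte.
fn hamming_ignoring(a: &[u8], b: &[u8], wildcard: u8) -> u64 {
    // Hamming distance is only defined for sequences of equal length.
    assert_eq!(a.len(), b.len(), "sequences must have equal length");

    a.iter()
        .zip(b)
        .map(|(&x, &y)| {
            // The non-short-circuiting `&` keeps the loop body branch-free,
            // which gives the compiler a chance to vectorize it.
            ((x != y) & (x != wildcard) & (y != wildcard)) as u64
        })
        .sum()
}
```

With the sequences from the question, `hamming_ignoring(b"ACGT", b"ANCT", b'N')` evaluates to 1: the `C`/`N` position is skipped and only the `G`/`C` mismatch is counted. Fixing the comparison at compile time, rather than passing a `&dyn Fn`, is a deliberate choice here, since a dynamically dispatched call per byte would usually block vectorization.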

mbhall88 commented 2 years ago

Ahh I see. No worries, thought I'd ask just in case. Thanks.