compenguy / ngrammatic

A rust crate providing fuzzy search/string matching using N-grams
MIT License

Rayon support #11

Open LucaCappelletti94 opened 3 months ago

LucaCappelletti94 commented 3 months ago

As per the title: add Rayon support.

I will handle this in a future pull request.

compenguy commented 3 months ago

Let me see if I understand where you're going with this...

pub fn search_with_warp(&self, text: &str, warp: f32, threshold: f32) -> Vec<SearchResult> {
    ...
    for gram in item.grams.keys() {
        if let Some(words) = self.gram_to_words.get(gram) {
            // Fetch ngrams from raw words
            ngrams_to_consider.extend(words.iter().filter_map(|word| self.ngrams.get(word)));
        }
    }
    ...
}

Becomes (wildly untested code/logic follows):

pub fn search_with_warp(&self, text: &str, warp: f32, threshold: f32) -> Vec<SearchResult> {
    ...
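    // Needs `use rayon::prelude::*;`. `keys()` is a serial iterator, so it is
    // bridged with `par_bridge()`; `flat_map_iter` flattens the serial inner iterator.
    ngrams_to_consider.par_extend(
        item.grams
            .keys()
            .par_bridge()
            .filter_map(|gram| self.gram_to_words.get(gram))
            .flat_map_iter(|words| words.iter().filter_map(move |word| self.ngrams.get(word))),
    );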
    ...
}

?

Just thinking this through... we could also parallel sort the search results.
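
A minimal sketch of what the parallel sort could look like, assuming results are ordered by descending similarity. The `SearchResult` struct here is a stand-in with assumed fields, `sort_results` is a hypothetical helper, and the `rayon` feature name is also an assumption. Without that feature the code falls back to the standard serial sort, so it compiles either way.

```rust
// Only pulled in when the (assumed) `rayon` feature is enabled.
#[cfg(feature = "rayon")]
use rayon::prelude::*;

/// Stand-in for the crate's search result type (field names are assumed).
#[derive(Debug, Clone, PartialEq)]
pub struct SearchResult {
    pub text: String,
    pub similarity: f32,
}

/// Hypothetical helper: sort results from most to least similar.
pub fn sort_results(results: &mut [SearchResult]) {
    // `total_cmp` gives a total order over f32, so NaN cannot cause a panic.
    let by_similarity_desc =
        |a: &SearchResult, b: &SearchResult| b.similarity.total_cmp(&a.similarity);

    // Parallel stable sort from rayon's `ParallelSliceMut`.
    #[cfg(feature = "rayon")]
    results.par_sort_by(by_similarity_desc);

    // Serial stable sort from the standard library.
    #[cfg(not(feature = "rayon"))]
    results.sort_by(by_similarity_desc);
}
```

Whether the parallel sort pays off probably depends on how many results a search returns; for short result lists the serial sort may well be faster.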

And on the corpusbuilder...

    /// Provide a parallel iterator that will yield strings to be added to
    /// the `Corpus`. Needs `use rayon::prelude::*;` in scope.
    pub fn par_fill<It>(mut self, iterable: It) -> Self
    where
        It: IntoParallelIterator,
        It::Item: Into<String>,
    {
        self.texts.par_extend(iterable.into_par_iter().map(<_>::into));
        self
    }

That one seems pretty trivial.

LucaCappelletti94 commented 3 months ago

Yeah, that would be the overall gist of it. The only important thing is to keep this as an optional feature, since rayon does not support several compilation targets (e.g. wasm).

compenguy commented 3 months ago

My thinking was that it would be added as a non-default optional feature (with benchmark data, I'd be willing to consider making it default-on).

Whether enabling the feature adds new par_search() and par_fill() methods, or replaces the implementations of search() and fill() with parallel ones, is an unsettled question.
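
To make the first option concrete, here is a rough sketch of the additive variant on a stand-in builder. The `CorpusBuilder` below is a simplified placeholder with an assumed `texts` field, and the feature name `rayon` is an assumption. With the feature off, only the serial `fill()` exists and nothing from rayon is compiled.

```rust
// Assumes Cargo.toml declares rayon as an optional dependency, e.g.
//   rayon = { version = "1", optional = true }
// which gives a `rayon` feature that is off by default.
#[cfg(feature = "rayon")]
use rayon::prelude::*;

/// Simplified placeholder for the real builder (the field is assumed).
#[derive(Default)]
pub struct CorpusBuilder {
    pub texts: Vec<String>,
}

impl CorpusBuilder {
    /// Serial version, always available.
    pub fn fill<It>(mut self, iterable: It) -> Self
    where
        It: IntoIterator,
        It::Item: Into<String>,
    {
        self.texts
            .extend(iterable.into_iter().map(Into::<String>::into));
        self
    }

    /// Parallel sibling, only compiled when the feature is enabled.
    #[cfg(feature = "rayon")]
    pub fn par_fill<It>(mut self, iterable: It) -> Self
    where
        It: IntoParallelIterator,
        It::Item: Into<String>,
    {
        self.texts
            .par_extend(iterable.into_par_iter().map(Into::<String>::into));
        self
    }
}
```

One consideration for the replace-in-place option: a parallel fill() would likely need an `IntoParallelIterator` bound instead of `IntoIterator`, so turning the feature on could break callers that pass plain iterators. search() takes no iterator argument, so its implementation could be swapped without changing the signature.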