arve0 / feathers-mongodb-fuzzy-search

Add fuzzy $search to mongodb service.find queries
https://www.npmjs.com/package/feathers-mongodb-fuzzy-search

[Feature Request] Multi-column fuzzy search. À la feathers-nedb-fuzzy-search #9

Open FossPrime opened 6 years ago

FossPrime commented 6 years ago

I really like the implementation of full-text search on NeDB from a user perspective.

MongoDB $text seems to have been designed for internal use, such as finding labels, tags and other pre-known values. It does not do anything "fuzzy."

In the NeDB implementation I can have one omni-search input for names, email addresses and phone numbers all at once, with live refresh. In MongoDB with $text, if I search "Kardashian", it won't find "The Kardashians," or anyone with an email containing it.

I would propose scrapping support for $text and just doing a true RegEx fuzzy search by default. Right now there is no way to do a multi column truly fuzzy search, unlike in the NeDB module.

In reference to #5 and https://github.com/arve0/feathers-mongodb-fuzzy-search/issues/5#issuecomment-349676430
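
For illustration, the RegEx-based multi-column search proposed above could be sketched as follows. This is a minimal sketch, not the plugin's implementation: `escapeRegExp`, `buildRegexSearch` and the field list are hypothetical names, and the output is only a plain MongoDB-style query object. An unanchored, case-insensitive `$regex` generally cannot make efficient use of an index, so this approach trades performance for substring matching.

```javascript
// Escape RegExp metacharacters so user input is matched literally.
function escapeRegExp (text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Build a MongoDB-style query that matches `term` as a case-insensitive
// substring of any of the given fields.
// `fields` is a hypothetical whitelist chosen by the API server.
function buildRegexSearch (term, fields) {
  const pattern = escapeRegExp(term)
  return {
    $or: fields.map(field => ({ [field]: { $regex: pattern, $options: 'i' } }))
  }
}

// One omni-search term applied to three columns at once.
const query = buildRegexSearch('Kardashian', ['name', 'email', 'phone'])
```

Because the pattern is a plain substring match, "Kardashian" would match both "The Kardashians" and an email address containing the term.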

amaury1093 commented 6 years ago

I don't know how nedb fuzzy-search works, but would this be multi-column fuzzy search?

{
  query: {
    $or: [
      { name: { $search: 'kardashian' } },
      { email: { $search: 'kardashian' } },
    ]
  }
}

and setting the hook as search({ fields: ['name', 'email'] })

FossPrime commented 6 years ago

That works... the only issue is that my clients have no idea what storage backend the API server is using. I suppose I could create some sort of adapter hook that translates a NeDB fuzzy query into a Mongo fuzzy one.

From client:

{
  query: {
    $search: 'kardashian'
  }
}

to this, in the API server:

{
  query: {
    $or: [
      { name: { $search: 'kardashian' } },
      { email: { $search: 'kardashian' } },
    ]
  }
}
// register after manipulation: search({ fields: ['name', 'email'] })
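
A minimal sketch of such an adapter hook, assuming the standard Feathers hook shape (a function that receives and returns `context`). The name `expandSearch` and its `fields` option are hypothetical and not part of either plugin; it only rewrites a top-level `$search` string and would overwrite any `$or` already present in the query.

```javascript
// Hypothetical adapter hook: rewrites a top-level NeDB-style `$search`
// into a per-field `$or`, so clients need not know the storage backend.
function expandSearch ({ fields }) {
  return context => {
    const query = (context.params && context.params.query) || {}
    // Leave queries without a top-level string `$search` untouched.
    if (typeof query.$search !== 'string') return context

    const { $search, ...rest } = query
    context.params.query = {
      ...rest,
      // Note: replaces an existing `$or`, which this sketch does not merge.
      $or: fields.map(field => ({ [field]: { $search } }))
    }
    return context
  }
}

// Stand-in for the context Feathers would pass to a `find` hook.
const context = { params: { query: { $search: 'kardashian', $limit: 10 } } }
expandSearch({ fields: ['name', 'email'] })(context)
```

In an app this would presumably be registered ahead of the fuzzy-search hook, for example `find: [ expandSearch({ fields: ['name', 'email'] }), search({ fields: ['name', 'email'] }) ]`.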

arve0 commented 6 years ago

Hi @rayfoss!

> In MongoDB with $text If I search "Kardashian", it won't find "The Kardashians," or anyone with an email containing it.

I agree that the $text search could have been more fuzzy. It is supposed to do stemming, so searching for "Kardashian" should find "The Kardashians".

> Right now there is no way to do a multi column truly fuzzy search, unlike in the NeDB module.

Note that you can add several fields to the text index in MongoDB, to allow for multi-column search.
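
As a rough sketch of what a multi-field text index looks like: MongoDB allows one text index per collection, and that index may cover several fields. The helper name `textIndexSpec` and the field names below are assumptions for illustration only.

```javascript
// Build an index specification that puts several fields into one text index.
// `textIndexSpec` is a hypothetical helper, not part of the plugin.
function textIndexSpec (fields) {
  return Object.fromEntries(fields.map(field => [field, 'text']))
}

// Yields an object mapping each field name to the string 'text'.
const spec = textIndexSpec(['name', 'email'])

// With the official Node.js driver this would be passed to
// `await collection.createIndex(spec)`, where `collection` is an assumed
// handle to the collection behind the service.
```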

> I would propose scrapping support for $text and just doing a true RegEx fuzzy search by default.

RegEx will not be the default, due to

  1. stemming and
  2. performance.

FossPrime commented 6 years ago

https://docs.mongodb.com/v3.4/reference/operator/query/text/#match-operation-stemmed-words.

The stemming behaviour is rather interesting: it means you can't use $text to search through emails or phone numbers without knowing the full email or phone number, which is pointless. It's not something you would ever figure out on your own without reading the documentation. Perhaps renaming $text to $natural or $stemText would help.

Still, the NeDB-to-Mongo conversion should be fairly painless, but the APIs of these plugins are so different that if you rely on them, it won't be. Coming from the very satisfying NeDB plugin, this one seems needlessly complicated and arcane.

Another idea is to add another query operator altogether that is common to both and easy to deal with... $simple?

claustres commented 6 years ago

You might be interested in this historical post: https://github.com/feathersjs/feathers/issues/334#issuecomment-260484749. The fact is that it is not so easy to abstract "fuzzy search", and a solution that works with, let's say, Mongo will probably not work with SQL.

First, there are different techniques behind it: more evolved DBs like Elasticsearch use different algorithms to measure the fuzzy distance between two strings, so RegEx is not the only way and certainly not the best one. Second, the fact that you can easily fuzzy search all fields with NeDB is probably linked to the fact that it is usually not a production DB. In production you have to work with indexes if you want performance; as a consequence, you need to define which fields of your data model are "searchable", and the user can't simply search everything. So yes, you will probably end up coding a mapping between a general search facade and the actual DB query.

It seems that there have been some minor changes since my fork, but at that time you could simply do something like this in your app to make all fields "fuzzy searchable" with Mongo:

import fuzzySearch from 'feathers-mongodb-fuzzy-search'

app.hooks({
  before: {
    find: [ fuzzySearch() ]
  }
})
// Then on any service
service.find({ query: { fieldName: { $search: "ti" } } })

It seems that either a non-empty fields or excludedFields option is now mandatory; maybe relaxing this constraint would make it simpler to use?

FossPrime commented 6 years ago

The focus of the ticket is merely to have API parity between NeDB and MongoDB plugins.

While I preferred the NeDB way, anything that lets me reuse client code regardless of whether I'm using NeDB or MongoDB on the back end would be great. The issues you had with Elastic and SQL are irrelevant here, as NeDB cannot responsibly be used in production; it doesn't even have basic networking support.

In my case I work with NeDB in development and Mongo in prod... they have extremely similar APIs, and the whole point of NeDB was to make interchange seamless... these plugins break the seamlessness.

I propose altering NeDB fuzzy search if need be to achieve feature parity.

Perhaps this ticket should be on the NeDB repo instead.

arve0 commented 6 years ago

@rayfoss I agree, the plugins should have feature parity; that is the whole point of these two plugins, though maybe not communicated well enough. Unfortunately, I have not updated the NeDB plugin after the contributions (searching individual fields) from @claustres.

Still, going from NeDB to MongoDB is possible, as the features are the same when the plugin is invoked without options.

> Perhaps this ticket should be on the NeDB repo instead

I've opened an issue.