elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
1.21k stars, 24.85k forks

"Did you mean" spellchecking #911

Closed: sindresorhus closed this issue 11 years ago.

sindresorhus commented 13 years ago

Google's "Did you mean" feature is very useful. Would be awesome if ES could implement this.

Lucene has pulled in the SpellChecker contrib. Maybe ES could expose that?

For example, if I specify suggestSimilar with some optional parameters in my search object, I could get back an array of suggestions.

keteracel commented 13 years ago

You can implement this yourself by building a search-term index, probably using n-grams, and then sorting the suggestions by popularity.
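A minimal sketch of that approach in Python, assuming an in-memory index of past search terms (`ngrams`, `SearchTermIndex`, `record` and `suggest` are hypothetical names, not an Elasticsearch or Lucene API). Each term is split into padded character trigrams; candidates are known terms whose trigram sets overlap the query's enough (Jaccard similarity above a threshold), and the survivors are sorted by how often they were searched.

```python
from collections import defaultdict


def ngrams(term: str, n: int = 3) -> set[str]:
    """Character n-grams of a term, padded with ^ and $ so prefixes and suffixes count."""
    padded = f"^{term}$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


class SearchTermIndex:
    """Hypothetical in-memory index of past search terms, keyed by n-gram."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.popularity = defaultdict(int)  # term -> number of times searched
        self.postings = defaultdict(set)    # n-gram -> terms containing it

    def record(self, term: str) -> None:
        """Count one more search for `term` and index its n-grams."""
        term = term.lower()
        self.popularity[term] += 1
        for gram in ngrams(term, self.n):
            self.postings[gram].add(term)

    def suggest(self, query: str, limit: int = 3, min_overlap: float = 0.5) -> list[str]:
        """Known terms whose n-grams overlap the query's enough, most popular first."""
        query = query.lower()
        grams = ngrams(query, self.n)
        shared = defaultdict(int)  # candidate term -> number of shared n-grams
        for gram in grams:
            for term in self.postings.get(gram, ()):
                shared[term] += 1
        candidates = []
        for term, count in shared.items():
            if term == query:
                continue
            # Jaccard similarity: shared n-grams / all distinct n-grams of both terms
            similarity = count / len(grams | ngrams(term, self.n))
            if similarity >= min_overlap:
                candidates.append(term)
        # Rank survivors by popularity; break ties alphabetically for stable output
        candidates.sort(key=lambda t: (-self.popularity[t], t))
        return candidates[:limit]
```

For example, after recording "elasticsearch" a few times, `suggest("elasticsearh")` returns `["elasticsearch"]`. Because the final ordering uses popularity alone, a frequently searched but less similar term can outrank a closer one; folding the similarity score into the sort key is a natural refinement.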

sindresorhus commented 13 years ago

Can you give an example?

keteracel commented 13 years ago

Something like this: http://sujitpal.blogspot.com/2007/12/spelling-checker-with-lucene.html

keteracel commented 13 years ago

But I also see that Lucene has pulled in the SpellChecker contrib: http://lucene.apache.org/java/3_1_0/api/all/org/apache/lucene/search/spell/SpellChecker.html so I guess ES could expose that.

sindresorhus commented 13 years ago

@keteracel Read the article you linked. Looks interesting, but it's probably more than I can handle at the moment. I really think something as useful as this should be in ES by default. I've updated the issue with a better description.

kimchy commented 13 years ago

The current spell checker requires building an auxiliary index in order to support it (and, moreover, requires reindexing the data periodically). In Lucene 4.0, since fuzzy queries are much faster, spell checking can be done on the main index. So the logic is that it makes little sense to incorporate a feature that is currently quite heavyweight, rather than simply waiting to implement it easily once 4.0 is out.
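To illustrate the difference, here is a minimal Python sketch of "direct" spell checking, assuming the main index's term dictionary is available as a plain mapping from term to document frequency (`term_freqs`, `levenshtein` and `direct_suggest` are hypothetical names). Candidates within a maximum edit distance are ranked by distance, then by frequency, so no auxiliary index has to be built or periodically rebuilt.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when characters match
            ))
        prev = curr
    return prev[-1]


def direct_suggest(word: str, term_freqs: dict[str, int],
                   max_edits: int = 2, limit: int = 5) -> list[str]:
    """Suggest corrections straight from the main index's term dictionary.

    `term_freqs` (hypothetical) maps each indexed term to its document frequency,
    standing in for the main index, so no auxiliary spelling index is needed.
    """
    scored = []
    for term, freq in term_freqs.items():
        if term == word or abs(len(term) - len(word)) > max_edits:
            continue  # skip the word itself and terms too long/short to be close
        dist = levenshtein(word, term)
        if dist <= max_edits:
            scored.append((dist, -freq, term))
    scored.sort()  # fewest edits first, then most frequent, then alphabetical
    return [term for _, _, term in scored[:limit]]
```

For example, `direct_suggest("serch", {"search": 120, "sea": 40, "elastic": 50})` returns `["search"]`. This sketch scans every term linearly; Lucene's DirectSpellChecker avoids that scan by enumerating candidate terms with Levenshtein automata, the same machinery behind the faster fuzzy queries mentioned above.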

sindresorhus commented 13 years ago

Agreed, that's the best solution. Any idea when 4.0 will be out?

kimchy commented 13 years ago

No, no due date yet. It seems like the pace is being picked up towards a release, but it will take a few months I think.

sindresorhus commented 13 years ago

Ok, thanks ;) Looking forward to it.

richardsyeo commented 13 years ago

We would very much like this feature too.

naquad commented 13 years ago

Hi.

Is there any news on this? I'm tired of running around with ASpell :(

j commented 13 years ago

+1

beau-mind commented 13 years ago

We would like to use spellchecker too. Thank you.

conradchu commented 13 years ago

+1

mhluongo commented 13 years ago

+1

tfreitas commented 13 years ago

+1

fbecart commented 13 years ago

+1

alexis779 commented 13 years ago

+1

tfreitas commented 13 years ago

+1

dstendardi commented 13 years ago

+1

adamw commented 12 years ago

+1

juliuss commented 12 years ago

+1

bryangreen commented 12 years ago

+1

abecciu commented 12 years ago

+1

nickdunn commented 12 years ago

Apologies for the +1, but this is way up my wishlist too.

ream88 commented 12 years ago

Yep, me too! +1

sebastianseilund commented 12 years ago

+1 This would be an awesome feature, for an already awesome product! Thank you so much :)

ghost commented 12 years ago

+1

plentz commented 12 years ago

+1

gmatthew commented 12 years ago

+1

krmcbride commented 12 years ago

+1

j commented 12 years ago

ping @kimchy It's been almost a year! :) Any status on this? Tonnnnns of +1's up in here!

mhluongo commented 12 years ago

Guys, I think @kimchy gets it... we all want this. However, Lucene 4.0 hasn't been released yet, and last update from him mentioned that that release would make this feature much easier. Maybe we should be pressuring the Lucene team to hurry up? There's been talk of a 4.0 release forever.

j commented 12 years ago

@mhluongo, it is understood that it's a better "Lucene 4.0" feature, but there seem to be other options for spell checking, etc., for example #646. A lot of open source projects don't wait over a year for a feature that the community wants... A bridge could be built for searching, and when Lucene supports it directly, the native implementation could stay backward compatible (BC) with the temporary/secondary solution (e.g. Hunspell). For example, the Symfony2 PHP framework builds functionality for PHP 4.0 to get the minor optimizations, but has a fallback strategy for PHP 3.x versions.

My two cents is that this is a huge feature for memory-based searching... and would definitely set elasticsearch apart from anything else out there right now.

Just my two cents IMO. :)
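The bridge idea in this comment can be sketched as a small adapter layer in Python. All class and function names here are hypothetical: a stopgap suggester (a fixed correction table standing in for something like Hunspell) and a future native suggester implement the same interface, so application code does not change when the native one becomes available.

```python
from typing import Callable, Optional, Protocol


class Suggester(Protocol):
    """The one interface application code depends on."""

    def suggest(self, word: str) -> list[str]: ...


class StopgapSuggester:
    """Hypothetical temporary bridge: a fixed correction table standing in for
    something like Hunspell until the search engine can do this natively."""

    def __init__(self, corrections: dict[str, list[str]]) -> None:
        self.corrections = corrections

    def suggest(self, word: str) -> list[str]:
        return list(self.corrections.get(word.lower(), []))


class NativeSuggester:
    """Hypothetical adapter over a future built-in spellcheck call, passed in
    as a plain function because no such API is assumed to exist yet."""

    def __init__(self, backend: Callable[[str], list[str]]) -> None:
        self.backend = backend

    def suggest(self, word: str) -> list[str]:
        return self.backend(word)


def choose_suggester(native: Optional[Suggester], stopgap: Suggester) -> Suggester:
    """Prefer the native implementation when available; callers stay unchanged."""
    return native if native is not None else stopgap


def did_you_mean(suggester: Suggester, word: str) -> Optional[str]:
    """Application code: only ever talks to the Suggester interface."""
    suggestions = suggester.suggest(word)
    return suggestions[0] if suggestions else None
```

Swapping the stopgap for the native implementation is then a one-line change in `choose_suggester`, which is the backward-compatibility property the comment is asking for.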

mhluongo commented 12 years ago

@jstout24 I know that waiting for Lucene 4 is just the path of least resistance, but there are a ton of other awesome features that we could use as well, and that could be written and maintained in the time saved. At some point, some of the people posting +1s need to start coding themselves if we want this feature, or be okay with waiting (I'm guilty of this too, obviously).

Just trying to be understanding of an embattled OSS developer :)

bradbeattie commented 12 years ago

To people "+1"ing, take a look over here: https://issues.apache.org/jira/browse/LUCENE/fixforversion/12314025. That's the progress of Lucene 4.0.

kimchy commented 12 years ago

Heya fellows, understood, this feature is highly important. The only thing that can be done currently (aside from other ways of solving it, like a custom-built index using n-grams and the like) is to possibly write a plugin (and probably new extension points) that exposes the current Lucene spell checking behavior. But it's not really good... (as I explained in my first comment here).

DeeJayPee commented 12 years ago

Hello,

Sorry, but I have to +1 this issue too ^^ But now that Lucene 4.0 is out, is it possible in any way, or do we need an implementation in ES?

Regards,

brusic commented 12 years ago

Lucene 4.0 is not out, only the beta. Final release probably will not happen until October.

louman commented 12 years ago

+1

elfassy commented 12 years ago

+1

Fibonacci-Solucoes-Ageis commented 12 years ago

+1

Tiduin commented 12 years ago

+1

tfreitas commented 12 years ago

+1

schmurfy commented 12 years ago

I think we all know by now that many people are interested in this feature; can we stop with the +1s, please? They serve little to no purpose and spam anyone who is watching this thread for real information.

kul commented 12 years ago

4.0 is Out! :)

herlambang commented 12 years ago

+1

brusic commented 12 years ago

I agree with schmurfy, enough with +1s. If you want to subscribe to this issue, you can change your notification settings below. Look for the dropdown that says "Not watching thread" and change it to "Watch".

Shay commented on spellchecking and Lucene 4.0 last week. In case you missed it, here is the thread: https://groups.google.com/d/topic/elasticsearch/p2mu0Tv3VPI/discussion

"The plan is to first get Lucene 4.0 integrated with elasticsearch, and then expose all the new features. We will take it feature by feature, but to your points, there will be a spellcheck builtin using the new "direct" spellcheck feature, you will be able to configure codecs in the mapping, and write a plugin that introduces new codecs, and so on..."

tfreitas commented 11 years ago

+1

brunobowden commented 11 years ago

+1 I'd particularly like to use it when it's deployed on StackOverflow