eXtensibleCatalog / Drupal-Toolkit

The eXtensible Catalog Drupal Toolkit

Use of apostrophe in search - omission eliminates too much #391

Closed patrickzurek closed 7 years ago

patrickzurek commented 7 years ago

JIRA issue created by: rcook Originally opened: 2011-02-18 04:24 PM

Issue body: (nt)

patrickzurek commented 7 years ago

JIRA Comment by user: rcook JIRA Timestamp: 2011-02-18 04:25 PM

Comment body:

The apostrophe has the same problem as the diacritics: if you search for “Cats cradle” you get hardly anything, and no Vonnegut. But if you do type the apostrophe, you get it.

patrickzurek commented 7 years ago

JIRA Comment by user: rcook JIRA Timestamp: 2011-03-22 05:53 PM

Comment body:

This might have to do with the Solr stemming options. Perhaps explore turning them on and off in the Solr index screens.

patrickzurek commented 7 years ago

JIRA Comment by user: rcook JIRA Timestamp: 2011-03-22 07:35 PM

Comment body:

Explore this.

solr.StandardFilterFactory

Creates org.apache.lucene.analysis.standard.StandardFilter.

Removes dots from acronyms and 's from the end of tokens. Works only on typed tokens, i.e., those produced by StandardTokenizer or equivalent.

Example of StandardTokenizer followed by StandardFilter:

  "I.B.M. cat's can't" ==> "IBM", "cat", "can't"
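To make the filter description concrete, here is a minimal Python sketch of the two rules quoted above (drop dots from acronyms, drop a trailing 's). It is only an emulation for discussion: the whitespace `tokenize` function is a crude stand-in for StandardTokenizer, and `ACRONYM`, `standard_filter` and `analyze` are hypothetical names, not Lucene or Solr APIs.

```python
import re

# Hypothetical pattern: two or more single letters, each followed by a dot,
# e.g. "I.B.M." or "U.S."
ACRONYM = re.compile(r"^(?:[A-Za-z]\.){2,}$")


def tokenize(text):
    """Split on whitespace; a crude stand-in for StandardTokenizer."""
    return text.split()


def standard_filter(token):
    """Apply the two rules from the description to a single token."""
    if ACRONYM.match(token):
        return token.replace(".", "")   # "I.B.M." -> "IBM"
    if token.lower().endswith("'s"):
        return token[:-2]               # "cat's" -> "cat"
    return token                        # "can't" is left alone


def analyze(text):
    """Tokenize, then filter every token."""
    return [standard_filter(t) for t in tokenize(text)]


print(analyze("I.B.M. cat's can't"))  # ['IBM', 'cat', "can't"]
```

Note that under these rules "Cat's" becomes "Cat" while a query for "Cats" stays "Cats", so this filter on its own would not make the two match.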
patrickzurek commented 7 years ago

JIRA Comment by user: pkiraly JIRA Timestamp: 2011-03-22 08:31 PM

Comment body:

I guess we have to collect a lot of examples, because each Solr filter (like the StandardFilterFactory) has pros and cons. I will extract some examples from Lucene and Solr books, based on which we can decide how we would like the search to work. In the Solr admin interface there is a tool where we can try different test sentences and see the results.
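As a rough illustration of collecting examples and comparing candidate behaviours, the following Python sketch runs query/title pairs through two toy analyzers. Everything here is an assumption made for illustration: `keep_apostrophes` and `drop_apostrophes` are hypothetical stand-ins, not real Solr filters, and the only example pair is the one reported in this issue.

```python
def keep_apostrophes(text):
    """Lowercase and split on whitespace, leaving apostrophes in place."""
    return text.lower().split()


def drop_apostrophes(text):
    """Lowercase, delete apostrophes, then split on whitespace."""
    return text.lower().replace("'", "").split()


# (query, indexed title) pairs; extend this list as examples are collected.
EXAMPLES = [
    ("Cats cradle", "Cat's cradle"),
    ("Cat's cradle", "Cat's cradle"),
]

ANALYZERS = {
    "keep_apostrophes": keep_apostrophes,
    "drop_apostrophes": drop_apostrophes,
}


def matches(analyzer, query, title):
    """True when every analyzed query token occurs among the title's tokens."""
    title_tokens = set(analyzer(title))
    return all(token in title_tokens for token in analyzer(query))


def report(analyzers, examples):
    """Return one (analyzer name, query, title, matched) row per combination."""
    return [
        (name, query, title, matches(analyzer, query, title))
        for name, analyzer in analyzers.items()
        for query, title in examples
    ]


for row in report(ANALYZERS, EXAMPLES):
    print(row)
```

A table like this makes it easier to see which behaviour each candidate setting gives before changing the schema and reindexing.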

patrickzurek commented 7 years ago

JIRA Comment by user: pkiraly JIRA Timestamp: 2011-07-26 02:27 PM

Comment body:

"cat's" finds "cat". "I.B.M." finds "i B.M", "I.B.M.", "i.B., M.", "I---- B----, M", but unfortunately not "IBM".

patrickzurek commented 7 years ago

JIRA Comment by user: rcook JIRA Timestamp: 2011-09-06 03:57 PM

Comment body:

This is still problematic. I thought Solr features like stemming would address these sorts of issues.

If there is no quick solution, I suggest moving this to the next milestone, 4. Jennifer and Dave, what do you think?

patrickzurek commented 7 years ago

JIRA Comment by user: jbowen JIRA Timestamp: 2011-09-06 04:16 PM

Comment body:

If this isn't easily solvable using solr, is it something that could be solved using a "did you mean" technology? Should we include this with that work?

patrickzurek commented 7 years ago

JIRA Comment by user: rcook JIRA Timestamp: 2011-09-14 02:19 PM

Comment body:

We are going to try setting our Solr settings as close to VuFind's as possible. They use an older Solr, so not everything can be done. We are going to turn on SnowballPorter stemming. A good page I found is here:

http://snowball.tartarus.org/algorithms/english/stemmer.html
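To show why stemming could help with the original report, here is a toy Python normalizer loosely modelled on the possessive-stripping and plural-s steps described on that page. It is not the Snowball algorithm and not what Solr runs; `toy_stem` and its simplified rules are assumptions made only to show "Cats" and "Cat's" collapsing to the same term.

```python
VOWELS = "aeiouy"


def toy_stem(token):
    """Very reduced stand-in for a stemmer: possessives and plural -s only."""
    word = token.lower()

    # Strip a trailing possessive marker: 's', 's or a bare apostrophe.
    for suffix in ("'s'", "'s", "'"):
        if word.endswith(suffix):
            word = word[: -len(suffix)]
            break

    # Plural handling, heavily simplified.
    if word.endswith("sses"):
        return word[:-2]                      # "classes" -> "class"
    if word.endswith("ss") or word.endswith("us"):
        return word                           # "glass", "virus" unchanged
    if word.endswith("s") and any(ch in VOWELS for ch in word[:-2]):
        return word[:-1]                      # "cats" -> "cat"
    return word                               # "gas" unchanged


def analyze(text):
    """Whitespace tokenization followed by the toy stemmer."""
    return [toy_stem(t) for t in text.split()]


print(analyze("Cats cradle"))   # ['cat', 'cradle']
print(analyze("Cat's cradle"))  # ['cat', 'cradle']
```

With both the index and the query passed through the same normalization, the query without the apostrophe produces the same terms as the title with it.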

patrickzurek commented 7 years ago

JIRA Comment by user: pkiraly JIRA Timestamp: 2011-09-15 02:54 PM

Comment body:

I have changed the Solr schema and run the reindexing. I ran into trouble, so several thousand records were not indexed, but the bulk of the records were indexed correctly, so the site is ready to be tested.

I will open another issue for the above-mentioned error.

patrickzurek commented 7 years ago

Issue resolved: 2011-09-15 02:54 PM