Closed patrickzurek closed 7 years ago
JIRA Comment by user: rcook JIRA Timestamp: 2011-02-18 04:25 PM
Comment body:
The apostrophe has the same problem as the diacritics: search for “Cats cradle” and you get hardly anything, and no Vonnegut. But if you do type in the apostrophe, you get it.
JIRA Comment by user: rcook JIRA Timestamp: 2011-03-22 05:53 PM
Comment body:
Might have to do with Solr stemming options. Perhaps explore on/off in solr index screens.
JIRA Comment by user: rcook JIRA Timestamp: 2011-03-22 07:35 PM
Comment body:
Explore this.
solr.StandardFilterFactory
Creates org.apache.lucene.analysis.standard.StandardFilter.
Removes dots from acronyms and 's from the end of tokens. Works only on typed tokens, i.e., those produced by StandardTokenizer or equivalent.
Example of StandardTokenizer followed by StandardFilter:
"I.B.M. cat's can't" ==> "IBM", "cat", "can't"
JIRA Comment by user: pkiraly JIRA Timestamp: 2011-03-22 08:31 PM
Comment body:
I guess we have to collect a lot of examples, because each Solr filter (like the StandardFilterFactory) has pros and cons. I will extract some examples from Lucene and Solr books, from which we can decide how we would like the search to work. The Solr admin interface has a tool where we can try different test sentences and see the results.
JIRA Comment by user: pkiraly JIRA Timestamp: 2011-07-26 02:27 PM
Comment body:
"cat's" finds "cat". "I.B.M." finds "i B.M", "I.B.M.", "i.B., M.", "I---- B----, M", but unfortunately not "IBM".
JIRA Comment by user: rcook JIRA Timestamp: 2011-09-06 03:57 PM
Comment body:
This is still problematic. I thought Solr features like stemming addressed these sorts of issues.
If there is no quick solution, I suggest moving this to the next milestone (4). Jennifer and Dave, what do you think?
JIRA Comment by user: jbowen JIRA Timestamp: 2011-09-06 04:16 PM
Comment body:
If this isn't easily solvable using solr, is it something that could be solved using a "did you mean" technology? Should we include this with that work?
JIRA Comment by user: rcook JIRA Timestamp: 2011-09-14 02:19 PM
Comment body:
We are going to try setting our Solr settings as close to VuFind's as possible. They use an older Solr, so not everything can be done. We are going to turn on SnowballPorter stemming. A good page I found is here:
http://snowball.tartarus.org/algorithms/english/stemmer.html
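As a rough illustration of why stemming should help with the apostrophe case, the sketch below implements only two early steps of the English Snowball algorithm as described on that page: step 0 (strip `'`, `'s`, `'s'`) and step 1a (plural endings). It skips everything else in the algorithm (special-case words, the treatment of `y`, all later steps), so it is not a substitute for the real stemmer. The function names and the lowercase/whitespace tokenization are my own assumptions.

```python
VOWELS = set("aeiouy")


def step_0(token):
    """Strip the longest of the suffixes 's', 's and ' if present."""
    for suffix in ("'s'", "'s", "'"):
        if token.endswith(suffix):
            return token[: -len(suffix)]
    return token


def step_1a(token):
    """Handle plural-style endings (partial: no special-case word list)."""
    if token.endswith("sses"):
        return token[:-2]                      # "presses" -> "press"
    if token.endswith(("ied", "ies")):
        # "i" if preceded by more than one letter, otherwise "ie"
        return token[:-2] if len(token) > 4 else token[:-1]
    if token.endswith(("us", "ss")):
        return token                           # leave unchanged
    # Delete a final "s" only if an earlier part of the word contains a
    # vowel that is not the letter immediately before the "s".
    if token.endswith("s") and any(ch in VOWELS for ch in token[:-2]):
        return token[:-1]
    return token


def normalize(query):
    # Assumption: lowercase and split on whitespace before stemming.
    return [step_1a(step_0(tok)) for tok in query.lower().split()]


print(normalize("Cats cradle"))
print(normalize("Cat's cradle"))
```

Under these two steps both spellings reduce to the same tokens, so a search with or without the apostrophe would hit the same index terms.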
JIRA Comment by user: pkiraly JIRA Timestamp: 2011-09-15 02:54 PM
Comment body:
I have changed the Solr schema and run a reindexing. I ran into trouble, so several thousand records were not indexed, but the bulk of the records were indexed correctly, so the site is ready to be tested.
I will open another issue for the above-mentioned error.
Issue resolved: 2011-09-15 02:54 PM
JIRA issue created by: rcook Originally opened: 2011-02-18 04:24 PM
Issue body: (nt)