datatonic / duke

Automatically exported from code.google.com/p/duke

Support looking up records even when IDs are URIs #83

GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
The Lucene analyzers mangle URIs, so looking up records that have URIs as 
identifiers does not work. The cause is not yet known.

Original issue reported on code.google.com by lar...@gmail.com on 14 Nov 2012 at 9:19

GoogleCodeExporter commented 8 years ago
I have tried modifying the code so that it escapes special characters before 
searching instead of stripping them. It still doesn't work, because the 
StandardAnalyzer strips the slashes, and escaping the slashes doesn't help 
either. The KeywordAnalyzer, however, works fine.

I need to study the StandardAnalyzer more carefully to find out why this 
happens and what is really going on.

Original comment by lar...@gmail.com on 17 Nov 2012 at 1:56
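
To make the difference between stripping and escaping concrete, here is a minimal sketch in plain Java. The class `QueryEscapeSketch`, its methods, and the example URI are hypothetical and are not taken from Duke; the list of special characters is an assumption modeled on the syntax characters of Lucene's classic query parser, which offers its own static `QueryParser.escape` method for this purpose.

```java
public class QueryEscapeSketch {
    // Characters treated as query syntax. This list is an assumption modeled
    // on Lucene's classic query parser; it is not copied from Duke.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Backslash-escapes every special character instead of deleting it. */
    public static String escape(String value) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    /** The stripping approach: special characters are dropped entirely. */
    public static String strip(String value) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (SPECIAL.indexOf(c) < 0) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String id = "http://example.org/person/1"; // hypothetical record ID
        System.out.println(escape(id));
        System.out.println(strip(id));
    }
}
```

Escaping only protects the characters from the query parser. An analyzer that runs afterwards can still discard them, which is consistent with the behaviour reported in this comment.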

GoogleCodeExporter commented 8 years ago
Ok, the problem seems to be the StandardTokenizer, which splits the text into 
separate tokens at punctuation such as slashes.

Original comment by lar...@gmail.com on 17 Nov 2012 at 2:01
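
The effect can be illustrated without Lucene. The sketch below is a deliberately simplified stand-in, not the real StandardTokenizer: it assumes that every character other than a letter or digit is a token boundary, which is cruder than Lucene's actual rules. `TokenizerSketch` and the example URI are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class TokenizerSketch {
    /**
     * Splits at every run of characters that are not letters or digits and
     * lowercases the pieces. This is a simplification, not Lucene's algorithm.
     */
    public static List<String> splitAtPunctuation(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** Keeps the whole input as one token, as a keyword-style analyzer does. */
    public static List<String> keepWhole(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        String id = "http://example.org/person/1"; // hypothetical record ID
        System.out.println(splitAtPunctuation(id));
        System.out.println(keepWhole(id));
    }
}
```

Once the URI has been broken into pieces like this, no single token equals the original identifier, so an exact lookup by ID cannot match.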

GoogleCodeExporter commented 8 years ago
It looks like we have a solution now: use StandardAnalyzer to index and 
KeywordAnalyzer to search. I will do more testing tomorrow to verify that 
everything is correct.

Original comment by lar...@gmail.com on 23 Nov 2012 at 6:45
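
Why keeping the query whole at search time helps can be sketched with a toy in-memory index. This is not Duke's or Lucene's implementation. It assumes that the ID value is stored as a single untokenized term at index time, which the thread does not state, and `IdLookupSketch`, its methods, and the sample record are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class IdLookupSketch {
    // Assumption: each ID is stored as one exact, untokenized term.
    private final Map<String, String> byId = new HashMap<>();

    public void index(String id, String name) {
        byId.put(id, name);
    }

    /** Search-time tokenization: the query is split, so no piece matches. */
    public Optional<String> lookupTokenized(String id) {
        for (String token : id.split("[^\\p{L}\\p{N}]+")) {
            if (byId.containsKey(token)) {
                return Optional.of(byId.get(token));
            }
        }
        return Optional.empty();
    }

    /** Keyword-style search: the query stays whole and matches exactly. */
    public Optional<String> lookupWhole(String id) {
        return Optional.ofNullable(byId.get(id));
    }

    public static void main(String[] args) {
        IdLookupSketch ix = new IdLookupSketch();
        ix.index("http://example.org/person/1", "Alice"); // hypothetical record
        System.out.println(ix.lookupTokenized("http://example.org/person/1").isPresent());
        System.out.println(ix.lookupWhole("http://example.org/person/1").isPresent());
    }
}
```

Under that assumption, the lookup only succeeds when the search side leaves the URI intact, which matches the observation that KeywordAnalyzer works for searching.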

GoogleCodeExporter commented 8 years ago
Fixed!

Original comment by lar...@gmail.com on 24 Nov 2012 at 10:04