aymkam / lucene-gosen

Automatically exported from code.google.com/p/lucene-gosen
GNU Lesser General Public License v2.1

compiled dictionary should be able to be deployed out of jar #16

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Users who don't use custom dictionaries prefer a one-package jar (dictionary 
included) because it is easy to deploy. But sometimes users who use custom 
dictionaries prefer a separate jar, that is, a jar that doesn't include the 
compiled dictionaries, which are instead deployed somewhere in storage.

Once we implement the latter, we'd like to consider how we can switch 
dictionaries without reloading the jar.

Original issue reported on code.google.com by k...@r.email.ne.jp on 11 Jul 2011 at 8:04

GoogleCodeExporter commented 8 years ago
Sekiguchi-san.

I implemented it and attached a patch file.
Test code is not included yet, but I will add it soon.

The changes are as follows.
1. Add an ant task "nodic-jar".
2. Add a JapaneseTokenizer constructor argument (directory of the dictionary).
3. Add a JapaneseTokenizerFactory argument (dictionaryDir).
4. Read the dictionary from the specified directory.

If the "dictionaryDir" param is null or "", SenFactory reads the dictionary from the jar.
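
For readers without the patch, the fallback described above might look roughly like the following. This is only a sketch under assumptions: the class DictionarySource, its methods, and the fileName parameter are hypothetical names for illustration and are not taken from the patch or from SenFactory.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not the actual SenFactory code.
final class DictionarySource {

    // True when no external directory is configured, so the dictionary
    // bundled in the jar should be used.
    static boolean useBundled(String dictionaryDir) {
        return dictionaryDir == null || dictionaryDir.isEmpty();
    }

    // Opens one dictionary file, either from the classpath (the jar) or
    // from the configured directory. fileName is an assumed parameter.
    static InputStream open(String dictionaryDir, String fileName) throws IOException {
        if (useBundled(dictionaryDir)) {
            InputStream in = DictionarySource.class.getResourceAsStream("/" + fileName);
            if (in == null) {
                throw new FileNotFoundException(fileName + " not found on the classpath");
            }
            return in;
        }
        return Files.newInputStream(Paths.get(dictionaryDir, fileName));
    }
}
```

With this shape, a jar built by "nodic-jar" would only work when dictionaryDir is set, since the classpath lookup would find nothing.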

Original comment by johtani on 22 Aug 2011 at 7:55

Attachments:

GoogleCodeExporter commented 8 years ago
Ohtani-san, Sekiguchi-san

I changed the above code so that dictionaryDir can be set as a path relative to configDir.
This patch is for the current trunk.
This test requires the following jar files.

- slf4j-api-1.6.1.jar
- slf4j-jdk14-1.6.1.jar

dictionaryDir can be set to an absolute or relative path.

- $configDir/$dictionaryDir (if dictionaryDir is not absolute)
- $CWD/$dictionaryDir
- $dictionaryDir (if dictionaryDir is absolute)
- from .jar (if dictionaryDir is null or "")
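
The lookup rules above could be sketched as below. This is one possible reading, not the patch itself: the class and method names are hypothetical, and the order (configDir first, then the current working directory) is an assumption, since the list does not say which relative location wins.

```java
import java.io.File;

// Hypothetical resolver for illustration; names are not from the patch.
final class DictionaryDirResolver {

    // Returns the directory to read the dictionary from, or null to mean
    // "use the dictionary bundled in the jar".
    static File resolve(String configDir, String dictionaryDir) {
        if (dictionaryDir == null || dictionaryDir.isEmpty()) {
            return null; // from .jar
        }
        File dir = new File(dictionaryDir);
        if (dir.isAbsolute()) {
            return dir; // $dictionaryDir
        }
        if (configDir != null) {
            File underConfig = new File(configDir, dictionaryDir);
            if (underConfig.isDirectory()) {
                return underConfig; // $configDir/$dictionaryDir
            }
        }
        // Assumed fallback: resolve against the current working directory.
        return dir.getAbsoluteFile(); // $CWD/$dictionaryDir
    }
}
```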

Original comment by shinobu....@gmail.com on 25 Aug 2011 at 11:25

Attachments:

GoogleCodeExporter commented 8 years ago
This is a good feature!

Maybe if we change SenFactory.getInstance to use a ConcurrentHashMap then you 
can easily use multiple dictionaries at the same time?
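
A minimal sketch of that idea, assuming one cached instance per dictionary directory. DictionaryRegistry and LoadedDictionary are hypothetical stand-ins, not the real SenFactory API, and computeIfAbsent needs Java 8 (older code would use putIfAbsent).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical registry for illustration; not the actual SenFactory.
final class DictionaryRegistry {

    // Stand-in for whatever object holds a loaded dictionary.
    static final class LoadedDictionary {
        final String dir;
        LoadedDictionary(String dir) { this.dir = dir; }
    }

    // ConcurrentHashMap rejects null keys, so "" stands for the bundled dictionary.
    private static final String BUNDLED = "";

    private static final ConcurrentMap<String, LoadedDictionary> CACHE =
            new ConcurrentHashMap<>();

    // Returns the same instance for the same directory; different
    // directories get independent instances.
    static LoadedDictionary getInstance(String dictionaryDir) {
        String key = (dictionaryDir == null) ? BUNDLED : dictionaryDir;
        return CACHE.computeIfAbsent(key, LoadedDictionary::new);
    }
}
```

Keying by directory means two tokenizers configured with ipadic and naist-chasen paths would each get their own instance, while repeated lookups for one path share a single one.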

Original comment by rcm...@gmail.com on 25 Aug 2011 at 12:43

GoogleCodeExporter commented 8 years ago
That's right!

I changed SenFactory to use a ConcurrentHashMap. 
I briefly tested multiple dictionaries (ipadic and naist-chasen), and it 
seems to work correctly.

Sorry, test code is not included yet, but I will add it soon.

Original comment by johtani on 29 Aug 2011 at 10:16

Attachments:

GoogleCodeExporter commented 8 years ago
Added a test case and changed the JapaneseAnalyzer constructor.
Also changed build.xml for the multi-dictionary test.

Original comment by johtani on 5 Sep 2011 at 11:07

Attachments:

GoogleCodeExporter commented 8 years ago
Added a re-build-dic target to build.xml.
Added a schema.xml.snipet sample.

Original comment by johtani on 21 Oct 2011 at 9:29

Attachments:

GoogleCodeExporter commented 8 years ago
This is an important feature. I am not sure that I am the best person to review 
the patch, but I would propose that we consider committing it and issuing a new 
release, since we already have several new fixes in trunk in addition to this.

I think it's a big limitation that you must currently choose only one dictionary 
for your entire lucene/solr instance.

Original comment by rcm...@gmail.com on 25 Oct 2011 at 1:47

GoogleCodeExporter commented 8 years ago
I reviewed the patch just yesterday and gave some comments to ohtani-san. I 
think he will attach the updated patch soon. Once the new one is attached, I'll 
check it, and if there are no problems, he will commit it to trunk.

Original comment by k...@r.email.ne.jp on 25 Oct 2011 at 1:55

GoogleCodeExporter commented 8 years ago
OK good, sounds like you have everything under control! Thank you!

After we commit this one I still suggest a release, though maybe we should 
first address issue #17 
(http://code.google.com/p/lucene-gosen/issues/detail?id=17) so that it 
officially supports lucene/solr 3.4. But this is not as important.

Original comment by rcm...@gmail.com on 25 Oct 2011 at 2:04

GoogleCodeExporter commented 8 years ago
Attached an updated patch.

Updated build.xml (added a few comments)
Updated some test cases (assertEquals -> assertSame)
Added o.a.s.analysis.TestJapaneseTokenizerFactory.java

Please do a final check!

Original comment by johtani on 25 Oct 2011 at 3:07

Attachments:

GoogleCodeExporter commented 8 years ago
Looks good to me.

Original comment by k...@r.email.ne.jp on 25 Oct 2011 at 11:04

GoogleCodeExporter commented 8 years ago
Thanks.
Committed revision 142.

Original comment by johtani on 25 Oct 2011 at 11:25

GoogleCodeExporter commented 8 years ago

Original comment by johtani on 26 Oct 2011 at 12:20