Closed asfimport closed 12 years ago
Robert Muir (@rmuir) (migrated from JIRA)
Here's a quick fix: use replace() instead of replaceAll(), and raise -Xmx from 512MB to 1GB.
With these changes it now builds correctly on Java 5. Using 1GB is not ideal, but I think it is necessary if you are using a 64-bit Java 5 like me.
We could later try to optimize the dictionary construction to use less RAM so we can lower this (I have some ideas).
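For readers unfamiliar with the difference, here is a minimal sketch of the replace() vs. replaceAll() swap. It is not the code from the patch: the class `QuoteUnescape`, its method names, and the quote-collapsing use case are hypothetical and chosen only for illustration. `String.replaceAll` interprets its first argument as a regular expression, while `String.replace(CharSequence, CharSequence)` treats both arguments literally, so the literal form is sufficient whenever no regex features are needed.

```java
public class QuoteUnescape {
    // Hypothetical helper (not from the patch): collapse doubled quotes
    // in a CSV field using the regex-based API.
    public static String unescapeRegex(String field) {
        // replaceAll() interprets "\"\"" as a regular expression.
        return field.replaceAll("\"\"", "\"");
    }

    // Same result using the literal API, as the quick fix suggests.
    public static String unescapeLiteral(String field) {
        // replace() treats both arguments as plain character sequences.
        return field.replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        String field = "say \"\"hi\"\"";
        System.out.println(unescapeRegex(field));
        System.out.println(unescapeLiteral(field));
    }
}
```

Both methods return the same string here; the difference is only in how the search argument is interpreted.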
Robert Muir (@rmuir) (migrated from JIRA)
With the patch:
[java] building tokeninfo dict...
[java] parse...
[java] sort...
[java] encode...
[java] 53645 nodes, 253185 arcs, 1954817 bytes... done
[java] done
[java] building unknown word dict...done
[java] building connection costs...done
BUILD SUCCESSFUL
Total time: 10 seconds
Robert Muir (@rmuir) (migrated from JIRA)
Updated patch: it just optimizes the CSV handling to create less garbage.
I will commit this soon (bumping to Xmx756m in case someone uses Java 5).
Note: this only affects you if you use Java 5 on 3.x and want to download/rebuild the dictionary. The analyzer itself works fine on 3.x with Java 5.
With Java 6, building a kuromoji dictionary is quite fast:
However, with Java 5 it takes forever and eventually runs out of memory in the CSV parsing phase, so we might need to optimize the CSV parser (for example, by precompiling its patterns).
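As a rough illustration of what "precompiling its patterns" could look like, the sketch below compiles a `java.util.regex.Pattern` once and reuses it for every line. The class `CsvSplit`, its method names, and the bare comma pattern are assumptions made for this example; the real CSV parser also has to handle quoted fields, which a plain comma split ignores. Per the Javadoc, `str.split(regex)` yields the same result as `Pattern.compile(regex).split(str)`, so hoisting the compile step into a constant does not change the output.

```java
import java.util.regex.Pattern;

public class CsvSplit {
    // Compiled once and reused for every line (hypothetical constant name).
    private static final Pattern COMMA = Pattern.compile(",");

    // Passes the regex as a string on every call.
    public static String[] splitPerCall(String line) {
        return line.split(",");
    }

    // Reuses the precompiled pattern; same result as splitPerCall().
    public static String[] splitPrecompiled(String line) {
        return COMMA.split(line);
    }

    public static void main(String[] args) {
        String[] fields = splitPrecompiled("a,b,c");
        System.out.println(fields.length);
    }
}
```

Whether this removes the Java 5 slowdown would need to be measured; the sketch only shows the shape of the change.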
Migrated from LUCENE-3696 by Robert Muir (@rmuir), resolved Jan 16 2012 Attachments: LUCENE-3696.patch (versions: 2)