juanmirocks closed 9 years ago
Converting a corpus with GIMLI gives me the following error:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:2694)
at java.lang.String.<init>(String.java:203)
at com.aliasi.chunk.ChunkingImpl.<init>(ChunkingImpl.java:73)
at com.aliasi.dict.ExactDictionaryChunker.chunk(ExactDictionaryChunker.java:275)
at com.aliasi.dict.ExactDictionaryChunker.chunk(ExactDictionaryChunker.java:251)
at pt.ua.tm.gimli.dictionary.DictionaryMatcher.isStopword(DictionaryMatcher.java:146)
at pt.ua.tm.gimli.dictionary.DictionaryMatcher.loadDictionaryChunker(DictionaryMatcher.java:121)
at pt.ua.tm.gimli.dictionary.DictionaryMatcher.<init>(DictionaryMatcher.java:170)
at pt.ua.tm.gimli.reader.JNLPBAReader.read(JNLPBAReader.java:142)
at pt.ua.tm.gimli.reader.JNLPBAReader.main(JNLPBAReader.java:425)
I am currently giving 1024m of heap space on the cluster, and yet I get an out of memory error. Any ideas as to how I can rectify this?
1024m is only 1 GB, which is not much. Start with more than 4 GB.
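To confirm that a larger heap setting actually reaches the JVM on the cluster, a small check like the one below can help. It is only a sketch: `HeapCheck` is a hypothetical helper, not part of GIMLI, and the launch command in the comment is an assumption about how the reader is started (only the main class name comes from the stack trace above). `-Xmx` is the standard JVM option for the maximum heap size.

```java
// Hypothetical helper, not part of GIMLI: prints the maximum heap the JVM will use.
// Assumed launch command (classpath and arguments are placeholders):
//   java -Xmx4g -cp <classpath> pt.ua.tm.gimli.reader.JNLPBAReader <args>
final class HeapCheck {

    // Maximum heap size in MiB as reported by the running JVM.
    // The value can be slightly below the -Xmx setting, depending on the garbage collector.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it with the same `-Xmx` value as the conversion job (for example `java -Xmx4g HeapCheck.java` on Java 11 or later) shows whether the setting took effect before starting the long-running conversion.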
[<RNA> Identification Performance]
# of OBJECTs: 118, ANSWERs: 135.
# (recall / precision / f-score) of ...
FULLY CORRECT answer with class info: 65 (0.5508 / 0.4815 / 0.5138),
correct LEFT boundary with class info: 74 (0.6271 / 0.5481 / 0.5850),
correct RIGHT boundary with class info: 81 (0.6864 / 0.6000 / 0.6403),
GNormPlus comes pre-trained on its own corpus and doesn't permit training on other corpora. Moreover, it distinguishes between gene names, gene families and protein domains.
The download itself is about 2.3 GB. I can try it out, but is it okay to download?
Updated RNA NER:
[<RNA> Identification Performance]
# of OBJECTs: 118, ANSWERs: 115.
# (recall / precision / f-score) of ...
FULLY CORRECT answer with class info: 81 (0.6864 / 0.7043 / 0.6953),
correct LEFT boundary with class info: 84 (0.7119 / 0.7304 / 0.7210),
correct RIGHT boundary with class info: 89 (0.7542 / 0.7739 / 0.7639),
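For reference, the figures in both reports are consistent with the standard definitions: recall = correct / OBJECTs, precision = correct / ANSWERs, and f-score as the harmonic mean of the two. The evaluation script's own code is not shown in this thread, so the sketch below assumes those definitions; `PrfScores` is a hypothetical name and the code expects non-zero counts.

```java
import java.util.Locale;

// Hypothetical helper that recomputes the reported scores from the raw counts.
final class PrfScores {

    // Fraction of gold-standard objects that were found.
    static double recall(int correct, int objects) {
        return (double) correct / objects;
    }

    // Fraction of predicted answers that were correct.
    static double precision(int correct, int answers) {
        return (double) correct / answers;
    }

    // Harmonic mean of recall and precision (0 when both are 0).
    static double fScore(double recall, double precision) {
        double sum = recall + precision;
        return sum == 0.0 ? 0.0 : 2.0 * recall * precision / sum;
    }

    // Formats one line in the same "count (recall / precision / f-score)" layout as the reports.
    static String report(int correct, int objects, int answers) {
        double r = recall(correct, objects);
        double p = precision(correct, answers);
        return String.format(Locale.ROOT, "%d (%.4f / %.4f / %.4f)", correct, r, p, fScore(r, p));
    }

    public static void main(String[] args) {
        System.out.println(report(65, 118, 135)); // first run, fully correct
        System.out.println(report(81, 118, 115)); // updated run, fully correct
    }
}
```

With the counts above this prints `65 (0.5508 / 0.4815 / 0.5138)` and `81 (0.6864 / 0.7043 / 0.6953)`, matching the FULLY CORRECT lines. The updated run finds more correct mentions while producing fewer answers overall (115 instead of 135), so recall and precision both rise.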