ashishbaghudana / mthesis-ashish

MIT License

RNA NER #8

Closed: juanmirocks closed this issue 9 years ago

juanmirocks commented 9 years ago

ashishbaghudana commented 9 years ago

Converting a corpus with GIMLI fails with the following error:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOfRange(Arrays.java:2694)
    at java.lang.String.<init>(String.java:203)
    at com.aliasi.chunk.ChunkingImpl.<init>(ChunkingImpl.java:73)
    at com.aliasi.dict.ExactDictionaryChunker.chunk(ExactDictionaryChunker.java:275)
    at com.aliasi.dict.ExactDictionaryChunker.chunk(ExactDictionaryChunker.java:251)
    at pt.ua.tm.gimli.dictionary.DictionaryMatcher.isStopword(DictionaryMatcher.java:146)
    at pt.ua.tm.gimli.dictionary.DictionaryMatcher.loadDictionaryChunker(DictionaryMatcher.java:121)
    at pt.ua.tm.gimli.dictionary.DictionaryMatcher.<init>(DictionaryMatcher.java:170)
    at pt.ua.tm.gimli.reader.JNLPBAReader.read(JNLPBAReader.java:142)
    at pt.ua.tm.gimli.reader.JNLPBAReader.main(JNLPBAReader.java:425)

I am currently allocating 1024m of heap space on the cluster, yet I still get an out-of-memory error. Any ideas on how to fix this?

juanmirocks commented 9 years ago

1024m = 1G is not that much. Start with more than 4G.
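
For reference, the JVM heap ceiling is set with the standard `-Xmx` flag. Below is a minimal sketch of building such a command line from a cluster job script. The main class name is taken from the stack trace above; the helper name `build_gimli_command`, the `gimli.jar` classpath, and the empty reader arguments are hypothetical placeholders, not GIMLI's documented invocation.

```python
import shlex


def build_gimli_command(heap="4g", classpath="gimli.jar", reader_args=()):
    """Return a java command line with an explicit maximum heap size.

    classpath and reader_args are placeholders (assumptions); adjust them
    to the actual GIMLI installation and corpus paths.
    """
    return [
        "java",
        f"-Xmx{heap}",  # standard JVM flag for the maximum heap size
        "-cp",
        classpath,
        "pt.ua.tm.gimli.reader.JNLPBAReader",  # main class from the stack trace
        *reader_args,
    ]


cmd = build_gimli_command(heap="4g")
# Print a shell-safe version that can be pasted into a job script.
print(shlex.join(cmd))
```

If 4G is still not enough, the same helper can be called with a larger value such as `heap="8g"`, subject to what the cluster node allows.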

ashishbaghudana commented 9 years ago

[<RNA> Identification Performance]
# of OBJECTs: 118,   ANSWERs: 135.

# (recall / precision / f-score) of ...
FULLY CORRECT answer with class info: 65 (0.5508 / 0.4815 / 0.5138),
correct LEFT boundary with class info: 74 (0.6271 / 0.5481 / 0.5850),
correct RIGHT boundary with class info: 81 (0.6864 / 0.6000 / 0.6403),

ashishbaghudana commented 9 years ago

GNormPlus comes pre-trained on its own corpus and doesn't permit training on other corpora. Moreover, it distinguishes between gene names, gene families and protein domains.

The download itself is about 2.3 GB. I can try it out, but is it okay to download?

ashishbaghudana commented 9 years ago

Updated RNA NER:

 [<RNA> Identification Performance]
 # of OBJECTs: 118,  ANSWERs: 115.

 # (recall / precision / f-score) of ...
 FULLY CORRECT answer with class info: 81 (0.6864 / 0.7043 / 0.6953),
 correct LEFT boundary with class info: 84 (0.7119 / 0.7304 / 0.7210),
 correct RIGHT boundary with class info: 89 (0.7542 / 0.7739 / 0.7639),
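
As a sanity check on the two reports, the fully-correct scores can be recomputed from the raw counts. This is a minimal sketch, assuming that "OBJECTs" is the number of gold annotations and "ANSWERs" is the number of predicted annotations; the function name `prf` is my own, not part of the evaluation script.

```python
def prf(correct, objects, answers):
    """Recall, precision and F-score from raw counts.

    Assumes objects = gold annotations and answers = predicted annotations.
    """
    recall = correct / objects
    precision = correct / answers
    if correct == 0:
        return recall, precision, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return recall, precision, f_score


# (label, fully correct, OBJECTs, ANSWERs) copied from the two reports above
runs = [
    ("first run", 65, 118, 135),
    ("updated run", 81, 118, 115),
]

for label, correct, objects, answers in runs:
    r, p, f = prf(correct, objects, answers)
    print(f"{label}: {r:.4f} / {p:.4f} / {f:.4f}")
```

Under that assumption the recomputed values match the reported ones: the fully-correct F-score goes from 0.5138 to 0.6953, with most of the gain coming from precision (0.4815 to 0.7043) as the number of answers dropped from 135 to 115.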