Machine-Learning-for-Medical-Language / ctakes-covid-container

Apache License 2.0

Docker GC overhead limit reached on start #1

Open comorbidity opened 2 years ago

comorbidity commented 2 years ago

From a local machine (Mac M1, 32 GB of memory), `uname -a` reports: `Darwin 21.6.0 Darwin Kernel Version 21.6.0: Wed Aug 10 14:28:23 PDT 2022; root:xnu-8020.141.5~2/RELEASE_ARM64_T6000 arm64`

```
23 Sep 2022 20:08:11 INFO UmlsUserApprover - UMLS Account has been validated
23 Sep 2022 20:08:11 INFO JdbcConnectionFactory - Connecting to jdbc:hsqldb:file:org/apache/ctakes/dictionary/lookup/fast/snorx_2021aa/snorx_2021aa:
23 Sep 2022 20:08:11 INFO ENGINE - open start - state not modified ..............................................
23 Sep 2022 20:08:26 FATAL ENGINE - readExistingData failed 849823
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at org.hsqldb.RowAVL.setNewNodes(Unknown Source)
	at org.hsqldb.RowAVL.<init>(Unknown Source)
	at org.hsqldb.persist.RowStoreAVLMemory.getNewCachedObject(Unknown Source)
	at org.hsqldb.Table.insertData(Unknown Source)
	at org.hsqldb.Table.insertFromScript(Unknown Source)
	at org.hsqldb.scriptio.ScriptReaderText.readExistingData(Unknown Source)
	at org.hsqldb.scriptio.ScriptReaderBase.readAll(Unknown Source)
	at org.hsqldb.persist.Log.processScript(Unknown Source)
	at org.hsqldb.persist.Log.open(Unknown Source)
	at org.hsqldb.persist.Logger.open(Unknown Source)
	at org.hsqldb.Database.reopen(Unknown Source)
	at org.hsqldb.Database.open(Unknown Source)
	at org.hsqldb.DatabaseManager.getDatabase(Unknown Source)
	at org.hsqldb.DatabaseManager.newSession(Unknown Source)
	at org.hsqldb.jdbc.JDBCConnection.<init>(Unknown Source)
	at org.hsqldb.jdbc.JDBCDriver.getConnection(Unknown Source)
	at org.hsqldb.jdbc.JDBCDriver.connect(Unknown Source)
	at java.sql.DriverManager.getConnection(DriverManager.java:664)
	at java.sql.DriverManager.getConnection(DriverManager.java:247)
	at org.apache.ctakes.dictionary.lookup2.util.JdbcConnectionFactory.getConnection(JdbcConnectionFactory.java:85)
	at org.apache.ctakes.dictionary.lookup2.dictionary.JdbcRareWordDictionary.<init>(JdbcRareWordDictionary.java:91)
	at org.apache.ctakes.dictionary.lookup2.dictionary.JdbcRareWordDictionary.<init>(JdbcRareWordDictionary.java:72)
	at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:31)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:195)
	at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.java:155)
	at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDescriptor(DictionaryDescriptorParser.java:127)
	at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:137)
	at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:267)
.
23 Sep 2022 20:08:26 WARN ENGINE - Script processing failure
org.hsqldb.HsqlException: error in script file line: 849823 java.lang.OutOfMemoryError: GC overhead limit exceeded
```

tmills commented 2 years ago

Just to rule out the obvious, do you have docker memory limits configured very low? And is this the first time running or have you had success with it before?
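To check the first point, a minimal sketch like the one below reads the total memory the Docker engine reports (`docker info --format '{{.MemTotal}}'`, in bytes) and warns when it falls under a threshold. On Docker Desktop for Mac, this is the memory of the Linux VM that hosts the containers. The 8 GiB threshold and the helper names `bytes_to_gib` and `check_docker_mem` are assumptions for illustration, not a documented cTAKES requirement.

```shell
#!/bin/sh
# Sketch: warn if the Docker engine has less memory than a chosen threshold.
# MIN_GIB=8 is an assumed threshold, not a documented cTAKES requirement.
MIN_GIB=8

# Convert a byte count to whole GiB (integer division rounds down).
bytes_to_gib() {
  echo $(( $1 / 1024 / 1024 / 1024 ))
}

# Print a status line; return 1 if the byte count is below MIN_GIB.
check_docker_mem() {
  gib=$(bytes_to_gib "$1")
  if [ "$gib" -lt "$MIN_GIB" ]; then
    echo "Docker has only ${gib} GiB; consider raising it to at least ${MIN_GIB} GiB"
    return 1
  fi
  echo "Docker has ${gib} GiB available"
}

# Query the real engine only if the docker CLI is installed and the daemon answers.
if command -v docker >/dev/null 2>&1; then
  mem=$(docker info --format '{{.MemTotal}}' 2>/dev/null)
  if [ -n "$mem" ]; then
    check_docker_mem "$mem" || echo "On Docker Desktop, this limit is under Settings > Resources."
  fi
fi
```

A low value here would point at the VM limit rather than the M1 itself, since the JVM inside the container can only use what the VM has.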

comorbidity commented 2 years ago

The problem is Java on the Mac M1 (ARM): it doesn't work well with this container. The only solution is to use a different JVM, which is either a lot of work or a tiny fix.

mikix commented 1 year ago

This sounds like a different error than I remember you hitting on the M1. Do we get further along these days, or was this always the issue?

I had a pet theory about the M1 issue we had been hitting, but if it really is a memory limit issue, I don't think my theory is correct. But here it is anyway:

It might be worth trying to run an amd64 (x86) container on the M1 rather than an arm64 container. If you build the Docker image manually, you get the native arm64 variant by default, and maybe its ancient JVM has issues on the M1. But if you try the `smartonfhir/ctakes-covid` image, which only comes in an amd64 variant, the x86 emulation layer might cope better than an ancient JVM running native ARM code on a chip it wasn't built to expect.

Anyway, that's something to try: run an x86 container by pulling `smartonfhir/ctakes-covid` from Docker Hub and see whether that helps or hurts.
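A sketch of that experiment, assuming the image name `smartonfhir/ctakes-covid` used later in this thread: pull the amd64 variant explicitly with `--platform`, then ask Docker which architecture it actually stored. The `arch_matches` helper is hypothetical, and the run step is left out because the container's required `docker run` arguments aren't given here.

```shell
#!/bin/sh
# Sketch: fetch the amd64 build of the image on any host, then verify its architecture.
# On an M1 (arm64) host, Docker Desktop runs this image under x86 emulation.
IMAGE="smartonfhir/ctakes-covid"
WANT_ARCH="amd64"

# Hypothetical helper: succeed only if Docker reports the architecture we asked for.
arch_matches() {
  [ "$1" = "$WANT_ARCH" ]
}

if command -v docker >/dev/null 2>&1; then
  # --platform overrides Docker's default of preferring the host's native architecture.
  if docker pull --platform "linux/$WANT_ARCH" "$IMAGE"; then
    got=$(docker image inspect --format '{{.Architecture}}' "$IMAGE")
    if arch_matches "$got"; then
      echo "$IMAGE is $got; on an arm64 host it runs under emulation"
    else
      echo "Unexpected architecture for $IMAGE: $got"
    fi
  fi
fi
```

Passing `--platform` explicitly makes the test unambiguous even if an arm64 variant were ever published under the same name.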

comorbidity commented 1 year ago

From @mikix

OK, for the M1, here are some guesses based on some detective work:

Docker on the M1 will prefer the native architecture (arm64, or in Docker terms `linux/arm64/v8` for Linux images). And surprisingly, the OpenJDK used in our current cTAKES builds does support that architecture!
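To see which of those platforms Docker would pick on a given host, a small sketch can map `uname -m` to Docker's platform string. The mapping below (`arm64`/`aarch64` to `linux/arm64/v8`, `x86_64` to `linux/amd64`) is an assumption covering only the two architectures discussed here, and `native_platform` is a hypothetical helper name. Note that a shell running under Rosetta reports `x86_64` even on an M1.

```shell
#!/bin/sh
# Sketch: map `uname -m` output to the Docker platform string Docker prefers by default.
# Only the two architectures discussed in this thread are covered.
native_platform() {
  case "$1" in
    arm64|aarch64) echo "linux/arm64/v8" ;;  # macOS on M1 reports arm64; Linux reports aarch64
    x86_64|amd64)  echo "linux/amd64" ;;
    *)             echo "unknown" ;;
  esac
}

echo "This host's native Docker platform: $(native_platform "$(uname -m)")"
```

On the M1 from the original report (whose `uname -a` ends in `arm64`), this should print `linux/arm64/v8`, which is the variant a plain `docker build` would pick.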

But it's 3 years old, and maybe there's a compatibility issue with the M1. I see some JDK vendors talk about specifically adding M1 support to their JDKs, so maybe there's more to it than simply building for `linux/arm64/v8`.

But Docker doesn't know there's a compatibility issue. It sees your arm64 architecture, grabs the `linux/arm64/v8` JDK, and builds from that.

But! It looks like there is a workaround: just tell Docker to use amd64 anyway. (The M1 can apparently run amd64 code and containers in emulation mode, but it's slower.) So you have two options. Build like so: `docker build --platform linux/amd64 -t ctakes-covid ...` (note the `--platform linux/amd64` argument, which forces that version).

Or just try pulling down the image that Jamie made, which doesn't even offer an arm64 version: `docker pull smartonfhir/ctakes-covid`. That should give you an amd64 version that you can run with `docker run smartonfhir/ctakes-covid ...`, albeit slowly. I'm not sure my detective work is right, but that might be the shape of it.