Closed zifeishan closed 10 years ago
Seems the data is too large? I limited the input to 50 documents in sentence extraction, and it worked. I get the same error when using the full data.
DEBUG Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
The process runs out of memory when loading the classifiers. I guess you'll need to make the heap space (-Xmx) larger in the run script for the NLP extractor (not the run script in the application). It wasn't necessary for me...
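A minimal sketch of that suggestion, assuming the extractor's run.sh is an sbt-generated wrapper and therefore honors the JAVA_OPTS environment variable (the invocation line is the one from the spouse_example log, left commented out):

```shell
# Raise the JVM heap limit before launching the NLP extractor.
# Assumption: run.sh is sbt-generated and reads JAVA_OPTS.
export JAVA_OPTS="-Xmx4g"   # allow up to 4 GB of heap
echo "JAVA_OPTS=$JAVA_OPTS"
# ./udf/nlp_extractor/run.sh -k articles.id -v articles.text -l 20
```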
Doesn't work for me...
Which Java version? It sounds weird, but older Java versions do use more memory...
Ce
How much memory do you have on your machine? By default, Java uses some fraction of the total available memory. I have 8 GB; if you have less, maybe that's why it wasn't necessary for me...
I think so, but I have no idea where to specify the -Xmx argument... there's no explicit call to Java in run.sh.
$ ls udf/nlp_extractor/
README.md build.sbt project run.sh src target
I have 4GB on my machine. Feiran, how about you?
export JAVA_OPTS="-Xmx4g"
Maybe check how low you can go, e.g. 2g or 1g, and see which one still works..
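That probing could be scripted roughly like this (a sketch only; the run.sh path and flags are copied from the spouse_example log below, and the actual invocation is left commented out since it needs the full DeepDive setup):

```shell
# Try successively smaller heaps to find the minimum that still
# loads the NER classifiers without an OutOfMemoryError.
for heap in 4g 2g 1g; do
  echo "trying -Xmx$heap"
  # JAVA_OPTS="-Xmx$heap" ./udf/nlp_extractor/run.sh -k articles.id -v articles.text -l 20 \
  #   && { echo "succeeded with $heap"; break; }
done
```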
@zifeishan I only got 2g... did you remove the tables and pipelines before re-running nlp?
@zifeishan What if you set the limit to 10...
I did, but the problem is that I just cannot start the nlp_extractor.
It works for me when I do export JAVA_OPTS="-Xmx4g" in run.sh.
I'll check how low it can go...
The -Xmx4g parameter works with a 10-document limit, but errors occur with 50 documents.
Full log:
spouse_example (master) $ ./run.sh
[info] Loading project definition from /Users/Robin/Documents/repos/research/deepdive/project
[info] Set current project to deepdive (in build file:/Users/Robin/Documents/repos/research/deepdive/)
[info] Running org.deepdive.Main -c /Users/Robin/Documents/repos/research/deepdive/app/spouse_example/application.conf
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/Robin/Documents/repos/research/deepdive/lib/sampler-assembly-0.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/Robin/.ivy2/cache/ch.qos.logback/logback-classic/jars/logback-classic-1.0.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
21:09:22.047 [][][Slf4jLogger] INFO Slf4jLogger started
21:09:22.072 [run-main-0][EventStream(akka://deepdive)][EventStream] DEBUG logger log1-Slf4jLogger started
21:09:22.074 [run-main-0][EventStream(akka://deepdive)][EventStream] DEBUG Default Loggers started
21:09:22.088 [run-main-0][Main$(akka://deepdive)][Main$] INFO Running pipeline with configuration from /Users/Robin/Documents/repos/research/deepdive/app/spouse_example/application.conf
21:09:22.214 [run-main-0][JdbcDataStore$(akka://deepdive)][JdbcDataStore$] INFO Intializing all JDBC data stores
21:09:22.390 [][][ConnectionPool$] DEBUG Registered connection pool : ConnectionPool(url:jdbc:postgresql://127.0.0.1/deepdive_spouse, user:Robin)
21:09:22.408 [default-dispatcher-4][taskManager][TaskManager] INFO starting at akka://deepdive/user/taskManager
21:09:22.408 [default-dispatcher-2][profiler][Profiler] INFO starting at akka://deepdive/user/profiler
21:09:22.428 [default-dispatcher-4][inferenceManager][InferenceManager$PostgresInferenceManager] INFO Starting
21:09:22.452 [default-dispatcher-2][extractionManager][ExtractionManager$PostgresExtractionManager] INFO starting
21:09:22.454 [default-dispatcher-3][factorGraphBuilder][FactorGraphBuilder$PostgresFactorGraphBuilder] INFO Starting
21:09:22.475 [run-main-0][DeepDive$(akka://deepdive)][DeepDive$] INFO Running pipeline=_default with tasks=List(ext_people, ext_sentences, inference, calibration, report, shutdown)
21:09:22.481 [default-dispatcher-3][taskManager][TaskManager] INFO Added task_id=ext_people
21:09:22.486 [default-dispatcher-3][taskManager][TaskManager] INFO 0/1 tasks eligible.
21:09:22.488 [default-dispatcher-3][taskManager][TaskManager] INFO Tasks not_eligible: Set(ext_people)
21:09:22.491 [default-dispatcher-3][taskManager][TaskManager] INFO Added task_id=ext_sentences
21:09:22.492 [default-dispatcher-3][taskManager][TaskManager] INFO 1/2 tasks eligible.
21:09:22.493 [default-dispatcher-3][taskManager][TaskManager] INFO Tasks not_eligible: Set(ext_people)
21:09:22.494 [default-dispatcher-3][taskManager][TaskManager] DEBUG Sending task_id=ext_sentences to Actor[akka://deepdive/user/extractionManager#1773597743]
21:09:22.500 [default-dispatcher-5][extractionManager][ExtractionManager$PostgresExtractionManager] INFO Adding task_name=ext_sentences
21:09:22.503 [default-dispatcher-3][taskManager][TaskManager] INFO Added task_id=inference
21:09:22.504 [default-dispatcher-3][taskManager][TaskManager] INFO 0/2 tasks eligible.
21:09:22.505 [default-dispatcher-3][taskManager][TaskManager] INFO Tasks not_eligible: Set(ext_people, inference)
21:09:22.506 [default-dispatcher-3][taskManager][TaskManager] INFO Added task_id=calibration
21:09:22.507 [default-dispatcher-3][taskManager][TaskManager] INFO 0/3 tasks eligible.
21:09:22.510 [default-dispatcher-3][taskManager][TaskManager] INFO Tasks not_eligible: Set(ext_people, inference, calibration)
21:09:22.512 [default-dispatcher-3][taskManager][TaskManager] INFO Added task_id=report
21:09:22.513 [default-dispatcher-3][taskManager][TaskManager] INFO 0/4 tasks eligible.
21:09:22.514 [default-dispatcher-3][taskManager][TaskManager] INFO Tasks not_eligible: Set(ext_people, inference, report, calibration)
21:09:22.515 [default-dispatcher-3][taskManager][TaskManager] INFO Added task_id=shutdown
21:09:22.516 [default-dispatcher-6][profiler][Profiler] DEBUG starting report_id=ext_sentences
21:09:22.517 [default-dispatcher-3][taskManager][TaskManager] INFO 0/5 tasks eligible.
21:09:22.519 [default-dispatcher-3][taskManager][TaskManager] INFO Tasks not_eligible: Set(calibration, ext_people, inference, shutdown, report)
21:09:22.519 [default-dispatcher-5][extractionManager][ExtractionManager$PostgresExtractionManager] INFO executing extractorName=ext_sentences
21:09:22.573 [][][ConnectionPool$] DEBUG Borrowed a new connection from ConnectionPool(url:jdbc:postgresql://127.0.0.1/deepdive_spouse, user:Robin)
21:09:22.577 [default-dispatcher-8][extractorRunner-ext_sentences][ExtractorRunner] INFO waiting for task
21:09:22.591 [default-dispatcher-8][extractorRunner-ext_sentences][ExtractorRunner] INFO Received task=ext_sentences. Executing
21:09:22.593 [default-dispatcher-8][extractorRunner-ext_sentences][ExtractorRunner] INFO Executing before script.
21:09:22.593 [default-dispatcher-8][extractorRunner-ext_sentences][ExtractorRunner] INFO Executing: "/Users/Robin/Documents/repos/research/deepdive/app/spouse_example/udf/before_sentences.sh"
21:09:22.666 [Thread-5][extractorRunner-ext_sentences][ExtractorRunner] INFO NOTICE: truncate cascades to table "people_mentions"
21:09:22.667 [Thread-5][extractorRunner-ext_sentences][ExtractorRunner] INFO NOTICE: truncate cascades to table "has_spouse"
21:09:22.681 [Thread-4][extractorRunner-ext_sentences][ExtractorRunner] INFO TRUNCATE TABLE
21:09:22.682 [default-dispatcher-8][extractorRunner-ext_sentences][ExtractorRunner] INFO Starting 1 children process workers
21:09:22.715 [default-dispatcher-5][processExecutor1][ProcessExecutor] INFO started
21:09:22.718 [default-dispatcher-5][processExecutor1][ProcessExecutor] INFO starting process with cmd="/Users/Robin/Documents/repos/research/deepdive/app/spouse_example/udf/nlp_extractor/run.sh -k articles.id -v articles.text -l 20" and batch_size=50000
21:09:22.742 [default-dispatcher-6][extractorRunner-ext_sentences][ExtractorRunner] INFO Getting data from the data store and sending it to the workers. query='DatastoreInputQuery(SELECT * FROM articles order by id asc limit 50)'
21:09:22.777 [][][ConnectionPool$] DEBUG Borrowed a new connection from ConnectionPool(url:jdbc:postgresql://127.0.0.1/deepdive_spouse, user:Robin)
21:09:23.807 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing with id_key="articles.id" value_key="articles.text" max_len=20 numThreads=4
21:09:23.950 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Adding annotator tokenize
21:09:23.960 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Adding annotator cleanxml
21:09:24.017 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Adding annotator ssplit
21:09:24.022 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Adding annotator pos
21:09:25.855 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [1.8 sec].
21:09:25.856 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Adding annotator lemma
21:09:25.857 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Adding annotator ner
21:09:31.226 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [5.3 sec].
21:09:34.537 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [3.3 sec].
21:09:37.938 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [3.4 sec].
21:09:38.185 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
21:09:38.267 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
21:09:39.078 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Feb 5, 2014 9:09:39 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
21:09:39.079 [Thread-8][processExecutor1][ProcessExecutor] DEBUG INFO: Ignoring inactive rule: null
21:09:39.080 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Feb 5, 2014 9:09:39 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
21:09:39.081 [Thread-8][processExecutor1][ProcessExecutor] DEBUG INFO: Ignoring inactive rule: temporal-composite-8:ranges
21:09:39.082 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
21:09:39.090 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Initializing JollyDayHoliday for sutime with classpath:edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml
21:09:39.437 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
21:09:39.471 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
21:09:39.625 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Feb 5, 2014 9:09:39 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
21:09:39.625 [Thread-8][processExecutor1][ProcessExecutor] DEBUG INFO: Ignoring inactive rule: null
21:09:39.626 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Feb 5, 2014 9:09:39 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
21:09:39.626 [Thread-8][processExecutor1][ProcessExecutor] DEBUG INFO: Ignoring inactive rule: temporal-composite-8:ranges
21:09:39.627 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
21:09:39.633 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Adding annotator parse
21:09:41.041 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [1.4 sec].
21:09:41.042 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Adding annotator dcoref
21:09:57.579 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 279...
21:10:06.032 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 345...
21:10:09.309 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 390...
21:10:16.884 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 766...
21:10:22.503 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 839...
21:10:25.429 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 1001...
21:10:26.858 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 1145...
21:10:30.127 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 1316...
21:10:33.188 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 1387...
21:10:36.658 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 1405...
21:10:39.119 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 1462...
21:10:41.326 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 1491...
21:10:44.409 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 1556...
21:10:48.641 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 1686...
21:10:52.618 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 1882...
21:10:54.914 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 1938...
21:10:57.052 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 1986...
21:10:59.707 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 2131...
21:11:02.265 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 2132...
21:11:06.721 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 2190...
21:11:11.203 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 2293...
21:11:14.589 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 2573...
21:11:17.985 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 3410...
21:11:19.028 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 3468...
21:11:20.999 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 3704...
21:11:24.806 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 3720...
21:11:28.044 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 3845...
21:11:31.314 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 3856...
21:11:35.169 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 3921...
21:11:37.665 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 4093...
21:11:39.243 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 4285...
21:11:41.539 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 4301...
21:11:43.376 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 4369...
21:11:47.494 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 4434...
21:11:48.422 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 4484...
21:11:52.329 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 5602...
21:11:53.996 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 5814...
21:11:55.493 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 6854...
21:11:55.536 [default-dispatcher-6][extractorRunner-ext_sentences][ExtractorRunner] DEBUG all data was sent to workers.
21:11:55.544 [default-dispatcher-6][processExecutor1][ProcessExecutor] DEBUG closing input stream
21:11:57.355 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 7145...
21:11:59.539 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 7706...
21:12:01.223 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 7968...
21:12:05.504 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 8800...
21:12:08.509 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 8819...
21:12:12.793 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 9092...
21:12:18.730 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 9130...
21:12:27.059 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 9170...
21:12:27.282 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 9287...
21:12:32.074 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 9812...
21:12:35.667 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 10161...
21:12:38.906 [Thread-8][processExecutor1][ProcessExecutor] DEBUG Parsing document 10292...
21:12:41.506 [default-dispatcher-5][extractorRunner-ext_sentences][ExtractorRunner] DEBUG adding chunk of size=2288 data store.
21:12:42.390 [default-dispatcher-5][PostgresExtractionDataStoreComponent$PostgresExtractionDataStore(akka://deepdive)][PostgresExtractionDataStoreComponent$PostgresExtractionDataStore] INFO Writing data of to file=/private/var/folders/3p/dw7y60t50s579qmxlsl0rb6r0000gn/T/deepdive_sentences6164545240943149796.csv
21:12:43.539 [default-dispatcher-5][PostgresExtractionDataStoreComponent$PostgresExtractionDataStore(akka://deepdive)][PostgresExtractionDataStoreComponent$PostgresExtractionDataStore] INFO Copying batch data to postgres. sql='COPY sentences(dependencies, document_id, ner_tags, pos_tags, sentence, words) FROM STDIN CSV'file='/private/var/folders/3p/dw7y60t50s579qmxlsl0rb6r0000gn/T/deepdive_sentences6164545240943149796.csv'
21:12:43.540 [][][ConnectionPool$] DEBUG Borrowed a new connection from ConnectionPool(url:jdbc:postgresql://127.0.0.1/deepdive_spouse, user:Robin)
21:12:43.848 [default-dispatcher-5][PostgresExtractionDataStoreComponent$PostgresExtractionDataStore(akka://deepdive)][PostgresExtractionDataStoreComponent$PostgresExtractionDataStore] INFO Successfully copied batch data to postgres.
21:12:43.849 [Thread-7][processExecutor1][ProcessExecutor] DEBUG closing output stream
21:12:43.851 [default-dispatcher-5][processExecutor1][ProcessExecutor] INFO process exited with exit_value=0
21:12:43.865 [default-dispatcher-8][extractorRunner-ext_sentences][ExtractorRunner] DEBUG worker=processExecutor1 has terminated. Waiting for 0 others.
21:12:43.865 [default-dispatcher-8][extractorRunner-ext_sentences][ExtractorRunner] INFO All workers are done. Finishing up.
21:12:43.869 [default-dispatcher-8][extractorRunner-ext_sentences][ExtractorRunner] INFO Shutting down
21:12:43.873 [default-dispatcher-4][profiler][Profiler] DEBUG ending report_id=ext_sentences
21:12:43.874 [default-dispatcher-5][taskManager][TaskManager] INFO Completed task_id=ext_sentences with Success(Done!)
21:12:43.874 [default-dispatcher-5][taskManager][TaskManager] INFO 1/5 tasks eligible.
21:12:43.875 [default-dispatcher-5][taskManager][TaskManager] INFO Tasks not_eligible: Set(shutdown, inference, report, calibration)
21:12:43.875 [default-dispatcher-5][taskManager][TaskManager] DEBUG Sending task_id=ext_people to Actor[akka://deepdive/user/extractionManager#1773597743]
21:12:43.876 [default-dispatcher-5][extractionManager][ExtractionManager$PostgresExtractionManager] INFO Adding task_name=ext_people
21:12:43.876 [default-dispatcher-5][extractionManager][ExtractionManager$PostgresExtractionManager] INFO executing extractorName=ext_people
21:12:43.877 [default-dispatcher-5][extractorRunner-ext_people][ExtractorRunner] INFO waiting for task
21:12:43.877 [default-dispatcher-5][extractorRunner-ext_people][ExtractorRunner] INFO Received task=ext_people. Executing
21:12:43.881 [default-dispatcher-5][extractorRunner-ext_people][ExtractorRunner] INFO Executing before script.
21:12:43.885 [default-dispatcher-5][extractorRunner-ext_people][ExtractorRunner] INFO Executing: "/Users/Robin/Documents/repos/research/deepdive/app/spouse_example/udf/before_people.sh"
21:12:43.887 [default-dispatcher-4][profiler][Profiler] DEBUG starting report_id=ext_people
21:12:43.957 [Thread-11][extractorRunner-ext_people][ExtractorRunner] INFO NOTICE: truncate cascades to table "has_spouse"
21:12:43.968 [Thread-10][extractorRunner-ext_people][ExtractorRunner] INFO TRUNCATE TABLE
21:12:43.969 [default-dispatcher-5][extractorRunner-ext_people][ExtractorRunner] INFO Starting 1 children process workers
21:12:43.969 [][][ConnectionPool$] DEBUG Borrowed a new connection from ConnectionPool(url:jdbc:postgresql://127.0.0.1/deepdive_spouse, user:Robin)
21:12:43.970 [default-dispatcher-8][extractorRunner-ext_people][ExtractorRunner] INFO Getting data from the data store and sending it to the workers. query='DatastoreInputQuery(SELECT * FROM sentences)'
21:12:43.970 [default-dispatcher-6][processExecutor1][ProcessExecutor] INFO started
21:12:43.971 [default-dispatcher-6][processExecutor1][ProcessExecutor] INFO starting process with cmd="/Users/Robin/Documents/repos/research/deepdive/app/spouse_example/udf/ext_people.py" and batch_size=50000
21:12:45.074 [Thread-14][processExecutor1][ProcessExecutor] DEBUG Traceback (most recent call last):
21:12:45.075 [Thread-14][processExecutor1][ProcessExecutor] DEBUG File "/Users/Robin/Documents/repos/research/deepdive/app/spouse_example/udf/ext_people.py", line 9, in <module>
21:12:45.075 [Thread-14][processExecutor1][ProcessExecutor] DEBUG sentence_obj = json.loads(row)
21:12:45.076 [Thread-14][processExecutor1][ProcessExecutor] DEBUG File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
21:12:45.076 [Thread-14][processExecutor1][ProcessExecutor] DEBUG return _default_decoder.decode(s)
21:12:45.076 [Thread-14][processExecutor1][ProcessExecutor] DEBUG File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 365, in decode
21:12:45.077 [Thread-14][processExecutor1][ProcessExecutor] DEBUG obj, end = self.raw_decode(s, idx=_w(s, 0).end())
21:12:45.077 [Thread-14][processExecutor1][ProcessExecutor] DEBUG File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 381, in raw_decode
21:12:45.078 [Thread-14][processExecutor1][ProcessExecutor] DEBUG obj, end = self.scan_once(s, idx)
21:12:45.078 [Thread-14][processExecutor1][ProcessExecutor] DEBUG UnicodeDecodeError: 'utf8' codec can't decode byte 0xca in position 13: invalid continuation byte
21:12:45.081 [default-dispatcher-6][extractorRunner-ext_people][ExtractorRunner] DEBUG adding chunk of size=1195 data store.
21:12:45.081 [default-dispatcher-8][extractorRunner-ext_people][ExtractorRunner] DEBUG all data was sent to workers.
21:12:45.082 [default-dispatcher-5][processExecutor1][ProcessExecutor] DEBUG closing input stream
21:12:45.139 [default-dispatcher-6][PostgresExtractionDataStoreComponent$PostgresExtractionDataStore(akka://deepdive)][PostgresExtractionDataStoreComponent$PostgresExtractionDataStore] INFO Writing data of to file=/private/var/folders/3p/dw7y60t50s579qmxlsl0rb6r0000gn/T/deepdive_people_mentions3678455628136952209.csv
21:12:45.220 [][][ConnectionPool$] DEBUG Borrowed a new connection from ConnectionPool(url:jdbc:postgresql://127.0.0.1/deepdive_spouse, user:Robin)
21:12:45.221 [default-dispatcher-6][PostgresExtractionDataStoreComponent$PostgresExtractionDataStore(akka://deepdive)][PostgresExtractionDataStoreComponent$PostgresExtractionDataStore] INFO Copying batch data to postgres. sql='COPY people_mentions(length, sentence_id, start_position, text) FROM STDIN CSV'file='/private/var/folders/3p/dw7y60t50s579qmxlsl0rb6r0000gn/T/deepdive_people_mentions3678455628136952209.csv'
21:12:45.290 [default-dispatcher-6][PostgresExtractionDataStoreComponent$PostgresExtractionDataStore(akka://deepdive)][PostgresExtractionDataStoreComponent$PostgresExtractionDataStore] INFO Successfully copied batch data to postgres.
21:12:45.291 [Thread-13][processExecutor1][ProcessExecutor] DEBUG closing output stream
21:12:45.291 [default-dispatcher-6][processExecutor1][ProcessExecutor] INFO process exited with exit_value=1
21:12:45.293 [default-dispatcher-2][profiler][Profiler] DEBUG ending report_id=ext_people
21:12:45.293 [default-dispatcher-6][taskManager][TaskManager] INFO Completed task_id=ext_people with Failure(java.lang.RuntimeException: process exited with exit_code=1)
21:12:45.302 [default-dispatcher-6][taskManager][TaskManager] ERROR task=ext_people Failed: java.lang.RuntimeException: process exited with exit_code=1
21:12:45.303 [default-dispatcher-4][extractorRunner-ext_people][LocalActorRef] INFO Message [akka.actor.Terminated] from Actor[akka://deepdive/user/extractionManager/extractorRunner-ext_people/processExecutor1#1602433297] to Actor[akka://deepdive/user/extractionManager/extractorRunner-ext_people#-520850708] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
21:12:45.303 [default-dispatcher-6][taskManager][TaskManager] ERROR Forcing shutdown
21:12:45.308 [default-dispatcher-6][taskManager][TaskManager] ERROR Cancelling task=calibration
21:12:45.309 [default-dispatcher-6][taskManager][TaskManager] ERROR Cancelling task=inference
21:12:45.309 [default-dispatcher-6][taskManager][TaskManager] INFO 1/2 tasks eligible.
21:12:45.310 [default-dispatcher-6][taskManager][TaskManager] INFO Tasks not_eligible: Set(shutdown)
21:12:45.310 [default-dispatcher-6][taskManager][TaskManager] DEBUG Sending task_id=report to Actor[akka://deepdive/user/profiler#205768101]
21:12:45.311 [default-dispatcher-6][profiler][Profiler] DEBUG starting report_id=report
21:12:45.311 [default-dispatcher-6][profiler][Profiler] INFO --------------------------------------------------
21:12:45.312 [default-dispatcher-6][profiler][Profiler] INFO Summary Report
21:12:45.312 [default-dispatcher-6][profiler][Profiler] INFO --------------------------------------------------
21:12:45.313 [default-dispatcher-6][profiler][Profiler] INFO ext_sentences SUCCESS [201373 ms]
21:12:45.314 [default-dispatcher-6][profiler][Profiler] INFO ext_people FAILURE [1406 ms]
21:12:45.315 [default-dispatcher-6][profiler][Profiler] INFO --------------------------------------------------
21:12:45.315 [default-dispatcher-2][profiler][Profiler] DEBUG ending report_id=report
21:12:45.316 [default-dispatcher-4][taskManager][TaskManager] INFO Completed task_id=report with Success(Success(()))
21:12:45.316 [default-dispatcher-4][taskManager][TaskManager] INFO 1/1 tasks eligible.
21:12:45.317 [default-dispatcher-4][taskManager][TaskManager] INFO Tasks not_eligible: Set()
21:12:45.317 [default-dispatcher-4][taskManager][TaskManager] DEBUG Sending task_id=shutdown to Actor[akka://deepdive/user/taskManager#-460907103]
21:12:45.318 [default-dispatcher-6][profiler][Profiler] DEBUG starting report_id=shutdown
21:12:45.342 [default-dispatcher-7][taskManager][RepointableActorRef] INFO Message [akka.dispatch.sysmsg.Terminate] from Actor[akka://deepdive/user/taskManager#-460907103] to Actor[akka://deepdive/user/taskManager#-460907103] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
21:12:45.350 [default-dispatcher-7][EventStream][EventStream] DEBUG shutting down: StandardOutLogger started
[success] Total time: 205 s, completed Feb 5, 2014 9:12:45 PM
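Note that the failure in this log is no longer the heap: ext_people.py crashes with a UnicodeDecodeError because one row from the sentences table is not valid UTF-8 (the 0xca byte in the traceback). A defensive sketch of the parsing step (hypothetical; the real ext_people.py may structure its loop differently) is to decode with errors="replace" before handing the row to json.loads, so a bad byte becomes U+FFFD instead of killing the extractor:

```python
import json

def parse_row(raw):
    """Decode a raw input row defensively before JSON parsing.

    Bytes that are not valid UTF-8 are replaced with U+FFFD
    rather than raising UnicodeDecodeError.
    """
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8", errors="replace")
    return json.loads(raw)

# A row containing an invalid continuation byte no longer raises:
row = b'{"sentence": "bad byte here: \xca"}'
obj = parse_row(row)
```

Whether silently replacing the byte is acceptable depends on the data; the alternative is to clean the articles table before the extraction pipeline runs.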
Not sure what happened; closing this for now...
I got an "OutOfMemoryError" when using the nlp_extractor on my Mac... Nothing in the Walkthrough indicates how to fix this.
Full log: