cegme / gatordsr

University of Florida Trec KBA code and more

Stanford NLP: Unknown Variable #39

Closed: mshahriarinia closed this issue 11 years ago

mshahriarinia commented 11 years ago

Stanford NLP prints messages such as Unknown variable: TWILIGHT, Unknown variable: WEEKDAY and Unknown variable: MILLISECOND. We need to figure out where these come from and, as a first version, find an immediate solution. Part of the related execution log is as follows:

13/05/05 08:17:58 INFO cise.EmbededFaucet: Fetching, decrypting and decompressing with GrabGPG(null,null)
gpg: encrypted with 1024-bit RSA key, ID 3662FD5E, created 2012-05-30
      "trec-kba (Generated by gnupg.py) <trec@nist.gov>"
13/05/05 08:18:01 INFO slf4j.Slf4jEventHandler: Slf4jEventHandler started
13/05/05 08:18:02 INFO storage.BlockManagerMaster: Registered BlockManagerMaster Actor
13/05/05 08:18:02 INFO storage.MemoryStore: MemoryStore started with capacity 11.7 GB.
13/05/05 08:18:02 INFO storage.DiskStore: Created local directory at /tmp/spark-local-20130505081802-9396
13/05/05 08:18:02 INFO network.ConnectionManager: Bound socket to port 27179 with id = ConnectionManagerId(sm321-01,27179)
13/05/05 08:18:02 INFO storage.BlockManagerMaster: Trying to register BlockManager
13/05/05 08:18:02 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager sm321-01:27179 with 11.7 GB RAM
13/05/05 08:18:02 INFO storage.BlockManagerMaster: Registered BlockManager
13/05/05 08:18:02 INFO server.Server: jetty-7.5.3.v20111011
13/05/05 08:18:02 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:48853 STARTING
13/05/05 08:18:02 INFO broadcast.HttpBroadcast: Broadcast server started at http://128.227.170.239:48853
13/05/05 08:18:02 INFO spark.MapOutputTracker: Registered MapOutputTrackerActor actor
13/05/05 08:18:02 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-01b71b52-fd2e-4563-9fcd-eff83643eff1
13/05/05 08:18:02 INFO server.Server: jetty-7.5.3.v20111011
13/05/05 08:18:02 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:11375 STARTING
13/05/05 08:18:02 INFO io.IoWorker: IoWorker thread 'spray-io-worker-0' started
13/05/05 08:18:02 INFO server.HttpServer: akka://spark/user/BlockManagerHTTPServer started on /0.0.0.0:14071
13/05/05 08:18:02 INFO storage.BlockManagerUI: Started BlockManager web UI at http://sm321-01:14071
13/05/05 08:18:03 INFO spark.SparkContext: Added JAR target/scala-2.9.2/gatordsr_2.9.2-0.01.jar at http://128.227.170.239:11375/jars/gatordsr_2.9.2-0.01.jar with timestamp 1367756283103
13/05/05 08:18:03 INFO spark.SparkContext: Starting job: foreach at EmbededFaucet.scala:223
13/05/05 08:18:03 INFO scheduler.DAGScheduler: Got job 0 (foreach at EmbededFaucet.scala:223) with 16 output partitions (allowLocal=false)
13/05/05 08:18:03 INFO scheduler.DAGScheduler: Final stage: Stage 0 (filter at EmbededFaucet.scala:213)
13/05/05 08:18:03 INFO scheduler.DAGScheduler: Parents of final stage: List()
13/05/05 08:18:03 INFO scheduler.DAGScheduler: Missing parents: List()
13/05/05 08:18:03 INFO scheduler.DAGScheduler: Submitting Stage 0 (FilteredRDD[3] at filter at EmbededFaucet.scala:213), which has no missing parents
13/05/05 08:18:03 INFO scheduler.DAGScheduler: Submitting 16 missing tasks from Stage 0 (FilteredRDD[3] at filter at EmbededFaucet.scala:213)
13/05/05 08:18:03 INFO local.LocalScheduler: Running ResultTask(0, 2)
13/05/05 08:18:03 INFO local.LocalScheduler: Running ResultTask(0, 7)
13/05/05 08:18:03 INFO local.LocalScheduler: Running ResultTask(0, 3)
13/05/05 08:18:03 INFO local.LocalScheduler: Running ResultTask(0, 5)
13/05/05 08:18:03 INFO local.LocalScheduler: Running ResultTask(0, 6)
13/05/05 08:18:03 INFO local.LocalScheduler: Running ResultTask(0, 4)
13/05/05 08:18:03 INFO local.LocalScheduler: Running ResultTask(0, 1)
13/05/05 08:18:03 INFO local.LocalScheduler: Running ResultTask(0, 13)
13/05/05 08:18:03 INFO local.LocalScheduler: Running ResultTask(0, 15)
13/05/05 08:18:03 INFO local.LocalScheduler: Running ResultTask(0, 0)
13/05/05 08:18:03 INFO local.LocalScheduler: Running ResultTask(0, 14)
13/05/05 08:18:03 INFO local.LocalScheduler: Running ResultTask(0, 12)
13/05/05 08:18:03 INFO local.LocalScheduler: Running ResultTask(0, 11)
13/05/05 08:18:03 INFO local.LocalScheduler: Running ResultTask(0, 10)
13/05/05 08:18:03 INFO local.LocalScheduler: Running ResultTask(0, 8)
13/05/05 08:18:03 INFO local.LocalScheduler: Running ResultTask(0, 9)
13/05/05 08:18:04 INFO local.LocalScheduler: Size of task 0 is 5002623 bytes
13/05/05 08:18:04 INFO local.LocalScheduler: Fetching http://128.227.170.239:11375/jars/gatordsr_2.9.2-0.01.jar with timestamp 1367756283103
13/05/05 08:18:04 INFO spark.Utils: Fetching http://128.227.170.239:11375/jars/gatordsr_2.9.2-0.01.jar to /tmp/fetchFileTemp7200044855714019777.tmp
13/05/05 08:18:04 INFO local.LocalScheduler: Size of task 7 is 4285140 bytes
13/05/05 08:18:04 INFO local.LocalScheduler: Size of task 13 is 3397306 bytes
13/05/05 08:18:04 INFO local.LocalScheduler: Size of task 3 is 1935969 bytes
13/05/05 08:18:04 INFO local.LocalScheduler: Size of task 5 is 3873806 bytes
13/05/05 08:18:04 INFO local.LocalScheduler: Size of task 2 is 6236580 bytes
13/05/05 08:18:04 INFO local.LocalScheduler: Adding file:/tmp/spark-8425910e-268b-4f6f-96e1-cd661c83c772/gatordsr_2.9.2-0.01.jar to class loader
13/05/05 08:18:04 INFO local.LocalScheduler: Size of task 10 is 2338225 bytes
13/05/05 08:18:04 INFO local.LocalScheduler: Size of task 8 is 4191122 bytes
13/05/05 08:18:04 INFO local.LocalScheduler: Size of task 6 is 3903524 bytes
13/05/05 08:18:04 INFO local.LocalScheduler: Size of task 14 is 2630517 bytes
13/05/05 08:18:04 INFO local.LocalScheduler: Size of task 15 is 4604777 bytes
13/05/05 08:18:04 INFO local.LocalScheduler: Size of task 12 is 3748268 bytes
13/05/05 08:18:04 INFO local.LocalScheduler: Size of task 4 is 1596137 bytes
13/05/05 08:18:04 INFO local.LocalScheduler: Size of task 9 is 2819081 bytes
Adding annotator tokenize
Adding annotator ssplit
13/05/05 08:18:04 INFO local.LocalScheduler: Size of task 1 is 3549813 bytes
13/05/05 08:18:04 INFO local.LocalScheduler: Size of task 11 is 3078217 bytes
Adding annotator tokenize
Adding annotator ssplit
Adding annotator pos
Loading default properties from tagger edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [1.5 sec].
Adding annotator lemma
Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [5.3 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [3.3 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [7.0 sec].
Initialization JollyDayHoliday for sutime
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
May 05, 2013 8:18:22 AM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFO: Ignoring inactive rule: temporal-composite-8:ranges
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
13/05/05 08:18:23 INFO util.RelationChecker: Retrieving the relation file reverb_clueweb_relations-1.1.txt.gz ...
13/05/05 08:18:30 INFO util.RelationChecker: Finished building the wikirelation bloom filter.
Unknown variable: MILLISECOND
13/05/05 08:20:25 INFO local.LocalScheduler: Finished ResultTask(0, 4)
13/05/05 08:20:25 INFO scheduler.DAGScheduler: Completed ResultTask(0, 4)
Unknown variable: TWILIGHT
Unknown variable: TWILIGHT
Unknown variable: TWILIGHT
Unknown variable: TWILIGHT
Unknown variable: TWILIGHT
Unknown variable: TWILIGHT
Unknown variable: TWILIGHT
Unknown variable: TWILIGHT
Unknown variable: TWILIGHT
Unknown variable: TWILIGHT
Unknown variable: TWILIGHT
Unknown variable: TWILIGHT
Unknown variable: TWILIGHT
Unknown variable: TWILIGHT
13/05/05 08:21:23 INFO local.LocalScheduler: Finished ResultTask(0, 10)
13/05/05 08:21:23 INFO scheduler.DAGScheduler: Completed ResultTask(0, 10)
Unknown variable: TWILIGHT
13/05/05 08:24:16 INFO local.LocalScheduler: Finished ResultTask(0, 14)
13/05/05 08:24:16 INFO scheduler.DAGScheduler: Completed ResultTask(0, 14)
Unknown variable: TWILIGHT
13/05/05 08:24:47 INFO local.LocalScheduler: Finished ResultTask(0, 11)
13/05/05 08:24:47 INFO scheduler.DAGScheduler: Completed ResultTask(0, 11)
Unknown variable: TWILIGHT
13/05/05 08:24:58 INFO local.LocalScheduler: Finished ResultTask(0, 1)
13/05/05 08:24:58 INFO scheduler.DAGScheduler: Completed ResultTask(0, 1)
13/05/05 08:25:00 INFO local.LocalScheduler: Finished ResultTask(0, 5)
13/05/05 08:25:00 INFO scheduler.DAGScheduler: Completed ResultTask(0, 5)
13/05/05 08:25:21 INFO local.LocalScheduler: Finished ResultTask(0, 13)
13/05/05 08:25:21 INFO scheduler.DAGScheduler: Completed ResultTask(0, 13)
13/05/05 08:25:23 INFO local.LocalScheduler: Finished ResultTask(0, 3)
13/05/05 08:25:23 INFO scheduler.DAGScheduler: Completed ResultTask(0, 3)
13/05/05 08:26:11 INFO local.LocalScheduler: Finished ResultTask(0, 9)
13/05/05 08:26:11 INFO scheduler.DAGScheduler: Completed ResultTask(0, 9)
13/05/05 08:26:43 INFO local.LocalScheduler: Finished ResultTask(0, 8)
13/05/05 08:26:43 INFO scheduler.DAGScheduler: Completed ResultTask(0, 8)
13/05/05 08:27:49 INFO local.LocalScheduler: Finished ResultTask(0, 12)
13/05/05 08:27:49 INFO scheduler.DAGScheduler: Completed ResultTask(0, 12)
13/05/05 08:33:21 INFO local.LocalScheduler: Finished ResultTask(0, 15)
13/05/05 08:33:21 INFO scheduler.DAGScheduler: Completed ResultTask(0, 15)
13/05/05 08:33:35 INFO local.LocalScheduler: Finished ResultTask(0, 2)
13/05/05 08:33:35 INFO scheduler.DAGScheduler: Completed ResultTask(0, 2)
13/05/05 08:37:51 INFO local.LocalScheduler: Finished ResultTask(0, 6)
13/05/05 08:37:51 INFO scheduler.DAGScheduler: Completed ResultTask(0, 6)
Unknown variable: WEEKDAY
13/05/05 08:40:48 INFO local.LocalScheduler: Finished ResultTask(0, 0)
13/05/05 08:40:48 INFO scheduler.DAGScheduler: Completed ResultTask(0, 0)
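As a stop-gap for the immediate "version 1" fix requested above, the messages could be suppressed without losing track of them. The Java sketch below is hypothetical and is not part of gatordsr or Stanford CoreNLP. The class name UnknownVariableFilter and its methods are made up for illustration. It assumes the messages are written as whole lines to System.err: the log shows them without the slf4j prefix, which suggests a plain console print, but whether they go to stderr or stdout is not confirmed. The filter drops lines starting with "Unknown variable: ", forwards everything else unchanged, and counts each suppressed variable name, so the tally (TWILIGHT, WEEKDAY, MILLISECOND, ...) can still be inspected while the root cause is investigated. It hides the symptom only and does not fix the cause.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical stop-gap filter (not part of Stanford CoreNLP or gatordsr):
 * buffers output line by line, drops lines of the form "Unknown variable: X",
 * forwards every other line unchanged, and counts the dropped variable names.
 */
final class UnknownVariableFilter extends OutputStream {
    private static final String PREFIX = "Unknown variable: ";

    private final OutputStream target;
    private final ByteArrayOutputStream line = new ByteArrayOutputStream();
    private final Map<String, Integer> counts = new TreeMap<>();

    UnknownVariableFilter(OutputStream target) {
        this.target = target;
    }

    /** Assumption: the messages go to System.err; use System.setOut instead if they go to stdout. */
    static UnknownVariableFilter installOnStdErr() {
        UnknownVariableFilter filter = new UnknownVariableFilter(System.err);
        System.setErr(new PrintStream(filter, true));
        return filter;
    }

    @Override
    public synchronized void write(int b) throws IOException {
        line.write(b);
        if (b == '\n') {
            emitLine();
        }
    }

    @Override
    public synchronized void flush() throws IOException {
        // A partial line stays buffered until its newline arrives,
        // so only bytes that were already forwarded get flushed here.
        target.flush();
    }

    @Override
    public synchronized void close() throws IOException {
        // Forward any unterminated last line, but leave the target open:
        // closing it could close System.err itself.
        if (line.size() > 0) {
            emitLine();
        }
        target.flush();
    }

    /** Snapshot of how many times each unknown variable was suppressed. */
    synchronized Map<String, Integer> counts() {
        return new TreeMap<>(counts);
    }

    private void emitLine() throws IOException {
        byte[] bytes = line.toByteArray();
        line.reset();
        // The prefix is ASCII, so decoding as UTF-8 is enough for matching;
        // forwarded lines keep their original bytes.
        String text = new String(bytes, StandardCharsets.UTF_8).trim();
        if (text.startsWith(PREFIX)) {
            String name = text.substring(PREFIX.length()).trim();
            Integer seen = counts.get(name);
            counts.put(name, seen == null ? 1 : seen + 1);
        } else {
            target.write(bytes, 0, bytes.length);
        }
    }
}
```

Calling UnknownVariableFilter.installOnStdErr() once, before the pipeline is built, would route later System.err output through the filter. The methods are synchronized because the local scheduler in the log runs 16 tasks concurrently in one JVM, all sharing the same stream.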
peng51 commented 11 years ago

Could you share the code you tested with the Cached Faucet, Embedded Faucet and Pipeline? I think the unknown variable messages may not be a problem, because they don't seem to affect how the system runs. They may just be information the Stanford NLP pipeline prints when it parses or comes across certain words. All the possible unknown variables may be seen in this list: http://grepcode.com/file/repo1.maven.org/maven2/edu.stanford.nlp/stanford-corenlp/1.3.3/edu/stanford/nlp/models/dcoref/inanimate.unigrams.txt?av=f
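One way to test the "specific words" hypothesis is to find which stream items contain words matching the reported variable names, then run only those items through the pipeline. The Java helper below is a hypothetical sketch: TriggerWordScan and scan are illustrative names, not project or CoreNLP APIs. It assumes the stream items are available as id-to-text pairs and that the candidate trigger words are simply the variable names lowercased, which is a guess rather than something the log confirms. It tokenizes on runs of letters and matches whole words case-insensitively, so inflected forms such as "weekdays" are not matched unless they are added to the candidate list.

```java
import java.util.Collection;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: for each document, report which candidate trigger
 * words occur in it as whole words, ignoring case. Documents with no match
 * are left out of the result.
 */
final class TriggerWordScan {
    // A "word" here is any run of Unicode letters.
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    static Map<String, Set<String>> scan(Map<String, String> docs, Collection<String> candidates) {
        Set<String> wanted = new TreeSet<>();
        for (String c : candidates) {
            wanted.add(c.toLowerCase(Locale.ROOT));
        }
        Map<String, Set<String>> hits = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            Set<String> found = new TreeSet<>();
            Matcher m = WORD.matcher(doc.getValue());
            while (m.find()) {
                String token = m.group().toLowerCase(Locale.ROOT);
                if (wanted.contains(token)) {
                    found.add(token);
                }
            }
            if (!found.isEmpty()) {
                hits.put(doc.getKey(), found);
            }
        }
        return hits;
    }
}
```

A match only marks an item as a candidate; whether that item actually produces the message would still need to be confirmed by annotating it on its own.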

mshahriarinia commented 11 years ago

I uploaded the latest files I am working on in pull request #41, but that doesn't really matter, since you'll run into this error whenever you call Stanford NLP on stream items. You can either run run-main edu.ufl.cise.TRECSSF2013, which calls CachedFaucet, or call run-main edu.ufl.cise.EmbededFaucet directly.