Constannnnnt / Distributed-CoreNLP

This infrastructure, built on Stanford CoreNLP, MapReduce and Spark with Java, aims at processing document annotations at large scale.
https://github.com/Constannnnnt/Distributed-CoreNLP
MIT License

Cy ssplit & cleanxml & coref #13

Closed Constannnnnt closed 6 years ago

Constannnnnt commented 6 years ago

result

2018-11-18 23:42:23 INFO  Utils:54 - Fetching spark://ubuntu1604-002.student.cs.uwaterloo.ca:43674/jars/project-1.0.jar to /tmp
((0,tokenize),The-1 University-2 of-3 Waterloo-4 is-5 located-6 in-7 Canada-8 .-9 Goose-1 lives-2 in-3 this-4 University-5 .-6)
((0,cleanxml),The-1 University-2 of-3 Waterloo-4 is-5 located-6 in-7 Canada-8 .-9 Goose-1 lives-2 in-3 this-4 University-5 .-6)
((0,ssplit),The University of Waterloo is located in Canada.|Goose lives in this University.)
((0,pos),(The,DT) (University,NNP) (of,IN) (Waterloo,NNP) (is,VBZ) (located,JJ) (in,IN) (Canada,NNP) (.,.) (Goose,NN) (lives,VBZ) (in,IN) (this,DT) (University,NNP) (.,.))
((0,ner),(The,O) (University,ORGANIZATION) (of,ORGANIZATION) (Waterloo,ORGANIZATION) (is,O) (located,O) (in,O) (Canada,COUNTRY) (.,O) (Goose,O) (lives,O) (in,O) (this,O) (University,O) (.,O))
((0,coref),The University of Waterloo:this University; )
((1,tokenize),The-1 University-2 of-3 Waterloo-4 is-5 located-6 in-7 Canada-8 .-9 Goose-1 lives-2 here-3 .-4)
((1,cleanxml),The-1 University-2 of-3 Waterloo-4 is-5 located-6 in-7 Canada-8 .-9 Goose-1 lives-2 here-3 .-4)
((1,ssplit),The University of Waterloo is located in Canada.|Goose lives here.)
((1,pos),(The,DT) (University,NNP) (of,IN) (Waterloo,NNP) (is,VBZ) (located,JJ) (in,IN) (Canada,NNP) (.,.) (Goose,NN) (lives,NNS) (here,RB) (.,.))
((1,ner),(The,O) (University,ORGANIZATION) (of,ORGANIZATION) (Waterloo,ORGANIZATION) (is,O) (located,O) (in,O) (Canada,COUNTRY) (.,O) (Goose,O) (lives,O) (here,O) (.,O))
((1,coref),)

There are two issues here:

  1. To use coref, driver-memory needs to be set to 4G; otherwise, it fails.
  2. dcoref is the deterministic coref annotator, which is different from the ML-based one, coref. However, most of their packages are similar, so I am not sure whether writing the dcoref version will cause any issues with package references.
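
For reference, here is a minimal sketch of how the two points above might be handled in driver code, using only the JDK. The annotator is picked by name through the CoreNLP `annotators` property, and the heap check reads `Runtime.maxMemory()`, which can report somewhat less than the configured driver memory. `CorefConfig`, `propsFor` and `maxHeapMiB` are hypothetical names, not part of this PR, and the prerequisite list (tokenize, ssplit, pos, lemma, ner, parse) is an assumption that may need adjusting for the pipeline used here.

```java
import java.util.Properties;

// Hypothetical helper, not part of this PR: a sketch under the assumptions above.
final class CorefConfig {

    // Build pipeline properties, selecting the coref flavor by annotator name:
    // "dcoref" for the deterministic annotator, "coref" for the ML-based one.
    // The prerequisite annotators listed here are an assumption.
    static Properties propsFor(boolean deterministic) {
        Properties props = new Properties();
        String corefName = deterministic ? "dcoref" : "coref";
        props.setProperty("annotators",
                "tokenize,ssplit,pos,lemma,ner,parse," + corefName);
        return props;
    }

    // Maximum heap this JVM will try to use, in MiB.
    // On the driver this is governed by the driver-memory setting.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Print the annotator list for each flavor and the heap actually available,
        // so a too-small driver heap is visible before the pipeline is built.
        System.out.println("dcoref: " + propsFor(true).getProperty("annotators"));
        System.out.println("coref:  " + propsFor(false).getProperty("annotators"));
        System.out.println("max heap (MiB): " + maxHeapMiB());
    }
}
```

The resulting `Properties` object would then be handed to `new StanfordCoreNLP(props)`. Because the flavor is chosen by name only, this part of the code needs no coref or dcoref imports, which keeps the package question confined to the code that reads the coref chains back out.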