emorynlp / nlp4j-old

NLP tools developed by Emory University.

[Fatal Error] :52:71:org.xml.sax.SAXParseException #9

Closed. MinionAttack closed this issue 8 years ago.

MinionAttack commented 8 years ago

Hi, I'm trying to train a model using the training and development sets from Universal Dependencies 1.2. After converting them to the ClearNLP format, I get this error on the command line:

[Fatal Error] :52:71: Attribute name "data-pjax-transient" associated with an element type "meta" must be followed by the ' = ' character.
org.xml.sax.SAXParseException; lineNumber: 52; columnNumber: 71; Attribute name "data-pjax-transient" associated with an element type "meta" must be followed by the ' = ' character.
	at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
	at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
	at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
	at edu.emory.mathcs.nlp.common.util.XMLUtils.getDocumentElement(XMLUtils.java:107)
	at edu.emory.mathcs.nlp.component.template.util.GlobalLexica.<init>(GlobalLexica.java:55)
	at edu.emory.mathcs.nlp.bin.NLPTrain$1.createGlobalLexica(NLPTrain.java:108)
	at edu.emory.mathcs.nlp.component.template.train.OnlineTrainer.train(OnlineTrainer.java:193)
	at edu.emory.mathcs.nlp.component.template.train.OnlineTrainer.train(OnlineTrainer.java:187)
	at edu.emory.mathcs.nlp.bin.NLPTrain.train(NLPTrain.java:76)
	at edu.emory.mathcs.nlp.bin.NLPTrain.main(NLPTrain.java:115)
Exception in thread "main" java.lang.NullPointerException
	at edu.emory.mathcs.nlp.common.util.XMLUtils.getFirstElementByTagName(XMLUtils.java:74)
	at edu.emory.mathcs.nlp.component.template.util.GlobalLexica.<init>(GlobalLexica.java:60)
	at edu.emory.mathcs.nlp.component.template.util.GlobalLexica.<init>(GlobalLexica.java:55)
	at edu.emory.mathcs.nlp.bin.NLPTrain$1.createGlobalLexica(NLPTrain.java:108)

My training and development files look like this (the separator is '\t'):

1	[	[	PUNCT	10	punct
2	This	this	DET	Number=Sing|PronType=Dem	3	det
3	killing	killing	NOUN	Number=Sing	10	nsubj
4	of	of	ADP	7	case
5	a	a	DET	Definite=Ind|PronType=Art	7	det
6	respected	respected	ADJ	Degree=Pos	7	amod
7	cleric	cleric	NOUN	Number=Sing	3	nmod
8	will	will	AUX	VerbForm=Fin	10	aux
9	be	be	AUX	VerbForm=Inf	10	aux
10	causing	cause	VERB	VerbForm=Ger	0	root
11	us	we	PRON	Case=Acc|Number=Plur|Person=1|PronType=Prs	10	iobj
12	trouble	trouble	NOUN	Number=Sing	10	dobj
13	for	for	ADP	14	case
14	years	year	NOUN	Number=Plur	10	nmod
15	to	to	PART	16	mark
16	come	come	VERB	VerbForm=Inf	14	acl
17	.	.	PUNCT	10	punct
18	]	]	PUNCT	10	punct

1	DPA	DPA	PROPN	Number=Sing	0	root
2	:	:	PUNCT	1	punct
3	Iraqi	iraqi	ADJ	Degree=Pos	4	amod
4	authorities	authority	NOUN	Number=Plur	5	nsubj
5	announced	announce	VERB	Mood=Ind|Tense=Past|VerbForm=Fin	1	parataxis
6	that	that	SCONJ	9	mark
7	they	they	PRON	Case=Nom|Number=Plur|Person=3|PronType=Prs	9	nsubj
8	had	have	AUX	Mood=Ind|Tense=Past|VerbForm=Fin	9	aux
9	busted	bust	VERB	Tense=Past|VerbForm=Part	5	ccomp
10	up	up	ADP	9	compound:prt
11	3	3	NUM	NumType=Card	13	nummod
12	terrorist	terrorist	ADJ	Degree=Pos	13	amod
13	cells	cell	NOUN	Number=Plur	9	dobj
14	operating	operate	VERB	VerbForm=Ger	13	acl
15	in	in	ADP	16	case
16	Baghdad	Baghdad	PROPN	Number=Sing	14	nmod
17	.	.	PUNCT	1	punct	_

What's wrong with my columns? Thanks in advance.
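One quick way to check the column question is to count the tab-separated fields on every token line. In the sample as it appears on this page, tokens with no morphological features (for example `1 [ [ PUNCT 10 punct`) show six fields while tokens with features show seven; the original file may contain an empty field there that the page does not display. The sketch below is a hypothetical standalone helper (the class name `TsvColumnCheck` is made up and is not part of nlp4j). It does not read the nlp4j configuration, so the expected column count is an assumption you pass in to match whatever layout your training configuration uses.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of nlp4j: reports token lines whose
// tab-separated field count differs from the expected number of columns.
public class TsvColumnCheck {

    /** Returns the 1-based numbers of non-blank lines whose field count != expected. */
    static List<Integer> inconsistentLines(List<String> lines, int expected) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            // Blank lines separate sentences, so they are not token lines.
            if (line.trim().isEmpty()) {
                continue;
            }
            // A limit of -1 keeps trailing empty fields ("a\tb\t" has 3 fields).
            int fields = line.split("\t", -1).length;
            if (fields != expected) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java TsvColumnCheck <file.tsv> <expected-columns>");
            System.exit(1);
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        int expected = Integer.parseInt(args[1]);
        for (int n : inconsistentLines(lines, expected)) {
            System.out.println("line " + n + ": expected " + expected + " tab-separated columns");
        }
    }
}
```

For example, `java TsvColumnCheck train.tsv 7` (the file name is illustrative) prints one line per token row that does not have exactly seven fields. Using `split` with a limit of -1 matters here: the default `split` drops trailing empty strings, which would hide an empty last column.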

onedash commented 8 years ago

Did you change the training configuration file (the -c option)? If so, can you provide it?

MinionAttack commented 8 years ago

I'm using the configuration files from here https://github.com/emorynlp/nlp4j/tree/master/src/main/resources/edu/emory/mathcs/nlp/configuration

I'm using config-train-dep.xml; should I use a different one? I don't know whether I should use that one, config-train-sample.xml, or config-train-sample-optimized.xml. :/

MinionAttack commented 8 years ago

Can someone please help me?

MinionAttack commented 8 years ago

I found the problem: the config files probably got corrupted when I downloaded them. I downloaded them again and now everything seems to be correct.
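A likely explanation, inferred from the error message rather than confirmed in this thread: `data-pjax-transient` is an attribute that GitHub uses on `<meta>` tags in its web pages, so the "corrupted" download was probably the GitHub HTML page for the file rather than the raw XML. The parser then fails on line 52 of that page, `XMLUtils.getDocumentElement` gets no document, and the later `NullPointerException` in `getFirstElementByTagName` follows from that. A config file can be checked before training with the JDK's built-in DOM parser (`javax.xml.parsers.DocumentBuilder`, the same API that appears in the stack trace). The class below is a hypothetical sketch, not part of nlp4j.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical pre-flight check (not part of nlp4j): verifies that a config
// file is well-formed XML and is not a saved HTML page.
public class ConfigPreflight {

    /** Returns null if the input looks like a usable XML document, else a description of the problem. */
    static String check(InputStream in) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors but prints nothing, which
            // suppresses the parser's default "[Fatal Error] ..." console line.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(in);
            if (doc.getDocumentElement().getTagName().equalsIgnoreCase("html")) {
                return "root element is <html>; this looks like a saved web page";
            }
            return null;
        } catch (SAXParseException e) {
            return "not well-formed XML at line " + e.getLineNumber()
                    + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return "could not parse: " + e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            try (InputStream in = new FileInputStream(name)) {
                String problem = check(in);
                System.out.println(name + ": " + (problem == null ? "OK" : problem));
            }
        }
    }
}
```

Running `java ConfigPreflight config-train-dep.xml` on a file saved from a GitHub web page should report a parse error or an `<html>` root instead of `OK`. Downloading through the repository's "Raw" view avoids saving the surrounding HTML page.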

jdchoi77 commented 8 years ago

Sorry; I didn't get to check this issue until now. If you have more issues, could you please post them to our discussion group: https://groups.google.com/forum/#!forum/emorynlp. Thank you!