Noahs-ARK / semafor

http://www.ark.cs.cmu.edu/SEMAFOR
GNU General Public License v3.0

Converting postagged input to conll #20

Open arisagithub opened 8 years ago

arisagithub commented 8 years ago

Hello, when I try to run semafor, it stops in the Converting postagged input to conll phase.

Environment variables:
SEMAFOR_HOME=/opt/semafor
CLASSPATH=.:/opt/semafor/target/Semafor-3.0-alpha-04.jar
JAVA_HOME_BIN=/usr/lib/jvm/java-6-oracle/bin
MALT_MODEL_DIR=/opt/semafor_malt_model_20121129
TEMP_DIR: /tmp/semafor.oHswfdoPiw
Environment variables:
SEMAFOR_HOME=/opt/semafor
CLASSPATH=.:/opt/semafor/target/Semafor-3.0-alpha-04.jar
JAVA_HOME_BIN=/usr/lib/jvm/java-6-oracle/bin
MALT_MODEL_DIR=/opt/semafor_malt_model_20121129
Environment variables:
SEMAFOR_HOME=/opt/semafor
CLASSPATH=.:/opt/semafor/target/Semafor-3.0-alpha-04.jar
JAVA_HOME_BIN=/usr/lib/jvm/java-6-oracle/bin
MALT_MODEL_DIR=/opt/semafor_malt_model_20121129

Tokenizing file: Data/Cause.txt

real 0m0.039s
user 0m0.000s
sys 0m0.000s
Finished tokenization.

Part-of-speech tagging tokenized data....
/opt/semafor/scripts/jmx
/opt/semafor/bin
Read 11692 items from tagger.project/word.voc
Read 45 items from tagger.project/tag.voc
Read 42680 items from tagger.project/tagfeatures.contexts
Read 42680 contexts, 117558 numFeatures from tagger.project/tagfeatures.fmap
Read model tagger.project/model : numPredictions=45, numParams=117558
Read tagdict from tagger.project/tagdict
This is MXPOST (Version 1.0)
Copyright (c) 1997 Adwait Ratnaparkhi
Sentence: 0 Length: 1 Elapsed Time: 0.024 seconds.
Sentence: 1 Length: 0 Elapsed Time: 0.0 seconds.

real 0m1.937s
user 0m0.800s
sys 0m0.048s
/opt/semafor/bin
Finished part-of-speech tagging tokenized data.

Converting postagged input to conll.
Exception in thread "main" java.lang.IllegalArgumentException:
    at edu.cmu.cs.lti.ark.fn.data.prep.formats.SentenceCodec.decode(SentenceCodec.java:83)
    at edu.cmu.cs.lti.ark.fn.data.prep.formats.SentenceCodec$SentenceIterator.computeNext(SentenceCodec.java:115)
    at edu.cmu.cs.lti.ark.fn.data.prep.formats.SentenceCodec$SentenceIterator.computeNext(SentenceCodec.java:100)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at edu.cmu.cs.lti.ark.fn.data.prep.formats.ConvertFormat.convertStream(ConvertFormat.java:94)
    at edu.cmu.cs.lti.ark.fn.data.prep.formats.ConvertFormat.main(ConvertFormat.java:76)
Caused by: java.lang.IllegalArgumentException: PosToken must have 2 "_"-separated fields
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
    at edu.cmu.cs.lti.ark.fn.data.prep.formats.Token.fromPosTagged(Token.java:248)
    at edu.cmu.cs.lti.ark.fn.data.prep.formats.SentenceCodec$2.decodeToken(SentenceCodec.java:28)
    at edu.cmu.cs.lti.ark.fn.data.prep.formats.SentenceCodec.decode(SentenceCodec.java:79)
    ... 6 more

Any help you can give will be greatly appreciated.
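
The "Caused by" line says that every POS-tagged token must have exactly 2 "_"-separated fields. A minimal Python sketch for locating offending tokens in a tagged file follows. It assumes tokens are separated by single spaces and that a token is valid only if splitting it on "_" yields exactly two parts; this is a reading of the error message, not SEMAFOR's actual parsing code, and `find_bad_tokens` is a hypothetical helper name.

```python
def find_bad_tokens(lines):
    """Return (line_number, token) pairs for tokens that are not word_TAG."""
    bad = []
    for line_number, line in enumerate(lines, start=1):
        # A blank line yields one empty token, which has only 1 field.
        tokens = line.rstrip("\r\n").split(" ")
        for token in tokens:
            # Assumption: a valid token splits into exactly two parts on "_".
            if len(token.split("_")) != 2:
                bad.append((line_number, token))
    return bad


sample = ["This_DT is_VBZ a_DT test_NN ._.\n", "\n", "Broken token_NN\n"]
print(find_bad_tokens(sample))
```

Under these assumptions, an empty line is reported as a single empty token, which matches the "Sentence: 1 Length: 0" line in the tagger output above.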

sammthomson commented 8 years ago

Hi, is there any chance your input file has some blank lines in it? I've run into a similar error in that case. If so, a temporary work-around might be to delete the empty lines before running SEMAFOR.
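
A minimal sketch of that workaround in Python, assuming the input is small enough to hold in memory; `drop_blank_lines` is a hypothetical helper, not part of SEMAFOR.

```python
def drop_blank_lines(text):
    """Return text with empty and whitespace-only lines removed."""
    kept = [line for line in text.splitlines() if line.strip()]
    # Keep a single trailing newline when any content remains.
    return "\n".join(kept) + "\n" if kept else ""


raw = "This is a test.\n\n   \nThere's a Santa Claus!\n"
print(repr(drop_blank_lines(raw)))
```

The cleaned text would then be written to a new file, and that file passed to SEMAFOR in place of the original.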

arisagithub commented 8 years ago

Thank you very much for your reply. I fixed this problem by copying the contents of the input .txt file into a new Linux plain-text document and using that new file as input. But now I have another problem: I need XML output, but when I set the output file path to "/out.xml", the generated output file is not XML.

Here is a sample output:

{"frames":[{"target":{"name":"Operational_testing","spans":[{"start":3,"end":4,"text":"test"}]},"annotationSets":[{"rank":0,"score":71.00063282566339,"frameElements":[{"name":"Product","spans":[{"start":4,"end":10,"text":"for SEMAFOR , a frame-semantic parser"}]}]}]}],"tokens":["This","is","a","test","for","SEMAFOR",",","a","frame-semantic","parser","."]}
{"frames":[{"target":{"name":"Shapes","spans":[{"start":5,"end":6,"text":"line"}]},"annotationSets":[{"rank":0,"score":11.818446277549976,"frameElements":[{"name":"Shape","spans":[{"start":5,"end":6,"text":"line"}]}]}]}],"tokens":["This","is","just","a","dummy","line","."]}
{"frames":[{"target":{"name":"Existence","spans":[{"start":0,"end":2,"text":"There 's"}]},"annotationSets":[{"rank":0,"score":52.10168633235354,"frameElements":[{"name":"Entity","spans":[{"start":2,"end":5,"text":"a Santa Claus"}]}]}]}],"tokens":["There","'s","a","Santa","Claus","!"]}

How can I get XML output from SEMAFOR?
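
The sample output is one JSON object per sentence, whatever extension the output file name has. If downstream tooling can consume JSON rather than XML, each line can be read roughly as follows. This is a sketch in Python that relies only on the keys visible in the sample (`frames`, `target`, `annotationSets`, `frameElements`, `spans`, `text`); `summarize_frames` is a hypothetical helper name, and nothing here says how to make SEMAFOR itself emit XML.

```python
import json


def summarize_frames(json_line):
    """Return (frame name, target text, [(role, text), ...]) for each frame."""
    parse = json.loads(json_line)
    summary = []
    for frame in parse["frames"]:
        target = frame["target"]
        target_text = " ".join(span["text"] for span in target["spans"])
        roles = []
        # The sample shows one annotation set (rank 0) per frame; loop anyway.
        for annotation_set in frame["annotationSets"]:
            for element in annotation_set["frameElements"]:
                text = " ".join(span["text"] for span in element["spans"])
                roles.append((element["name"], text))
        summary.append((target["name"], target_text, roles))
    return summary


# Third sentence from the sample output above.
line = '{"frames":[{"target":{"name":"Existence","spans":[{"start":0,"end":2,"text":"There \'s"}]},"annotationSets":[{"rank":0,"score":52.10168633235354,"frameElements":[{"name":"Entity","spans":[{"start":2,"end":5,"text":"a Santa Claus"}]}]}]}],"tokens":["There","\'s","a","Santa","Claus","!"]}'
print(summarize_frames(line))
```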

s-pranita commented 7 years ago

I am getting the same error on Windows, even after removing '\n'. How can I resolve it?

arisagithub commented 7 years ago

I was able to solve it on Ubuntu by copying the content of the input file (.txt format) to a file without an extension.
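
One possible explanation, not confirmed anywhere in this thread, is that the original file carried Windows-style line endings or a UTF-8 byte-order mark, and that re-creating it on Linux removed them. The Python sketch below normalizes both; it is an illustration under that assumption, and `normalize_text` is a hypothetical helper name.

```python
def normalize_text(data: bytes) -> bytes:
    """Strip a UTF-8 byte-order mark and convert CRLF/CR line endings to LF."""
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]
    # Replace CRLF first so a lone CR pass does not double the newlines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


print(normalize_text(b"\xef\xbb\xbfThis is a test.\r\nThere's a Santa Claus!\r\n"))
```

To apply it, read the input file in binary mode, pass the bytes through the function, and write the result to a new file before running SEMAFOR.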

s-pranita commented 7 years ago

Is there any way to make it work in Windows?