Preprocessing data - Githubissues

kexinliao commented 8 years ago

Hi there,

I got empty outputs after running the preprocessing command: python amr_parsing.py -m preprocess [input_sentence_file] (my input sentence file was a raw document with one sentence per line). This generated .tok, .prp and .charniak.parse.dep files, but all of them were empty. Can anyone help with this issue?

Log info:
Start Stanford CoreNLP...
java -Xmx2500m -cp stanfordnlp/stanford-corenlp-full-2013-06-20/stanford-corenlp-3.2.0.jar:stanfordnlp/stanford-corenlp-full-2013-06-20/stanford-corenlp-3.2.0-models.jar:stanfordnlp/stanford-corenlp-full-2013-06-20/joda-time.jar:stanfordnlp/stanford-corenlp-full-2013-06-20/xom.jar:stanfordnlp/stanford-corenlp-full-2013-06-20/jollyday.jar edu.stanford.nlp.pipeline.StanfordCoreNLP -props stanfordnlp/default.properties
Loading Models: 0/4
Loading Models: 1/4
Loading Models: 2/4
Loading Models: 3/4
Loading Models: 4/4
Read token,lemma,name entity file test_input.txt.sent.prp...
Loading Charniak parser model: WSJ+Gigaword ...
Begin Charniak parsing ...
Convert Charniak parse tree to Stanford Dependency tree ...
Read dependency file test_input.txt.sent.tok.charniak.parse.dep...
Done preprocessing!

Juicechuan commented 8 years ago

Hi, try removing all the old empty files and re-running it. The parser caches its preprocessing output and will not overwrite a file that already exists.
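For anyone hitting the same thing, here is a minimal sketch of that cleanup step. The helper name `remove_empty_outputs` is made up for illustration and is not part of camr; the suffix list is taken from the files named in the report above and may need adjusting for your run. It only deletes zero-byte files whose names start with the input file name.

```python
import os

# Suffixes taken from the report above; this list is an assumption and
# may not cover every file the preprocessing step writes.
PREPROCESS_SUFFIXES = (".tok", ".prp", ".charniak.parse.dep")


def remove_empty_outputs(input_path, suffixes=PREPROCESS_SUFFIXES):
    """Delete zero-byte files that start with the input file name and
    end with one of the given suffixes. Returns the removed file names."""
    directory = os.path.dirname(os.path.abspath(input_path))
    prefix = os.path.basename(input_path)
    removed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if (name.startswith(prefix)
                and name.endswith(suffixes)
                and os.path.isfile(path)
                and os.path.getsize(path) == 0):
            os.remove(path)
            removed.append(name)
    return removed
```

Calling `remove_empty_outputs("test_input.txt")` before re-running the preprocess command should clear the stale empty files while leaving the input file and any non-empty outputs alone.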

kexinliao commented 8 years ago

So should the input file in the command be the original file name, or the file name with the .sent extension?

On Thu, Jul 14, 2016 at 10:54 AM, IceIceRabbit notifications@github.com wrote:

Make a copy of the sentence file and give the copy the same name with the .txt.sent extension. It should work, provided your sentence file contains only sentences.

IceIceRabbit commented 8 years ago

Use your original file. Your .sent file should contain all the sentences extracted from the original file. If the original file contains only sentences, you can just copy it and rename the copy with the .sent extension, from what I have understood.
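A minimal sketch of that copy step, assuming the preprocessor looks for a file named after the input plus ".sent" (the log above mentions test_input.txt.sent.prp, which suggests this naming). The helper name `make_sent_copy` is hypothetical and not part of camr.

```python
import shutil


def make_sent_copy(input_path):
    """Copy a one-sentence-per-line file to '<input_path>.sent'.

    Assumes the input already contains only sentences, one per line.
    Returns the path of the new copy.
    """
    sent_path = input_path + ".sent"
    shutil.copyfile(input_path, sent_path)
    return sent_path
```

After `make_sent_copy("test_input.txt")`, the preprocess command is still run on the original file name, as described above.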

kexinliao commented 8 years ago

@IceIceRabbit Thanks! I'm finally able to preprocess the sentence file. Now I get the following error when parsing the sentence file to AMR:

Traceback (most recent call last):
  File "amr_parsing.py", line 439, in <module>
    main()
  File "amr_parsing.py", line 390, in main
    if args.section != 'all':
  File "/home/kexin/AMRParsing/model.py", line 352, in load_model
    model = pickle.load(f)
cPickle.UnpicklingError: invalid load key, 'B'.

The pre-trained model 'LDC2013E117.train.basic-abt-charniak.m' was downloaded from the link in the readme file.
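One quick diagnostic for this kind of error: "invalid load key, 'B'" means the first byte the unpickler read was 'B', so the file does not start like a pickle. The sketch below checks the first bytes of the file against a few well-known archive signatures. Whether this particular model file was a compressed archive is not confirmed anywhere in this thread; that is only a guess, and the function name `sniff_model_file` is made up for illustration.

```python
def sniff_model_file(path):
    """Return a rough guess at the container format of a file by
    looking at its first bytes (standard magic numbers only)."""
    with open(path, "rb") as f:
        head = f.read(3)
    if head.startswith(b"BZh"):
        return "bzip2"
    if head.startswith(b"\x1f\x8b"):
        return "gzip"
    if head.startswith(b"PK"):
        return "zip"
    return "unrecognized"
```

If this reports a compressed format, decompressing the download before loading it may be worth a try. If it reports "unrecognized", the model file is more likely incompatible with the current parser code.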

IceIceRabbit commented 8 years ago

Yeah, there seems to be a problem with the old model. I believe it was trained on the parser before its latest update. You can try training your own model; that should work.

kexinliao commented 8 years ago

@Juicechuan Hi Chuan, could you provide us with a pre-trained model that was trained on the updated parser?

Juicechuan commented 8 years ago

@kexinliao Hi, I've uploaded the new model (there is still a problem with the semeval model, but I should be able to upload it soon). Let me know if you have any questions or problems.

kexinliao commented 8 years ago

@Juicechuan Thanks! It works now.
