Error in preprocessing data/training CTB

Lijiachen1018 commented 5 years ago

Hi, I ran into an error when training on the CTB dataset.

I downloaded CTB 8.0 from the link and preprocessed the data following step 3 of the guide in the distance parser GitHub repo, which produced train/dev/test.txt. I then converted the three txt files to dependency structures with Stanford Parser 3.3.0, which produced train/dev/test.conll.

Running sh run_single.sh, I got an error:

Traceback (most recent call last):
  File "src_joint/main.py", line 746, in <module>
    main()
  File "src_joint/main.py", line 742, in main
    args.callback(args)
  File "src_joint/main.py", line 688, in <lambda>
    subparser.set_defaults(callback=lambda args: run_train(args, hparams))
  File "src_joint/main.py", line 225, in run_train
    train_parse = [tree.convert() for tree in train_treebank]
  File "src_joint/main.py", line 225, in <listcomp>
    train_parse = [tree.convert() for tree in train_treebank]
  File "/data/lijiachen/HPSG-Neural-Parser/src_joint/trees.py", line 93, in convert
    children.append(child.convert(index = index))
  File "/data/lijiachen/HPSG-Neural-Parser/src_joint/trees.py", line 93, in convert
    children.append(child.convert(index = index))
  File "/data/lijiachen/HPSG-Neural-Parser/src_joint/trees.py", line 93, in convert
    children.append(child.convert(index = index))
  File "/data/lijiachen/HPSG-Neural-Parser/src_joint/trees.py", line 80, in convert
    assert sub_children[-1].right == sub_child.left, str(sub_children[-1].right)+'\t'+str(sub_child.left) #contiune span
AssertionError:

Looking forward to your reply.

Lijiachen1018 commented 5 years ago

Solved by:

Downloading the latest Stanford Parser, 3.9.2.
Using the class UniversalChineseGrammaticalStructure for the conversion:

java -cp "*" -mx1g edu.stanford.nlp.trees.international.pennchinese.UniversalChineseGrammaticalStructure -basic -keepPunct -conllx -treeFile /path/to/train.txt > /path/to/train.conll
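
For anyone who wants to sanity-check the converted files before training, here is a minimal sketch that compares the number of tokens per sentence in the bracketed tree file and in the CoNLL file. It is not part of the repo, and it rests on a few assumptions: one bracketed tree per line in train.txt, blank-line-separated sentences with one token per line in train.conll, and no literal "(" or ")" characters used as words. The function names are hypothetical. It only reports length mismatches and does not claim to detect every cause of the assertion above.

```python
PARENS = {"(", ")"}


def count_tree_leaves(tree_line):
    """Count the words in one bracketed tree such as '(S (NP (NN a)))'."""
    tokens = tree_line.replace("(", " ( ").replace(")", " ) ").split()
    # A word is a non-parenthesis token directly followed by ')'.
    return sum(
        1
        for cur, nxt in zip(tokens, tokens[1:])
        if cur not in PARENS and nxt == ")"
    )


def conll_sentence_lengths(lines):
    """Return the token count of each blank-line-separated sentence."""
    lengths, current = [], 0
    for line in lines:
        if line.strip():
            current += 1
        elif current:
            lengths.append(current)
            current = 0
    if current:  # last sentence without a trailing blank line
        lengths.append(current)
    return lengths


def find_mismatches(tree_lines, conll_lines):
    """Return (n_trees, n_conll_sentences, [(index, tree_len, conll_len), ...])."""
    tree_lengths = [count_tree_leaves(l) for l in tree_lines if l.strip()]
    dep_lengths = conll_sentence_lengths(conll_lines)
    mismatches = [
        (i, t, d)
        for i, (t, d) in enumerate(zip(tree_lengths, dep_lengths))
        if t != d
    ]
    return len(tree_lengths), len(dep_lengths), mismatches
```

To use it, read both files into lists of lines (for example with open("/path/to/train.txt", encoding="utf-8").readlines()) and pass them to find_mismatches. If the two sentence counts differ, or the mismatch list is non-empty, the tree file and the dependency file are probably out of sync.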

I also found that PyTorch 1.2 does not work for training (it does work for parsing), so I used 1.1 instead. For details, see this GitHub issue.

sustcsonglin commented 3 years ago

Same issue.

DoodleJZ / HPSG-Neural-Parser

Error in preprocessing data/training CTB #6