XuezheMax / NeuroNLP2

Deep neural models for core NLP tasks (Pytorch version)
GNU General Public License v3.0

Unable to find training data #45

Open · boxiangliu opened this issue 2 years ago

boxiangliu commented 2 years ago

Dear Max,

Thank you so much for making your code available. I am running your stacked pointer network, but I cannot find the train/dev/test datasets:

```
CUDA_VISIBLE_DEVICES=0 OMP_NUM_THREADS=4 python -u parsing.py --mode train --config configs/parsing/stackptr.json --num_epochs 600 --batch_size 32 \
 --opt adam --learning_rate 0.001 --lr_decay 0.999997 --beta1 0.9 --beta2 0.9 --eps 1e-4 --grad_clip 5.0 \
 --loss_type token --warmup_steps 40 --reset 20 --weight_decay 0.0 --unk_replace 0.5 --beam 10 \
 --word_embedding sskip --word_path "data/sskip/sskip.eng.100.gz" --char_embedding random \
 --punctuation '.' '``' "''" ':' ',' \
 --train "data/PTB3.0/PTB3.0-Stanford_dep/ptb3.0-stanford.auto.cpos.train.conll" \
 --dev "data/PTB3.0/PTB3.0-Stanford_dep/ptb3.0-stanford.auto.cpos.dev.conll" \
 --test "data/PTB3.0/PTB3.0-Stanford_dep/ptb3.0-stanford.auto.cpos.test.conll" \
 --model_path "models/parsing/stackptr/"
```

Could you tell me where to find the dataset? Thank you!

boxiangliu commented 2 years ago

@XuezheMax I should add that I have the Penn Treebank 3.0 data, but I am not sure how to convert it to the required format. Is there a straightforward way to do that?

chantera commented 2 years ago

I hope this will be helpful to you:

- https://github.com/clulab/processors/wiki/Converting-from-Penn-Treebank-to-Basic-Stanford-Dependencies
- https://nlp.stanford.edu/software/stanford-dependencies.shtml
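For reference, a minimal sketch of the conversion those pages describe, using the Stanford dependency converter bundled with the Stanford Parser. The jar name, file paths, and output name here are illustrative and depend on the release you download:

```
# Convert gold PTB constituency trees (.mrg) to basic Stanford dependencies
# in CoNLL-X format. One invocation per tree file; the resulting .conll
# files then need to be concatenated into the train/dev/test splits
# (conventionally WSJ sections 02-21 / 22 / 23).
java -cp stanford-parser.jar edu.stanford.nlp.trees.EnglishGrammaticalStructure \
  -treeFile wsj_0201.mrg -basic -conllx > wsj_0201.conll
```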