aistairc / sports-reporter

⛹️Code for Learning to Select, Track, and Generate for Data-to-Text (Iso et al; ACL 2019).
https://www.aclweb.org/anthology/P19-1202
MIT License

Annotation file #1

Closed HannahYY closed 4 years ago

HannahYY commented 4 years ago

Where is the text annotation file, or what is it called? Thanks.

hdb1301040027 commented 4 years ago

Did you ever figure out which path to use for ANNOTATION?

HannahYY commented 4 years ago

Yes, I think it is ANNOTATION=data2text-1/ie*/rotowire-modified-anno.txt. Is that right?

isomap commented 4 years ago

Hi, sorry for the late response. You can make an annotation file using the IE model we provide. After running setup.sh, first create the gold text file for the training data, tokenized with NLTK:

TRAIN_TXT=train.txt
cat train.json | python -c 'import sys, json, nltk; print("\n".join(" ".join(nltk.word_tokenize(" ".join(x["summary"]))) for x in json.load(sys.stdin)))' > $TRAIN_TXT

Then run the following commands to obtain the annotation file for the training data:

python data_utils.py -mode prep_gen_data -gen_fi $TRAIN_TXT -dict_pfx "rotowire-modified-ie" -output_fi train_gold.h5 -input_path "../rotowire_v2" -train
th extractor.lua -gpuid 1 -datafile rotowire-modified-ie.h5 -preddata train_gold.h5 -dict_pfx "rotowire-modified-ie" -just_eval

Finally, you can find the annotation file, train_gold.h5-tuples.txt, in the same directory.
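
The tokenization step in this procedure (the one-liner that writes $TRAIN_TXT) can also be written as a short script, which may be easier to debug. This is only a sketch under the same assumptions as the one-liner: train.json is a list of records whose "summary" field is a list of tokens. The function name summaries_to_lines and the command-line handling are illustrative and not part of the repository.

```python
import json
import sys


def summaries_to_lines(records, tokenize):
    """Return one space-joined, tokenized line per record."""
    lines = []
    for record in records:
        # Join the stored tokens into one string, then re-tokenize it.
        text = " ".join(record["summary"])
        lines.append(" ".join(tokenize(text)))
    return lines


def main(json_path, out_path):
    # Imported here so the helper above can be used without NLTK installed.
    # nltk.word_tokenize needs the NLTK tokenizer data to be downloaded.
    import nltk

    with open(json_path) as f:
        records = json.load(f)
    with open(out_path, "w") as f:
        f.write("\n".join(summaries_to_lines(records, nltk.word_tokenize)) + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python make_gold_text.py train.json train.txt
    main(sys.argv[1], sys.argv[2])
```

The tokenizer is passed in as an argument so the same helper can be tried with a plain str.split before NLTK is set up.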

isomap commented 4 years ago

As @HannahYY mentioned, the attached file, rotowire-modified-anno.txt, was also produced with this procedure. I'll write up a more detailed procedure for making an annotation file around mid-December.

hdb1301040027 commented 4 years ago

Thank you for your reply, but I have not solved the problem yet. Are you using Python 2.7 and DyNet 2.1? My steps were as follows:

1. I ran the preprocessing step, "python make_data.py $DATA $ANNOTATION $VOCAB", and the dump folder was generated, containing "Reporter_nh_vocab-128_nh_rnn-512_writer_15.dy". Is that right?
2. I then started training with "python reporter.py train ../dump/Reporter_nh_vocab-128_nh_rnn-512_26.dy --valid_file ../rotowire_v2/valid.json", but the following error occurred:

[dynet] random seed: 2900451995
[dynet] using autobatching
[dynet] allocating memory: 7544MB
[dynet] memory allocation done.
2019-11-25 17:03:32.382964 Log dir at /tmp/1574672612
2019-11-25 17:03:32.383040 Loading dataset...
Traceback (most recent call last):
  File "reporter.py", line 112, in <module>
    cli()
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "reporter.py", line 62, in train
    wv = WordVocab.from_dump(d["vocab"]["word"])
  File "/home/hikey970/aistairc/sports-reporter/vocab.py", line 17, in from_dump
    vocab = cls.__new__(cls)
AttributeError: class WordVocab has no attribute '__new__'

Is this caused by the versions of the packages I have installed?

isomap commented 4 years ago

@hdb1301040027 No, you can use Python >= 3.6 :) In addition, ../dump/Reporter_nh_vocab-128_nh_rnn-512_26.dy is the model dump file, not the vocab file. You can run make_data.py to get a vocab (& data) file before training.
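
For context on why the Python version matters here: the AttributeError in the traceback is what Python 2 raises when `cls.__new__(cls)` is called on an old-style class, that is, one declared without inheriting from `object`. In Python 3 every class is new-style, so the same call works. The sketch below illustrates the pattern with a hypothetical `Vocab` class; it is not the repository's `WordVocab`, whose actual fields are not shown in this thread.

```python
class Vocab:  # hypothetical stand-in, not the repository's WordVocab
    def __init__(self, words):
        self.w2i = {w: i for i, w in enumerate(words)}

    @classmethod
    def from_dump(cls, dump):
        # Allocate an instance without running __init__, then restore its state.
        # Under Python 2, "class Vocab:" is an old-style class with no __new__,
        # so this line would fail there; under Python 3 it succeeds.
        vocab = cls.__new__(cls)
        vocab.w2i = dict(dump)
        return vocab


restored = Vocab.from_dump({"the": 0, "game": 1})
print(restored.w2i["game"])
```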

isomap commented 4 years ago

Hi! Sorry for the late reply, but I've updated the README.md. I hope it helps with reproducing our research :)