LorrinWWW / two-are-better-than-one

Code associated with the paper **Two are Better Than One: Joint Entity and Relation Extraction with Table-Sequence Encoders**, at EMNLP 2020

custom input text #4

Closed zeyofu closed 3 years ago

zeyofu commented 3 years ago

Hi, I wonder if there is any code for running a trained model on custom input text, e.g. a plain txt file?

LorrinWWW commented 3 years ago

If the "--lm_emb_path" argument is not a file, the model will try to load it as a transformers checkpoint and generate language model embeddings online (so it can support tokenized custom input text -- tokenized words, no need for chars and bert). However, since this functionality was implemented in an earlier time, it does not output attention weights right now. I will look into this and fix it recently.

zeyofu commented 3 years ago

Thanks for such a quick reply! That's alright, I just need the output. Should I also leave --pretrained_wv blank?

LorrinWWW commented 3 years ago

Normally you do not need to change anything else. "--pretrained_wv" is the path to the GloVe embeddings. If you want to test on custom text, you should use the original GloVe embeddings rather than the reduced file (./wv/glove.6B.100d.ace05.txt only preserves the words appearing in ACE05). If "--pretrained_wv" is left blank, GloVe embeddings are disabled.
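To see why the reduced file matters, here is a small sketch that loads a GloVe text file and checks vocabulary coverage on custom tokens. The loader is a simplification and not the repo's actual code; the full-GloVe path shown is an assumption.

```python
# Sketch: load GloVe vectors from a text file and check how many custom-input
# tokens are covered. The reduced file (./wv/glove.6B.100d.ace05.txt) only
# keeps words seen in ACE05, so its coverage on custom text is usually much
# lower than the full glove.6B.100d.txt.
import numpy as np

def load_glove(path):
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def coverage(vectors, tokens):
    hits = sum(1 for t in tokens if t.lower() in vectors)
    return hits / max(len(tokens), 1)

# Example usage (paths are placeholders):
# full = load_glove("./wv/glove.6B.100d.txt")
# reduced = load_glove("./wv/glove.6B.100d.ace05.txt")
# print(coverage(full, my_tokens), coverage(reduced, my_tokens))
```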

zeyofu commented 3 years ago

Just to report a minor bug in train.py: on line 214, it shouldn't be `model.load(args.model_read_ckpt)`, but rather `model = model.load(args.model_read_ckpt)`. Thanks!!
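For context, the pattern behind the fix looks roughly like this. The `Model` class below is purely hypothetical (not the repo's implementation); it just assumes `load()` returns the restored model rather than mutating the instance in place, which is why the return value must be captured.

```python
# Sketch of why the assignment matters (hypothetical Model class). If load()
# builds and returns a new model, discarding the return value leaves you
# running with the freshly initialized weights instead of the checkpoint.
class Model:
    def __init__(self):
        self.weights = "random-init"

    def load(self, ckpt_path):
        restored = Model()
        restored.weights = f"loaded-from:{ckpt_path}"
        return restored

model = Model()
model.load("ckpt.pt")          # bug: return value discarded, weights stay random
model = model.load("ckpt.pt")  # fix: keep the restored model
print(model.weights)           # loaded-from:ckpt.pt
```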

LorrinWWW commented 3 years ago

Thanks for reporting the bug! It has been fixed now.