Alibaba-NLP / ACE

[ACL-IJCNLP 2021] Automated Concatenation of Embeddings for Structured Prediction

Predicting sequence tags and attributes of 'Sentence' object #12

Closed Aatlantise closed 3 years ago

Aatlantise commented 3 years ago

Hi,

I have been trying to use an ACE model for chunking predictions. I understand that I can use the --parse flag, and that command works, but I also want to run predictions on single sentences using something along the lines of SequenceTagger.predict in models/sequence_tagger_model.py. When I try that, I get attribute errors, because lines 630-640 of embeddings.py reference attributes of sentences: List[Sentence], such as max_sent_len and char_seqs, that do not exist.

If SequenceTagger.predict is deprecated, is it possible to make predictions on sentences whose gold sequence labels are unknown? It's my understanding that using the --parse flag requires gold labels to be included in the parse file as well.

Thanks in advance for your help!

wangxinyu0922 commented 3 years ago

Hi, you can input a file with pseudo labels for prediction. For example:

Use O
label O
“ O
O O
” O
for O
prediction O
. O

Then you can use the --parse flag for prediction. Note that your sentences must be pre-tokenized.
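
For anyone who wants to generate such a file programmatically, here is a minimal sketch. It is not part of ACE: the helper name `to_pseudo_labeled_conll` is hypothetical, and it assumes one token and one label per line separated by a single space (as in the example above), with a blank line between sentences, which is the usual CoNLL convention. Check it against the format your config expects.

```python
from typing import Iterable, List


def to_pseudo_labeled_conll(
    sentences: Iterable[List[str]], dummy_label: str = "O"
) -> str:
    """Render pre-tokenized sentences as "token label" lines.

    Hypothetical helper, not an ACE API. Every token gets the same
    placeholder label; sentences are separated by one blank line
    (assumed CoNLL-style layout).
    """
    blocks = []
    for tokens in sentences:
        # Skip empty or whitespace-only tokens so they cannot be
        # mistaken for a sentence boundary.
        lines = [f"{token} {dummy_label}" for token in tokens if token.strip()]
        if lines:
            blocks.append("\n".join(lines))
    if not blocks:
        return ""
    return "\n\n".join(blocks) + "\n"


# Sentences must already be tokenized, as noted above.
example = [
    ["Use", "label", "O", "for", "prediction", "."],
    ["Another", "sentence", "."],
]
print(to_pseudo_labeled_conll(example), end="")
```

Write the returned string to a file and point the --parse run at it; the dummy labels are only there to satisfy the reader and do not need to be meaningful.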

Aatlantise commented 3 years ago

Thanks for your clarification!

On a different note: of the various ELMo models (small, medium, large, and PubMed), which one does /root/.flair/embeddings/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5 in the config files refer to? Judging from the parameters, I think it's either large or PubMed. Could you clarify that as well?

Much thanks as always!!

wangxinyu0922 commented 3 years ago

It is the "Original" model from allennlp (the large model). PubMed is the ELMo model trained on the biomedical domain, I guess.

Aatlantise commented 3 years ago

I was able to resolve my issue with the dummy tags. Thank you for your kind help :)