dair-iitd / openie6

OpenIE6 system
GNU General Public License v3.0

meta_data_vocab comprises sentences, not tokens #16

Open Aatlantise opened 2 years ago

Aatlantise commented 2 years ago

Hello,

It looks like the meta_data_vocab used as an argument for model declaration is not in a format I recognize: the vocabulary seems to consist of whole sentences rather than tokens.

I attempted not providing meta_data_vocab as an input, since it appears to be an optional argument, but that also fails due to a snippet of code that invokes meta_data_vocab.itos.

>>> META_DATA.vocab.itos
['<unk>', 'A trial run run on this initialization sentence initializes the OpenIE6 open information extractor .']
>>> meta_data_vocab.itos
['<unk>', 'A trial run run on this initialization sentence initializes the OpenIE6 open information extractor .']

Is meta_data_vocab meant to look like this? I was trying to declare a model that could be used to predict on any given input text, but meta_data_vocab seems to prevent this, tying each model to one specific predict_fp.
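For reference, a vocab like this falls out of any non-sequential torchtext Field built over whole sentence strings, which is what the .vocab.itos output above suggests is happening. A standalone reproduction (assuming a torchtext version that still ships the legacy Field API, i.e. <= 0.8, or torchtext.legacy.data on 0.9-0.11) looks like this:

from torchtext import data  # use `from torchtext.legacy import data` on torchtext 0.9-0.11

sentences = [
    "A trial run run on this initialization sentence initializes the OpenIE6 open information extractor .",
]

# A non-sequential field treats each example's value as a single "token",
# so whole sentences become vocabulary entries.
META_DATA = data.Field(sequential=False, use_vocab=True)
fields = [("meta_data", META_DATA)]
examples = [data.Example.fromlist([s], fields) for s in sentences]
META_DATA.build_vocab(data.Dataset(examples, fields))

print(META_DATA.vocab.itos)
# ['<unk>', 'A trial run run on this initialization sentence initializes the OpenIE6 open information extractor .']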

Much thanks!

SaiKeshav commented 2 years ago

Hi, thank you for your interest in our work. Yes, the meta_data_vocab you have is correct: it contains the actual sentences themselves, so that when we print the final predictions of the system, we can print the corresponding sentence along with them.

To achieve what you want, one simple solution is to pass meta_data_vocab to the forward function instead of at the time of initialization, along the lines of the sketch below. Does that make sense?
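A minimal illustration of that idea, where the vocab is supplied per call to forward rather than stored on the model at construction time (the class and helper names here are only illustrative, not the actual repo code):

import torch.nn as nn

class OpenIEModel(nn.Module):  # illustrative stand-in for the actual model class
    def __init__(self, hparams):
        super().__init__()
        self.hparams_ = hparams
        # no meta_data_vocab stored here, so one model instance is not
        # tied to the sentences of a single predict_fp

    def forward(self, batch, meta_data_vocab=None):
        predictions = self.extract(batch)  # placeholder for the real extraction step
        if meta_data_vocab is not None:
            # recover the source sentence for each example when emitting output
            batch["sentences"] = [meta_data_vocab.itos[i] for i in batch["meta_data"]]
        return predictions

    def extract(self, batch):
        return batch  # stand-in; the real model produces OpenIE extractions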

Aatlantise commented 2 years ago

Much thanks for your input! I was able to do what I wanted by declaring the model without meta_data_vocab and then setting model._meta_data_vocab later, before each inference. A natural follow-up question: would I be able to run inference on each input text without having to declare a new trainer object every time? The software allows me to do so, but it seems to reuse an indexing of some sort from previous dataloader objects.
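For concreteness, my current loop looks roughly like this (OpenIEModel and build_predict_loader are placeholders for my own wrapper code around the repo's model and data loading; only the _meta_data_vocab attribute comes from the repo):

from pytorch_lightning import Trainer

model = OpenIEModel(hparams)  # declared once, without meta_data_vocab

for text in ["First input sentence .", "Second input sentence ."]:
    loader, meta_data_vocab = build_predict_loader(text)  # placeholder helper
    model._meta_data_vocab = meta_data_vocab  # rebound before each inference
    trainer = Trainer()  # currently re-created for every input; ideally reused
    trainer.test(model, loader)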