ShulinCao / OpenNRE-PyTorch

Neural Relation Extraction implemented in PyTorch
MIT License

Question about how to build bags #18

Open speedcell4 opened 5 years ago

speedcell4 commented 5 years ago

Hi~

I noticed that you aggregate only contiguous instances with the same tup into the same bag; in other words, you aggregate instances locally. https://github.com/ShulinCao/OpenNRE-PyTorch/blob/master/gen_data.py#L152-L155

RESIDE, however, aggregates instances globally: all instances with the same (head, relation, tail) triplet are put into the same bag, whether or not they are adjacent to each other. https://github.com/malllabiisc/RESIDE/blob/master/preproc/make_bags.py#L23

Here is an example to illustrate this. Say we have 4 instances:

head1, rel1, tail1, sent1
head1, rel1, tail1, sent2
head1, rel1, tail2, sent3
head1, rel1, tail1, sent4

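Your program will generate 3 bags, i.e.

head1, rel1, tail1: sent1, sent2
head1, rel1, tail2: sent3
head1, rel1, tail1: sent4

but RESIDE will generate 2 bags, i.e.

head1, rel1, tail1: sent1, sent2, sent4
head1, rel1, tail2: sent3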
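The two strategies can be sketched on the four instances above. This is only a minimal illustration, not the code of either repository: the function names local_bags and global_bags and the tuple layout are made up for this example.

```python
from collections import defaultdict
from itertools import groupby

# Toy instances from the example: (head, relation, tail, sentence).
instances = [
    ("head1", "rel1", "tail1", "sent1"),
    ("head1", "rel1", "tail1", "sent2"),
    ("head1", "rel1", "tail2", "sent3"),
    ("head1", "rel1", "tail1", "sent4"),
]


def triple(inst):
    """Return the (head, relation, tail) key of an instance."""
    return inst[:3]


def local_bags(insts):
    """Hypothetical helper: merge only contiguous instances sharing a triple."""
    return [[inst[3] for inst in group] for _, group in groupby(insts, key=triple)]


def global_bags(insts):
    """Hypothetical helper: merge all instances sharing a triple, wherever they occur."""
    bags = defaultdict(list)
    for inst in insts:
        bags[triple(inst)].append(inst[3])
    return list(bags.values())


print(local_bags(instances))   # [['sent1', 'sent2'], ['sent3'], ['sent4']]
print(global_bags(instances))  # [['sent1', 'sent2', 'sent4'], ['sent3']]
```

The two results differ only because sent4 is separated from sent1 and sent2 by an instance with a different tail.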

So which one is better? Or does this not actually matter? I mean, is the order of instances in the raw dataset just random, or does it follow some prior order? For training, I suppose this does not matter, but for evaluation I guess things will be different.

Thanks~

speedcell4 commented 5 years ago

Sorry, I did not notice that you sort the instances before building bags. But don't you need to worry about memory, since some bags will contain too many sentences? I did not find any snippet in your code that splits bag_scope.
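To make the memory concern concrete, here is a rough sketch of what splitting an oversized bag could look like. It assumes a scope is a half-open (start, end) pair of indices into the sorted instance list; the actual layout of bag_scope in gen_data.py may differ, and split_scope and max_size are hypothetical names.

```python
def split_scope(scope, max_size):
    """Split a half-open (start, end) scope into chunks of at most max_size instances."""
    start, end = scope
    return [(s, min(s + max_size, end)) for s in range(start, end, max_size)]


# A bag of 7 sentences capped at 3 sentences per chunk.
print(split_scope((0, 7), 3))  # [(0, 3), (3, 6), (6, 7)]
```

Whether the resulting chunks should count as separate bags during evaluation is a separate question that this sketch does not answer.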