Question about how to build bags

Hi~

I noticed that you aggregate only contiguous instances with the same tup into the same bag; in other words, you aggregate instances locally. https://github.com/ShulinCao/OpenNRE-PyTorch/blob/master/gen_data.py#L152-L155

RESIDE, however, aggregates instances globally: any instances with the same (head, relation, tail) triplet are put into the same bag, whether or not they are adjacent to each other in the data. https://github.com/malllabiisc/RESIDE/blob/master/preproc/make_bags.py#L23

Here is an example to illustrate. Say we have 4 instances:

head1, rel1, tail1, sent1
head1, rel1, tail1, sent2
head1, rel1, tail2, sent3
head1, rel1, tail1, sent4

Your program will generate 3 bags:

sent1 and sent2 in a bag
sent3 in a bag
sent4 in a bag.

But RESIDE will generate 2 bags:

sent1, sent2, and sent4 in a bag
sent3 in another bag.
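
To make the difference concrete, here is a minimal sketch of the two strategies on the 4 instances above. It is not the actual code from either repo; the helper names (`triple`, `local_bags`, `global_bags`) are made up for illustration, and I assume the bag key is the full (head, relation, tail) triple in both cases.

```python
from collections import defaultdict
from itertools import groupby

# The 4 toy instances from the example: (head, relation, tail, sentence).
instances = [
    ("head1", "rel1", "tail1", "sent1"),
    ("head1", "rel1", "tail1", "sent2"),
    ("head1", "rel1", "tail2", "sent3"),
    ("head1", "rel1", "tail1", "sent4"),
]


def triple(inst):
    """Bag key (assumption): the (head, relation, tail) triple."""
    return inst[:3]


def local_bags(insts):
    """Local aggregation: only runs of adjacent instances with the same key share a bag."""
    return [[inst[3] for inst in run] for _, run in groupby(insts, key=triple)]


def global_bags(insts):
    """Global aggregation: every instance with the same key shares a bag, adjacent or not."""
    bags = defaultdict(list)
    for inst in insts:
        bags[triple(inst)].append(inst[3])
    return list(bags.values())


print(local_bags(instances))   # [['sent1', 'sent2'], ['sent3'], ['sent4']]
print(global_bags(instances))  # [['sent1', 'sent2', 'sent4'], ['sent3']]
```

In this sketch, the two strategies give the same bags whenever the input is already sorted by the key, e.g. `local_bags(sorted(instances, key=triple))`, so the difference only shows up if instances with the same triple are not contiguous in the raw data.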

So which one is better? Or does it not actually matter? That is, is the order of instances in the raw dataset random, or does it follow some prior order? For training, I suspect this does not matter, but for evaluation I guess the results could differ.

Thanks~
