Open speedcell4 opened 5 years ago
Sorry, I did not notice that you sort the instances before building bags.
But don't you need to worry about memory, since some bags will contain too many sentences? I did not find any snippet in your code that splits `bag_scope`.
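To make the memory question concrete, here is a rough sketch of what splitting `bag_scope` could look like. It assumes each scope is a `[start, end]` pair of instance indices with an inclusive `end`; the function name `split_scopes` and the `max_bag_size` parameter are hypothetical and are not taken from the repository.

```python
def split_scopes(bag_scope, max_bag_size):
    """Split each [start, end] scope into chunks of at most max_bag_size instances.

    Assumption: `end` is inclusive. `split_scopes` and `max_bag_size` are
    hypothetical names used only for illustration.
    """
    if max_bag_size < 1:
        raise ValueError("max_bag_size must be at least 1")
    chunks = []
    for start, end in bag_scope:
        # Walk the scope in steps of max_bag_size; the last chunk may be shorter.
        for lo in range(start, end + 1, max_bag_size):
            chunks.append([lo, min(lo + max_bag_size - 1, end)])
    return chunks


# One bag of 5 instances and one bag of 1 instance, capped at 2 per bag.
print(split_scopes([[0, 4], [5, 5]], max_bag_size=2))
```

Capping bags this way bounds the memory per bag, but it also turns one entity pair into several bags, so it would affect bag-level evaluation.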
Hi~
I noticed that you aggregate only contiguous instances with the same `tup` into the same bag; in other words, you aggregate instances locally. https://github.com/ShulinCao/OpenNRE-PyTorch/blob/master/gen_data.py#L152-L155

But RESIDE handles this by aggregating instances globally: any instance with the same `(head, relation, tail)` triplet is put into the same bag, whether or not they are adjacent to each other. https://github.com/malllabiisc/RESIDE/blob/master/preproc/make_bags.py#L23

Take an example to illustrate this. Say we have 4 instances, `sent1` to `sent4` in that order, where `sent1`, `sent2`, and `sent4` share the same triplet and `sent3` has a different one.
Your program will generate 3 bags, i.e.

- `sent1` and `sent2` in a bag
- `sent3` in a bag
- `sent4` in a bag

But RESIDE will generate 2 bags, i.e.

- `sent1`, `sent2`, and `sent4` in a bag
- `sent3` in another bag

So which one is better? Or does this not actually matter? I mean, is the order of instances in the raw dataset just random, or does it keep some prior order? For training, I suppose this does not matter, but for evaluation, I guess things will be different.
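A minimal sketch of the two strategies on the toy example above. The data and the function names `local_bags` and `global_bags` are hypothetical and are not copied from either repository; they only illustrate contiguous grouping versus grouping by key.

```python
from collections import defaultdict
from itertools import groupby

# Hypothetical toy data: (sentence_id, (head, relation, tail)).
instances = [
    ("sent1", ("h1", "r1", "t1")),
    ("sent2", ("h1", "r1", "t1")),
    ("sent3", ("h2", "r2", "t2")),
    ("sent4", ("h1", "r1", "t1")),
]


def local_bags(instances):
    """Group only runs of adjacent instances that share a triplet."""
    return [
        [sent_id for sent_id, _ in run]
        for _, run in groupby(instances, key=lambda item: item[1])
    ]


def global_bags(instances):
    """Group every instance with the same triplet, regardless of position."""
    bags = defaultdict(list)
    for sent_id, triplet in instances:
        bags[triplet].append(sent_id)
    return list(bags.values())


print(local_bags(instances))   # 3 bags: sent4 is separated from sent1 and sent2
print(global_bags(instances))  # 2 bags: sent1, sent2, and sent4 share a bag

# Sorting by triplet first makes local grouping match global grouping.
by_triplet = sorted(instances, key=lambda item: item[1])
print(local_bags(by_triplet))
```

If the instances are sorted by triplet before bags are built, contiguous grouping yields the same bags as global grouping, so the difference only shows up on unsorted input.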
Thanks~