Why sort the neighbors nodes based on the edge_idx?

yeeeqichen commented 3 years ago

In the init_off_set function of class NeighborFinder, I wonder why sorting the neighbors based on the related edge idx ? I think here should sort based on the time stamp?

Looking forward for response!

zengkaipeng commented 2 years ago

It's to store the whole graph into a list so that edges connected with the same vertex will be aranged together. The operaction that sorted by timestamp is taken in other parts of the code.

yeeeqichen commented 2 years ago

But the according to the implementation, there isn't any sorting operation before the find_before() method is called. From my point of view, if not sorted based on timestamp in the init_offset(), there would be incorrect neighbors returned by find_before().

zengkaipeng commented 2 years ago

But the according to the implementation, there isn't any sorting operation before the find_before() method is called. From my point of view, if not sorted based on timestamp in the init_offset(), there would be incorrect neighbors returned by find_before().

The data is read from files and it's already sorted in the file.

moli-L commented 2 years ago

In the init_off_set function of class NeighborFinder, I wonder why sorting the neighbors based on the related edge idx ? I think here should sort based on the time stamp? Looking forward for response!

I agree with you, and I also think the key used for sorting should be ts. The variable curr contains all the target node index, edge index and timestamp related to node i, here i is the index of the source node. Sorting by ts here is to help the binary search of find_before function, which requires an ordered list.

By the way, there is a bug in the find_before function, please see #8 .

jeongwhanchoi commented 2 years ago

In the init_off_set function of class NeighborFinder, I wonder why sorting the neighbors based on the related edge idx ? I think here should sort based on the time stamp? Looking forward for response!

I agree with you, and I also think the key used for sorting should be ts. The variable curr contains all the target node index, edge index and timestamp related to node i, here i is the index of the source node. Sorting by ts here is to help the binary search of find_before function, which requires an ordered list.

By the way, there is a bug in the find_before function, please see #8 .

This issue caused controversy and affected other papers. Please see #1 in TGSRec.

The find_before function needs the sorted timestamp because of the binary search. I wish the authors could solve it as soon as possible.

richardruancw commented 2 years ago

Thanks for flagging this. I just checked the implementation. This line is indeed a bug in the uploaded implementation. The order should be determined by the timestamp not the index of the edge. The correct implementation should be using

curr = sorted(curr, key=lambda x:x[2])

However, I would like to point out that this does not affect the reproducibility of the experimentation.

The experimentation results on the two selected public datasets are identical with or without this bug. This is because the edge index depends on the order of record in the original file which happens to be sorted chronically. Therefore, the order determined by edge index is the same with that determined by timestamp in this special case.

This also means if you are using this code on your own dataset, then your results are still valid as long as the raw data are also ordered in similar way. Otherwise you need to re-run your experimentation using updated implementation.

I will send a PR to fix it and we apologize for the inconvenience brought.

Appendix: How edge index is determined: https://github.com/StatsDLMathsRecomSys/Inductive-representation-learning-on-temporal-graphs/blob/66b2ccdd6d466f342900d8837c21224a49eda7e5/process.py#L10-L36

yeeeqichen commented 2 years ago

Thanks for your reply!

I think this problem is solved now.

jeongwhanchoi commented 2 years ago

@richardruancw Thanks for your reply.

StatsDLMathsRecomSys / Inductive-representation-learning-on-temporal-graphs

Why sort the neighbors nodes based on the edge_idx? #6