Closed yeeeqichen closed 2 years ago
It's to store the whole graph into a list so that edges connected with the same vertex will be aranged together. The operaction that sorted by timestamp is taken in other parts of the code.
But the according to the implementation, there isn't any sorting operation before the find_before()
method is called. From my point of view, if not sorted based on timestamp in the init_offset()
, there would be incorrect neighbors returned by find_before()
.
But the according to the implementation, there isn't any sorting operation before the
find_before()
method is called. From my point of view, if not sorted based on timestamp in theinit_offset()
, there would be incorrect neighbors returned byfind_before()
.
The data is read from files and it's already sorted in the file.
In the init_off_set function of class NeighborFinder, I wonder why sorting the neighbors based on the related edge idx ? I think here should sort based on the time stamp? Looking forward for response!
I agree with you, and I also think the key used for sorting should be ts
. The variable curr
contains all the target node index, edge index and timestamp related to node i
, here i
is the index of the source node. Sorting by ts
here is to help the binary search of find_before
function, which requires an ordered list.
By the way, there is a bug in the find_before
function, please see #8 .
In the init_off_set function of class NeighborFinder, I wonder why sorting the neighbors based on the related edge idx ? I think here should sort based on the time stamp? Looking forward for response!
I agree with you, and I also think the key used for sorting should be
ts
. The variablecurr
contains all the target node index, edge index and timestamp related to nodei
, herei
is the index of the source node. Sorting byts
here is to help the binary search offind_before
function, which requires an ordered list.By the way, there is a bug in the
find_before
function, please see #8 .
This issue caused controversy and affected other papers. Please see #1 in TGSRec.
The find_before
function needs the sorted timestamp because of the binary search. I wish the authors could solve it as soon as possible.
Thanks for flagging this. I just checked the implementation. This line is indeed a bug in the uploaded implementation. The order should be determined by the timestamp not the index of the edge. The correct implementation should be using
curr = sorted(curr, key=lambda x:x[2])
However, I would like to point out that this does not affect the reproducibility of the experimentation.
The experimentation results on the two selected public datasets are identical with or without this bug. This is because the edge index depends on the order of record in the original file which happens to be sorted chronically. Therefore, the order determined by edge index is the same with that determined by timestamp in this special case.
This also means if you are using this code on your own dataset, then your results are still valid as long as the raw data are also ordered in similar way. Otherwise you need to re-run your experimentation using updated implementation.
I will send a PR to fix it and we apologize for the inconvenience brought.
Appendix: How edge index is determined: https://github.com/StatsDLMathsRecomSys/Inductive-representation-learning-on-temporal-graphs/blob/66b2ccdd6d466f342900d8837c21224a49eda7e5/process.py#L10-L36
Thanks for your reply!
I think this problem is solved now.
@richardruancw Thanks for your reply.
In the init_off_set function of class NeighborFinder, I wonder why sorting the neighbors based on the related edge idx ? I think here should sort based on the time stamp?
Looking forward for response!