DGraphXinye / DGraphFin_baseline

This is a repository containing baseline code for the DGraphFin dataset.
Problem about the edge_time #10

Open szyszyzys opened 5 months ago

szyszyzys commented 5 months ago

Hello,

Thanks for this dataset!

I am now trying to convert the dataset into snapshots, but I got confused about the definition of edge_time.

As I learned from the paper: "we record the time mark of the edge with a timestamp that can only reflect the time gap between each edge". I thought all the edges in the dataset are sorted by the time they appear in the graph, and that edge_time represents the interval between the appearance of two consecutive edges. I therefore expected that summing up edge_time would give the time span of the dataset. However, the value I got is 1760517108 (which is about 55 years). I am wondering whether I have misunderstood edge_time, and I would like to learn how to process it correctly.

Best

szyszyzys commented 5 months ago

I think edge_time may represent the i-th day on which an edge first appears, so that the dataset has a time span of 821 days. Is that correct?

hxttkl commented 5 months ago

Thank you for using DGraphFin. The edge time in DGraphFin is adjusted to safeguard user privacy. The timestamps undergo a linear transformation, making them incompatible with traditional timestamps. You can refer to Section 3.2 of the paper for more details: "To safeguard users’ privacy, we record the time mark of the edge with a timestamp that can only reflect the time gap between each edge." Additionally, to protect the privacy of the data provider companies, we cannot disclose the specific time range of our dataset, as it could reveal information about the companies' operations, among other things.
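To make the distinction concrete, here is a minimal sketch on made-up toy values (not the real dataset). It assumes edge_time holds transformed timestamps rather than per-edge gaps, and that the transformation has a positive slope; the coefficients `a` and `b` are hypothetical, since the real ones are not disclosed. Under those assumptions, the sum of the column is not a time span, while `max - min` gives the span in transformed units.

```python
import numpy as np

# Toy "original" timestamps; values are made up for illustration.
t_orig = np.array([10.0, 10.0, 12.0, 15.0, 20.0])

# Hypothetical linear transformation t' = a * t + b with a > 0.
a, b = 0.5, -4.0
t_new = a * t_orig + b

# Span in transformed units: difference between last and first timestamp.
span_new = t_new.max() - t_new.min()

# Summing timestamps is not a span; it grows with the number of edges.
total = t_new.sum()

# Ordering is unchanged, and every gap is scaled by the same factor a.
same_order = np.array_equal(np.argsort(t_orig, kind="stable"),
                            np.argsort(t_new, kind="stable"))
gaps_scaled = np.allclose(np.diff(t_new), a * np.diff(t_orig))
```

Because only the ordering and the relative gaps survive such a transformation, the span in transformed units cannot be converted back to real days or years without knowing `a`.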

szyszyzys commented 5 months ago

Thanks for the response!

I am trying to use it in the discrete-time setting (i.e. create snapshots of the graph at different time points in the correct sequence). Thus the time range and the real timestamps are not necessary for my purpose. However, there is still some information I need in order to use the dataset for temporal GNNs.

Could you clarify whether "edge_time" indicates that the edges in this dataset are organized in chronological order, with "edge_time" reflecting the interval (transformed timestamp) between consecutive edge appearances? For example, are the first 100 edges listed in the 'edge_index' actually the initial 100 edges to emerge, with their respective "edge_time" values, although transformed, corresponding to the actual temporal intervals between their appearances?

hxttkl commented 5 months ago

Yes, even though the edge time undergoes a transformation, we ensure that all edge times are derived from the original time using the same linear transformation. Therefore, this transformation does not affect the interval of the edge time. @szyszyzys
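For the discrete-time setting discussed in this thread, one possible way to build snapshots is sketched below. This is not the maintainers' method, only an illustration under stated assumptions: `edge_index` is an array of shape `(E, 2)`, `edge_time` is an array of shape `(E,)` holding the transformed timestamps, and snapshots are equal-width windows over the transformed time range. The function name `split_into_snapshots` and the toy values are hypothetical. The edges are sorted by `edge_time` explicitly, so the sketch does not rely on the stored order being chronological.

```python
import numpy as np

def split_into_snapshots(edge_index, edge_time, num_snapshots):
    """Group edges into equal-width time windows (hypothetical helper).

    edge_index: array of shape (E, 2), one (src, dst) pair per row (assumed layout).
    edge_time:  array of shape (E,), transformed timestamps.
    Returns a list of num_snapshots arrays, each of shape (E_k, 2).
    """
    edge_index = np.asarray(edge_index)
    edge_time = np.asarray(edge_time)

    # Sort edges chronologically; a stable sort keeps ties in stored order.
    order = np.argsort(edge_time, kind="stable")
    edge_index, edge_time = edge_index[order], edge_time[order]

    # Equal-width window boundaries over [min, max] of the transformed time.
    bounds = np.linspace(edge_time.min(), edge_time.max(), num_snapshots + 1)

    # Interior boundaries only, so window ids run from 0 to num_snapshots - 1.
    snap_id = np.digitize(edge_time, bounds[1:-1])

    return [edge_index[snap_id == k] for k in range(num_snapshots)]

# Toy example with made-up values (not taken from DGraphFin).
toy_edges = np.array([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]])
toy_times = np.array([5, 1, 3, 9, 1])
snapshots = split_into_snapshots(toy_edges, toy_times, num_snapshots=2)
```

Since a linear transformation with positive slope preserves ordering and relative gaps, equal-width windows in transformed time correspond to equal-width windows in real time. For cumulative snapshots, concatenate the windows up to index k instead of using each window on its own.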

szyszyzys commented 4 months ago

Thanks for the reply!