JingqingZ / BaiduTraffic

This repo includes introduction, code and dataset of our paper Deep Sequence Learning with Auxiliary Information for Traffic Prediction (KDD 2018).

What do snode and enode represent? #16

Closed. prateikarora closed this issue 5 years ago

prateikarora commented 5 years ago

Hi. My questions may seem a bit trivial, but I could not find an explicit explanation in the paper.

  1. The entire network is composed of road segments, and snode and enode represent the endpoints of these segments? Are these the endpoints of the edges of the graph?
  2. In the paper, it is written that we are given the snode and enode GPS, but in issue #6 it is said that it represents the middle point of the road segment.
  3. Where is the extra information on social attributes such as weekdays, weekends, public holidays, peak hours and off-peak hours described in the paper?

bbliao commented 5 years ago

@prateikarora For 1 and 2, a link is a small piece of road, which is mathematically a segment, but in this paper it is treated as a point (we use one point to represent the link, not two). The GPS coordinate of a road segment is the midpoint of the segment. Similarly, the snode and enode are also segments mathematically, but they are treated as points.

[Image: snode_link_enode]

For 3, the social attributes are described in time_feature_extraction_15min.py.

Thanks for your attention!

prateikarora commented 5 years ago

thank you

prateikarora commented 5 years ago

@bbliao I am sorry for so many questions, but I'm unfamiliar with deep learning, so I cannot get these answers from your code. I wish to develop a parametric model for traffic prediction for my thesis project, and I wish to use this data. It would be very helpful if you could clear up these queries.

  1. So, snode and enode are the adjacent links/road sections, and you explicitly mapped the 10-digit IDs to 13-digit IDs for privacy and security?
  2. Why are there almost double the number of entries in the link_id hash map compared to road_network_sub_dataset?
  3. And if 1 is true: if I take a link ID such as 1135895695379 from road_network_sub_dataset, its snode and enode are 1520405827 (hash map entry: 1336623035796) and 1554152820 (hash map entry: 1848972804865), respectively. There is no entry corresponding to 1336623035796 and 1848972804865 in road_network_sub_dataset?
  4. Also, if 1 is true, what about the end segments that are connected to only one link (like the last link in a chain where the network ends)?
  5. In the neighbours_1km.txt file, are these the links that are within 1km of each other? If yes, did you just calculate the physical distance between the centre GPS coordinates of the road segments? And why does every array have the same number of entries?

bbliao commented 5 years ago

@prateikarora I am sorry for the confusion. Regarding the road network sub-dataset: the original dataset contains ∼450k road segments. However, we release the full information (traffic speed, attributes, neighbors) for only 15,073 central road segments, because of disk capacity (the raw data is about 1TB), data incompleteness (not every road segment has a missing rate below 1%), and privacy and security. In addition, the Seq2Seq+NB method needs the traffic speed of each central segment's neighbors, so the traffic speed sub-dataset actually covers 44,172 road segments (the 15,073 central road segments plus their neighbors). Given a central road segment, we construct its local directed connected road network from road_network_sub_dataset, then select five predecessors and five successors in that network based on PageRank score; the result is the neighbours_1km.txt file. Note that the neighbors and the central road segment may not be adjacent. The traffic speed of these neighbors is provided, as mentioned above. Therefore,

  1. Yes, the snode and enode are the adjacent links/road sections of the central road segment, but the extracted neighbors (using PageRank) and the central road segment may not be adjacent;

  2. You can just ignore the redundant road segment IDs;

  3. Yes, we cannot provide the complete road network because of data incompleteness (not every road segment has a missing rate below 1%), privacy and security;

  4. We provide the full information (traffic speed, attributes, neighbors) for only the 15,073 central road segments, and the traffic speed for 44,172 road segments (the 15,073 central road segments plus their neighbors). We do not handle the situation you mention, because the data is incomplete;

  5. Yes, the neighbors are all within 1km (physical distance using GPS coordinates) of the central road segment. For computational efficiency, given a central road segment, we select five predecessors and five successors from its local network, which is why every row has the same number of entries.
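
For readers who want to reproduce something like the selection described in answer 5, here is a minimal sketch. It is not the authors' code. It assumes the haversine formula as the "physical distance using GPS coordinates" (the thread does not say which formula was used), and it assumes a PageRank score per link has already been computed elsewhere. The names `haversine_km`, `top_k_within_radius`, `candidates` and `scores` are hypothetical. The sketch also ignores link direction, so it would have to be run once on predecessor candidates and once on successor candidates.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def top_k_within_radius(center, candidates, scores, k=5, radius_km=1.0):
    """Return up to k candidate link IDs within radius_km of center,
    ordered by descending score.

    center     -- (lat, lon) midpoint of the central road segment
    candidates -- dict: link_id -> (lat, lon) midpoint (hypothetical layout)
    scores     -- dict: link_id -> precomputed score, e.g. PageRank (assumed)
    """
    nearby = [
        link_id
        for link_id, (lat, lon) in candidates.items()
        if haversine_km(center[0], center[1], lat, lon) <= radius_km
    ]
    nearby.sort(key=lambda link_id: scores.get(link_id, 0.0), reverse=True)
    return nearby[:k]
```

Capping the result at a fixed `k` matches the explanation in the thread for why every row of neighbours_1km.txt has the same length, although this sketch can return fewer than `k` links when fewer candidates fall inside the radius.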

Thanks for your attention!

prateikarora commented 5 years ago

@bbliao Thank you very much for your time and effort. This really cleared up my confusion. I have one final query: how can I identify/locate/separate these 15,073 central segments from the full set of segments?

bbliao commented 5 years ago

@prateikarora The first column of neighbours_1km.txt lists the 15,073 central road segments; the 2nd-6th and 7th-11th columns are the predecessors and successors, respectively.
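
The column layout described above can be read with a few lines of Python. This is a hedged sketch rather than code from the repository. It assumes each row holds exactly 11 fields separated by whitespace (the thread does not state the delimiter, so pass `sep=","` or similar if the file differs), and the names `parse_neighbours` and `all_segment_ids` are hypothetical.

```python
def parse_neighbours(lines, sep=None):
    """Map each central link ID to its predecessors and successors.

    lines -- iterable of text rows, e.g. an open file handle
    sep   -- field separator; None splits on any whitespace (an assumption)
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank rows
        fields = [f.strip() for f in line.split(sep)]
        if len(fields) != 11:
            raise ValueError(f"expected 11 columns, got {len(fields)}: {line!r}")
        table[fields[0]] = {
            "predecessors": fields[1:6],   # columns 2-6
            "successors": fields[6:11],    # columns 7-11
        }
    return table


def all_segment_ids(table):
    """Union of the central IDs and every neighbour ID that appears in the table."""
    ids = set(table)
    for entry in table.values():
        ids.update(entry["predecessors"])
        ids.update(entry["successors"])
    return ids
```

A typical call would be `with open("neighbours_1km.txt") as fh: table = parse_neighbours(fh)`. The keys of `table` are then the central segments, and `all_segment_ids(table)` gives the central segments plus their neighbors, which is the set the thread says is covered by the traffic speed sub-dataset.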