LibCity / Bigscity-LibCity

LibCity: An Open Library for Urban Spatial-temporal Data Mining
https://libcity.ai/
Apache License 2.0
937 stars, 168 forks

Why was the padding size set to the actual size + 1 in the Trajectory Next-Location task? #317

Closed nehSgnaiL closed 2 years ago

nehSgnaiL commented 2 years ago

Hi, thanks for your incredible work.

Recently I have been reading the code for the location prediction task in LibCity, and I noticed that the padding size is set to the actual size + 1 for both the location feature and the time feature. Why are these parameters set to the actual size + 1 rather than the actual size?

https://github.com/LibCity/Bigscity-LibCity/blob/f97062d3cdcb78f983f74a2993b6bda67f442293/libcity/data/dataset/trajectory_encoder/standard_trajectory_encoder.py#L112

I'm not very familiar with this field, and I look forward to any replies that could help me. 😄

WenMellors commented 2 years ago

self.tim_max is the largest time value in the data: every time satisfies time <= self.tim_max, and at least one real time value in a trajectory equals self.tim_max. The padding value must not be a value that appears in the real data, so it is set to self.tim_max + 1.
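
To illustrate the point above, here is a minimal sketch in plain Python. It is not LibCity's code: pad_sequences, tim_pad, and the sample batch are hypothetical names and data, and tim_max = 47 is an assumed value. It only shows why a padding value outside the real range keeps padded positions distinguishable from real ones.

```python
# Hypothetical sketch, not LibCity's implementation.
# Assumption: real time codes lie in 0..tim_max, with tim_max = 47.
tim_max = 47
tim_pad = tim_max + 1  # 48 never occurs as a real time code


def pad_sequences(seqs, pad_value):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_value] * (max_len - len(s)) for s in seqs]


# Two made-up trajectories of different lengths; the first one
# contains the largest real time code (47).
batch = [[0, 13, 47], [5]]
padded = pad_sequences(batch, tim_pad)

# Because 48 is not a real value, a mask can be recovered
# unambiguously from the padded batch.
mask = [[t != tim_pad for t in row] for row in padded]
```

If the padding value were tim_max itself, the real entry 47 in the first trajectory would be masked out as if it were padding.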

ifwind commented 2 years ago

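Hi~ I'd like to ask why data_feature needs self.tim_max + 2, and why loc_size = loc_id + 1. My understanding is that self.tim_max + 1 amounts to using 48 as the padding index, but is the extra +1 there because one more token is needed to represent anything outside 0~48, something like [UNK]?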

WenMellors commented 2 years ago

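That is because our numbering starts from 0, so the actual size is larger by 1. For example, time codes range from 0 to 47 and the padding value is 48, so the size covers 0 through 48, which is 49 values in total. The size parameter is passed to PyTorch's Embedding layer, which is why it has to be set this way.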
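
The arithmetic above can be checked with a small stand-in for an embedding table. This sketch uses only plain Python lists; make_table and its parameters are hypothetical and only mimic the indexing behaviour of a lookup table (in PyTorch the counterpart would be torch.nn.Embedding with its num_embeddings and padding_idx arguments). The value tim_max = 47 is taken from the example in the reply.

```python
import random

# Hypothetical stand-in for an embedding lookup table, not LibCity code.


def make_table(size, dim, pad_idx):
    """Build `size` rows of `dim` floats; the padding row is all zeros."""
    table = [[random.random() for _ in range(dim)] for _ in range(size)]
    table[pad_idx] = [0.0] * dim  # raises IndexError if pad_idx >= size
    return table


tim_max = 47            # largest real time code (codes are 0..47)
tim_pad = tim_max + 1   # 48, the padding index
tim_size = tim_max + 2  # 49 rows are needed to cover indices 0..48

table = make_table(tim_size, dim=4, pad_idx=tim_pad)

# With only tim_max + 1 = 48 rows, index 48 would be out of range.
try:
    make_table(tim_max + 1, dim=4, pad_idx=tim_pad)
    too_small_works = True
except IndexError:
    too_small_works = False
```

So the second +1 does not add an extra [UNK]-style token; it only converts the largest index (48) into a row count (49).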

ifwind commented 2 years ago

Oh, right, that makes sense now~ Thanks!