why was the padding size set to the actual size + 1 in the Trajectory Next-Location task

LibCity / Bigscity-LibCity

LibCity: An Open Library for Urban Spatial-temporal Data Mining

https://libcity.ai/

Apache License 2.0


why was the padding size set to the actual size + 1 in the Trajectory Next-Location task #317

Closed nehSgnaiL closed 2 years ago

nehSgnaiL commented 2 years ago

Hi, thanks for your incredible work.

I have recently been studying the code for the location prediction task in LibCity, and I noticed that the padding value is set to the actual size + 1 for both the location feature and the time feature. Why are these parameters set to the actual size + 1 rather than the actual size?

https://github.com/LibCity/Bigscity-LibCity/blob/f97062d3cdcb78f983f74a2993b6bda67f442293/libcity/data/dataset/trajectory_encoder/standard_trajectory_encoder.py#L112

I'm not very familiar with this field, so I'd appreciate any replies that could help me. 😄

WenMellors commented 2 years ago

self.tim_max is the largest time value in the data: every time value satisfies time <= self.tim_max, and some real time value in a trajectory is equal to self.tim_max. The padding value must not be a value that appears in the real data, so it is set to self.tim_max + 1.
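To make the point above concrete, here is a minimal sketch in plain Python. It is not LibCity's actual encoder code: the helper name pad_time_sequences and the sample batch are hypothetical, and tim_max = 47 is assumed only for illustration. It shows that padding with tim_max + 1 keeps padded positions distinguishable from every real time value.

```python
def pad_time_sequences(sequences, tim_max):
    """Right-pad time-slot sequences with a value that never occurs in real data.

    Hypothetical helper for illustration; not part of LibCity.
    """
    pad_value = tim_max + 1  # real values satisfy 0 <= t <= tim_max
    max_len = max(len(seq) for seq in sequences)
    padded = [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]
    return padded, pad_value


# Assumed sample batch of time-slot indices; 47 is the largest real slot here.
batch = [[3, 47, 12], [0, 5]]
padded_batch, pad_value = pad_time_sequences(batch, tim_max=47)

# Because pad_value is outside the real range, a mask can be recovered unambiguously.
mask = [[t != pad_value for t in seq] for seq in padded_batch]
print(padded_batch)
print(mask)
```

If the padding value were tim_max itself, the real slot 47 in the first sequence could not be told apart from padding.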

ifwind commented 2 years ago

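Hi~ I'd like to ask why data_feature needs self.tim_max + 2, and why loc_size = loc_id + 1. My understanding is that self.tim_max + 1, i.e. 48, serves as the padding index. Is the extra + 1 there to add a token for anything outside 0~48, something like [UNK]?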

WenMellors commented 2 years ago

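It is because our numbering starts from 0, so the actual size is larger by 1. For example, the time encoding ranges from 0 to 47 and the padding value is 48, so the size covers 0 to 48, which is 49 values in total. The size parameter is passed to PyTorch's Embedding layer, so it has to be set this way.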
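The counting argument can be checked with a small stand-in for an embedding table. This is a sketch in plain Python, not PyTorch or LibCity code: make_embedding_table is a hypothetical helper that only mimics the idea of a lookup table with num_embeddings rows and a zeroed padding row, which is the role torch.nn.Embedding(num_embeddings, embedding_dim, padding_idx=...) plays in a real model. The values tim_max = 47 and embedding_dim = 4 are assumptions for illustration.

```python
import random


def make_embedding_table(num_embeddings, embedding_dim, padding_idx):
    """Hypothetical stand-in for an embedding layer: one row per valid index."""
    table = [
        [random.uniform(-1.0, 1.0) for _ in range(embedding_dim)]
        for _ in range(num_embeddings)
    ]
    # The padding row is kept at zero; this fails if padding_idx >= num_embeddings.
    table[padding_idx] = [0.0] * embedding_dim
    return table


tim_max = 47            # largest real time code (codes are 0..47)
tim_pad = tim_max + 1   # 48, the padding index
tim_size = tim_max + 2  # 49 rows are needed to index 0..48

table = make_embedding_table(tim_size, 4, tim_pad)
padded_seq = [0, 47, tim_pad]
vectors = [table[i] for i in padded_seq]  # every index, including padding, resolves

# With only tim_max + 1 = 48 rows, index 48 is out of range.
try:
    make_embedding_table(tim_max + 1, 4, tim_pad)
    too_small_failed = False
except IndexError:
    too_small_failed = True
print(len(table), too_small_failed)
```

So the second + 1 is not an extra [UNK]-style token; it only converts the largest index (48) into a row count (49).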

ifwind commented 2 years ago

Oh, right, that makes sense now~ Thanks!