HKUDS / AutoST

[WWW'2023] "AutoST: Automated Spatio-Temporal Graph Contrastive Learning"
https://arxiv.org/abs/2305.03920

Question about POI Context Embedding Layer #1

Closed SUSTC-ChuangYANG closed 1 year ago

SUSTC-ChuangYANG commented 1 year ago

Dear Authors,

Thanks for your WWW'23 paper. I noticed that in Section 3.1 (POI Context Embedding Layer) of your paper, you mention using the skip-gram model, and I was curious how you apply it to region POI vectors. So I looked at your source code, pre_poi_transformer.py ("# obatining the features of nodes by Transformer and Skip-gram"), and found that it only covers nn.Embedding.

Did I miss something? Could you please help me understand? I would greatly appreciate any assistance you can provide. Thank you so much.

Best, Chuang Yang

lizzyhku commented 1 year ago

Thanks for your interest in our paper. Maybe I forgot to update it, since there is a lot of preprocessing code. Thanks for the notification. I will upload the code for this part as soon as possible.

SUSTC-ChuangYANG commented 1 year ago

Thank you for your prompt reply. Looking forward to seeing the updated code.

lizzyhku commented 1 year ago

Hi Chuang Yang, I have uploaded a version of the POI preprocessing (pre_s14_poi_skip.py); you can have a look. Since there are a lot of data preprocessing files, I may have forgotten to upload some of them. If you have any other questions, you can contact me again. Thanks for the notification, and thanks a lot.

SUSTC-ChuangYANG commented 1 year ago

Hi @lizzyhku , thank you very much for updating the code so quickly. After reading it, I still have some questions. In the paper, you mention

we feed the region-specific POI vector into the Skip-gram model [2] for POI context embedding.

but I found that only the POI category list is fed to the model in the code, not the region-specific POI vector.
Moreover, the POIs are mapped to indices in order, so the data used for training is (i, i+1) -> i+2. Isn't this just the task of fitting y = x + 1, e.g., input 3, 4, output 5?

import torch  # needed for the tensor construction below

poi_list_1 = ['drinking_water', 'toilets', 'school', 'hospital', 'arts_centre', 'fire_station', 'police', 'bicycle_parking', 'fountain', 'ferry_terminal', 'bench', 'cinema', 'cafe', 'pub', 'waste_basket', 'parking_entrance', 'parking', 'fast_food', 'bank', 'restaurant', 'ice_cream', 'pharmacy', 'taxi', 'post_box', 'atm', 'nightclub', 'social_facility', 'bar', 'biergarten', 'clock', 'bicycle_rental', 'community_centre', 'watering_place', 'ranger_station', 'boat_rental', 'recycling', 'payment_terminal', 'bicycle_repair_station', 'place_of_worship', 'shelter', 'telephone', 'clinic', 'dentist', 'vending_machine', 'theatre', 'charging_station', 'public_bookcase', 'post_office', 'fuel', 'doctors','drinking_water', 'toilets']
test_sentence = poi_list_1
# build the training data: ([first word, second word], prediction target)
trigrams = [([test_sentence[i], test_sentence[i + 1]], test_sentence[i + 2]) for i in range(len(test_sentence) - 2)]
# build the vocabulary and the word-to-index mapping
vocab = set(test_sentence)
word_to_ix = {word: i for i, word in enumerate(vocab)}

for context, target in trigrams:
    # prepare the input tensor for the model
    context_idxs = torch.tensor([word_to_ix[w] for w in context], dtype=torch.long)
    # run the model (defined elsewhere in the repo) to get predictions
    log_probs, out = model(context_idxs)
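(For reference, the model object is not defined in the snippet above; a minimal n-gram-style predictor consistent with its log_probs, out return signature might look like the following sketch. This is a hypothetical reconstruction, not the authors' code.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NGramModel(nn.Module):
    """Predicts the next POI category from the previous `context_size` ones."""
    def __init__(self, vocab_size, embedding_dim=10, context_size=2):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.linear = nn.Linear(context_size * embedding_dim, vocab_size)

    def forward(self, inputs):
        # look up and concatenate the context embeddings into one row vector
        embeds = self.embeddings(inputs).view(1, -1)
        out = self.linear(embeds)
        log_probs = F.log_softmax(out, dim=1)
        return log_probs, embeds  # embeds could be reused as POI features
```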

Could you please help me understand the issue? I would greatly appreciate any assistance you can provide.

lizzyhku commented 1 year ago

Hi Chuang Yang, the code uses skip-gram to get the initial embedding of each POI; then, since each region includes some POIs, the "region-specific POI vector" is the combined vector of the POIs contained in a region, based on the initial POI vectors obtained from skip-gram. We first need to map POIs into regions via the coordinates of the POIs before obtaining each initial POI vector. If you still have questions, you can contact me again. Thanks for your interest in our paper.
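(As a sketch of the combination step described above: POIs are assigned to regions by their coordinates, and each region's vector is pooled from its POIs' skip-gram embeddings. All names here are hypothetical, and mean-pooling is one possible choice of "combined vector"; the actual preprocessing may differ.)

```python
import numpy as np

def region_poi_vectors(pois, poi_emb, region_of):
    """Combine per-POI skip-gram embeddings into region-specific vectors.

    pois      : list of (poi_category, (lat, lon)) tuples
    poi_emb   : dict mapping poi_category -> np.ndarray embedding
    region_of : function (lat, lon) -> region id (e.g. a grid lookup)
    """
    regions = {}
    for cat, coord in pois:
        rid = region_of(*coord)
        regions.setdefault(rid, []).append(poi_emb[cat])
    # mean-pool the embeddings of the POIs that fall inside each region
    return {rid: np.mean(vecs, axis=0) for rid, vecs in regions.items()}
```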

SUSTC-ChuangYANG commented 1 year ago

Hi @lizzyhku ,

Sorry for the late reply! Thank you so much for your explanation. However, I still have some questions that I don't understand. I hope you don't mind.

Q1. How do you define the context of POIs so as to generate the training data for the skip-gram model? In NLP, the context of a word is the previous and next words in a sentence. In the paper [1] that you said inspired your use of the skip-gram model for POI embedding, the authors state:

the context of l includes the POIs visited before and after l, based on the pre-defined window size

It is obvious that these contexts are all built on semantic sequences, such as sentences and check-in trajectories, whose order carries semantic information.

Hence, I am curious how the sequence of POIs is defined in your work. Is it just the POI category list, without semantic information (as it appears in the code)? If so, what is the significance of obtaining POI embeddings this way?
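(For comparison, the trajectory-based context described in [1] could be generated with a minimal sketch like this; the check-in trajectory and window size here are made up for illustration.)

```python
def skipgram_pairs(sequence, window=2):
    """Generate (center, context) training pairs for a skip-gram model:
    each POI is paired with the POIs visited within `window` steps of it."""
    pairs = []
    for i, center in enumerate(sequence):
        lo, hi = max(0, i - window), min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs

# toy check-in trajectory (hypothetical): the visiting order carries the semantics
trajectory = ['cafe', 'park', 'museum', 'restaurant']
```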

Q2. In your paper:

Section 3.1: "After that, we concatenate the region-specific POI embeddings and generate the POI-aware representations: Ē = MLP(Skip-gram(P))." Definition 1: "Regional Point-of-Interests (POIs) Matrix P."

You use the same symbol P in both places, which confuses me a little: do these two Ps represent the same thing?

[1] Rahmani, Hossein A., et al. "Category-aware location embedding for point-of-interest recommendation." Proceedings of the 2019 ACM SIGIR international conference on theory of information retrieval. 2019.

lizzyhku commented 1 year ago

Hi Chuang Yang, thanks for your interest in our paper. About Question 1, I am a bit confused by your meaning. For the POI view, we want to capture the contextual meaning of the POIs in each region. The window size is CONTEXT_SIZE = 2 in the code: the former two POIs are used to obtain the initial embedding for the next one. As for how the sequence of POIs is defined in our work, that is defined in the code.

The Regional Point-of-Interests (POIs) Matrix P represents the POI subgraph, i.e., the POI view. To get the initial embedding of the POI information, we use skip-gram to process it. To make this clear to readers, MLP(Skip-gram(P)) takes the sequence of the POI view, which is why the same P is used. I hope my explanation helps. Thanks for your interest in our paper, and if you are still confused, you can contact me again. Thanks a lot.

SUSTC-ChuangYANG commented 1 year ago

Hi @lizzyhku ,

Thank you for your prompt reply. Let me make my question clearer.

For Q1, I am asking about the semantic information contained in the POI sequence.

The skip-gram model is devised for semantic sequence data, which means the sequence order is not random. But in your code, I couldn't find any semantic information; the context POIs appear to be selected arbitrarily. Could you kindly help me understand:

  1. What kind of semantic information is contained in your context? In other words, why are drinking_water and toilets the context of school?
  2. Does the order of the POI list have any significance? Why is drinking_water first, and toilets right after it? How do you sort the list?
poi_list_1 = ['drinking_water', 'toilets', 'school', 'hospital', 'arts_centre', 'fire_station', 'police', 'bicycle_parking', 'fountain', 'ferry_terminal', 'bench', 'cinema', 'cafe', 'pub', 'waste_basket', 'parking_entrance', 'parking', 'fast_food', 'bank', 'restaurant', 'ice_cream', 'pharmacy', 'taxi', 'post_box', 'atm', 'nightclub', 'social_facility', 'bar', 'biergarten', 'clock', 'bicycle_rental', 'community_centre', 'watering_place', 'ranger_station', 'boat_rental', 'recycling', 'payment_terminal', 'bicycle_repair_station', 'place_of_worship', 'shelter', 'telephone', 'clinic', 'dentist', 'vending_machine', 'theatre', 'charging_station', 'public_bookcase', 'post_office', 'fuel', 'doctors','drinking_water', 'toilets']

For Q2: so they are not the same thing?

On the contrary, in my opinion, using the same symbol for different things actually makes it less clear to readers.

Thanks again for your prompt reply.

Best, Chuang Yang

lizzyhku commented 1 year ago

Hi Chuang Yang, thanks for your interest in our paper. About the first question: the order of the POI sequence follows the POI frequencies from statistics we computed. About Question 2: the sequence list of the POI view P is what is sent to the skip-gram model. If you still have questions, you can contact me again.
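(If the ordering is purely by occurrence frequency as described, the category list could be reproduced with a sketch like the following; the input records here are hypothetical.)

```python
from collections import Counter

def order_by_frequency(poi_records):
    """Sort POI categories by how often they occur, most frequent first."""
    counts = Counter(poi_records)
    return [cat for cat, _ in counts.most_common()]
```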

SUSTC-ChuangYANG commented 1 year ago

Hi @lizzyhku ,

Thanks so much for your patient explanation. I have no more questions now.

Best, Chuang