HSwiterZ / Next-POI-Recommendation


Some duplicated data needs to be removed #1

Open zelo2 opened 6 months ago

zelo2 commented 6 months ago

Hi, I have looked at one of the datasets you provide (NYC-F), and I found some duplicated data in it. For example, rows 22 and 23 (counting from 1) of the file "NYC-F/NYC_train.csv" are identical.

So, I recommend using "data_name.drop_duplicates(inplace=True)" to remove the duplicated rows, and then uploading the updated datasets to avoid misunderstandings and incorrect reproductions. I'm looking forward to seeing how your model performs after this operation :).
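For reference, a minimal sketch of the suggested deduplication step with pandas (the input path comes from this issue; the output filename is only an example):

```python
import pandas as pd

# Load the training split mentioned above (path taken from this issue).
df = pd.read_csv("NYC-F/NYC_train.csv")
print("rows before:", len(df))

# Drop exact duplicate rows in place.
# Note: the pandas method is drop_duplicates, not drop_duplicated.
df.drop_duplicates(inplace=True)
print("rows after:", len(df))

# Write the cleaned file under an example name.
df.to_csv("NYC-F/NYC_train_dedup.csv", index=False)
```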

HSwiterZ commented 6 months ago

Thank you for your suggestion! Such problems are not easy to spot, and we will address them as soon as possible.

Since the performance comparison between FPGT and the ten baseline models, the ablation experiments, and the hyperparameter experiments in the paper were all performed on the NYC-G dataset, the conclusions in those sections are unaffected.

In addition, when comparing the performance of GETNext and FPGT, the results in the table were obtained by training GETNext and FPGT separately on the same datasets (NYC-F, TKY), so the conclusions in that section remain fair.

We will eliminate the duplicates in the dataset and retrain the FPGT model to compare the performance of the proposed model and GETNext more accurately. Once this is done, we will update the dataset on GitHub and add the new FPGT results to the README. We hope this will be helpful.

Thanks again for the information.


HSwiterZ commented 6 months ago


Hi there. We have addressed the issue; the details are described in the README file. Thanks again for reporting it.


zelo2 commented 6 months ago


Thanks, that is really impressive work!

LinkaSage commented 6 months ago

Hello, thank you for your excellent work. Could I ask about the detailed processing of the NYC-F dataset, especially how trajectory_id is generated? I hope to hear from you. Thank you very much.