RUCAIBox / TG-ReDial

the dataset TG-ReDial
Apache License 2.0

Paper reproduction #6

Closed zll0032 closed 3 years ago

zll0032 commented 3 years ago

The results I reproduced differ from those reported in the paper, and the gap is fairly large. What could be the cause?

Lancelot39 commented 3 years ago

We have verified that our code achieves performance similar to that reported in our paper. Can you provide your detailed hyperparameter settings and command-line instructions?

zll0032 commented 3 years ago

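The command I used to train the model is python3 run2.py --exp_name Ours_1 --model_type Ours --gpu 0. In the model code I have self.bert = BertModel.from_pretrained(bert_path, return_dict=False). I added the return_dict parameter myself; without it, what comes back is a string and torch.cat raises an error. I have sent model.py and run2.py as attachments; I basically only changed some file paths.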
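The "string" return value mentioned above matches how Python unpacks dict-like objects. The sketch below is only an illustration under an assumption: that the installed transformers release returns a dict-like output object by default, which return_dict=False converts back to a tuple. It uses a plain OrderedDict with made-up values as a stand-in, so it runs without transformers or a model checkpoint.

```python
from collections import OrderedDict

# Stand-in for a dict-like model output (hypothetical values; real outputs hold tensors).
outputs = OrderedDict(
    last_hidden_state=[[0.1, 0.2]],
    pooler_output=[0.3],
)

# Tuple-unpacking a mapping iterates over its KEYS, so both names end up as strings.
key_a, key_b = outputs
print(type(key_a).__name__, key_a, key_b)

# Reading the values by key, or converting to a tuple of values, gives the data back.
sequence_output = outputs["last_hidden_state"]
pooled_output = outputs["pooler_output"]
as_tuple = tuple(outputs.values())
```

If this is what happens in model.py, passing strings on to torch.cat would fail, which is consistent with the error described.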

zll0032 commented 3 years ago

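I trained the Ours model directly, without training BERT and SASREC first. The results are very poor and far from those in the paper. Do I need to train BERT and SASREC first?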

zll0032 commented 3 years ago

Hello, I tried BERT and SASREC and found that the reproduced results for both are close to the paper. Only Ours differs substantially.

Zyh716 commented 3 years ago

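Hello, with transformers==2.1.1 and torch==1.6.0, self.bert = BertModel.from_pretrained(bert_path) runs correctly. I am not sure whether the model can produce normal results when the versions of these two libraries differ. You could try training Ours in this environment.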
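A quick way to confirm that an environment matches the two pins quoted above is to compare installed versions before training. This is a minimal sketch: the pins come from the comment, while the helper names installed_version and find_mismatches are hypothetical and not part of the repository.

```python
from importlib import metadata

# Pins quoted in the thread; everything else here is illustrative.
REQUIRED = {"transformers": "2.1.1", "torch": "1.6.0"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(required, lookup=installed_version):
    """Map each package whose version differs from its pin to (wanted, found)."""
    mismatches = {}
    for package, wanted in required.items():
        found = lookup(package)
        # torch builds often carry a local suffix such as "1.6.0+cu101"; ignore it.
        if found is None or found.split("+")[0] != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in find_mismatches(REQUIRED).items():
        print(f"{package}: expected {wanted}, found {found}")
```

The lookup parameter exists only so the comparison can be exercised with a fake mapping instead of the real environment.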

zll0032 commented 3 years ago

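Hello, I changed transformers to version 2.1.1, and self.bert = BertModel.from_pretrained(bert_path) now runs correctly. However, the results are about the same as before and still far from those reported in the paper. What is the reason? At epoch=5, Patience = 3 and training ended.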
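For readers unfamiliar with patience-based early stopping, the behaviour described above (training ends once the patience counter reaches its limit) can be sketched generically. This is not the repository's training loop: the function name, the validation scores, and the "higher is better" convention are all assumptions made for illustration.

```python
def train_with_patience(val_scores, max_patience):
    """Simulate early stopping over a list of per-epoch validation scores.

    Returns (last_epoch_run, patience_counter, best_score).
    """
    best_score = float("-inf")
    patience = 0
    last_epoch = -1
    for epoch, score in enumerate(val_scores):
        last_epoch = epoch
        if score > best_score:
            # Improvement: remember it and reset the counter.
            best_score = score
            patience = 0
        else:
            # No improvement: one more epoch of waiting.
            patience += 1
        if patience >= max_patience:
            break  # stop early
    return last_epoch, patience, best_score


# Made-up scores: the metric peaks at epoch 2, then recovers only at epoch 6.
scores = [0.10, 0.20, 0.30, 0.25, 0.20, 0.10, 0.40]
print(train_with_patience(scores, max_patience=3))
print(train_with_patience(scores, max_patience=500))
```

With a small limit the run stops before the late improvement is ever seen, which is why raising the limit changes how long training continues. Whether that also changes the final metrics depends on the model and data.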

Zyh716 commented 3 years ago

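Are the code and data you ran exactly identical to those provided in the repository? Also, could you send me the training log so I can take a look?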

zll0032 commented 3 years ago

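Hello, I have sent the training log and the modified code as attachments. The files I changed are dataset.py, model.py, and run2.py under Recommender\Union. In addition, I just set max_patience to 500 and restarted training; nohup.out is the training log so far. The reproduced results for BERT and SASREC are close to the paper; only Ours is far off.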

zll0032 commented 3 years ago

Hello, sorry to bother you. Have you found the reason why the reproduced Ours results differ so much from the paper? What is it?

zll0032 commented 3 years ago

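Hello, I'd like to ask a few questions about the TG-ReDial dataset. 1. In the dataset, for example {2:['Seeker','谈论','开车']}, what do 谈论 and 开车 mean, and how are they obtained? 2. In mentionMovies, for example {7:['1292781','细细的红线(1998)']}, what does 1292781 mean, and how is it obtained? Does 1998 indicate the movie's release year? 3. Are conv_id and user_id randomly generated?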

Lancelot39 commented 3 years ago

1. '谈论' (talk about) and '开车' (driving) are the action and topic of this utterance, respectively. They are selected via a path on the commonsense knowledge graph. 2. For the movie, '1292781' is the movie id used for labeling, and 1998 is the release year, which distinguishes it from other movies with the same name. These movies are obtained from user records on Douban. 3. conv_id and user_id are randomly generated for convenience.
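The field meanings given above can be made concrete with a small parsing sketch. It handles only the two example shapes quoted in this thread; the function names and the output key names (role, action, topic, movie_id, title, year) are hypothetical, and real entries in the dataset may have other shapes.

```python
import re

# Title followed by a four-digit year in ASCII or full-width parentheses.
TITLE_YEAR = re.compile(r"^(.*?)[((](\d{4})[))]$")


def parse_goal(entry):
    """Split a [role, action, topic] entry into named fields."""
    role, action, topic = entry
    return {"role": role, "action": action, "topic": topic}


def parse_movie(entry):
    """Split a [movie_id, 'title(year)'] entry into id, title, and release year."""
    movie_id, title_with_year = entry
    match = TITLE_YEAR.match(title_with_year)
    if match is None:
        # No trailing year found: keep the raw title.
        return {"movie_id": movie_id, "title": title_with_year, "year": None}
    return {
        "movie_id": movie_id,
        "title": match.group(1),
        "year": int(match.group(2)),
    }


# The two examples quoted in the question.
goal = parse_goal(["Seeker", "谈论", "开车"])
movie = parse_movie(["1292781", "细细的红线(1998)"])
print(goal)
print(movie)
```

Keeping the year separate from the title reflects its stated purpose: telling apart movies that share a name.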

zll0032 commented 3 years ago

Hello, I'd like to ask a few questions about the TG-ReDial dataset. 1. What does goal_path mean in the TG-ReDial dataset? 2. How do you get the action and topic from the content? 3. How do you get goal_path from the conversation?

zll0032 commented 3 years ago

Hello, regarding the code provided with the paper Towards Topic-Guided Conversational Recommender System: prepare_data.sh in the Conversation folder contains the command python data/data_Ours/get_train.py, but I cannot find the data folder. What is going on?

zll0032 commented 3 years ago

Hello, regarding the code provided with the paper Towards Topic-Guided Conversational Recommender System: how were the two files identity2movieId.json and movies_with_mentions.csv obtained?

Lancelot39 commented 3 years ago

Hello, I'd like to ask a few questions about the TG-ReDial dataset. 1. What does goal_path mean in the TG-ReDial dataset? 2. How do you get the action and topic from the content? 3. How do you get goal_path from the conversation?

Q1: goal_path is the topic thread of the whole conversation. A topic thread is a sequence of topics, and each topic is a keyword from the corresponding utterance. Q2 and Q3: the topic thread is built by selecting a path from ConceptNet, with each topic corresponding to a node on the path; the action is selected by considering the user's historical comments. All the details can be found in our paper, "Towards Topic-Guided Conversational Recommender System"; please refer to it for more information.

Lancelot39 commented 3 years ago

Hello, regarding the code provided with the paper Towards Topic-Guided Conversational Recommender System: prepare_data.sh in the Conversation folder contains the command python data/data_Ours/get_train.py, but I cannot find the data folder. What is going on?

We used to use this file to build the dataset, but now you only need to download the prepared data directly from this link (https://drive.google.com/drive/folders/1jLkNtUgzqBITQJsbOjSq20S2zzpY5Foj).

Lancelot39 commented 3 years ago

Hello, regarding the code provided with the paper Towards Topic-Guided Conversational Recommender System: how were the two files identity2movieId.json and movies_with_mentions.csv obtained?

These two files were built before this dataset was constructed. They were collected from the Douban website, from users' real historical records.

zll0032 commented 3 years ago

What is the input to the BLEU_scorer.py file? Is it the file v11051_gen_output.txt? I took v11051_gen_output.txt as input to both BLEU_scorer.py and Dist_scorer.py.