About data processing - Githubissues

JiyaoWei / bilstm_mpoa

Sentiment analysis has been a popular field in natural language processing. Sentiments can be expressed explicitly or implicitly. Most current studies on sentiment analysis focus on the identification of explicit sentiments. However, implicit sentiment analysis has become one of the most difficult tasks in sentiment analysis due to the absence of explicit sentiment words. In this article, we propose a BiLSTM model with multi-polarity orthogonal attention for implicit sentiment analysis. Compared to the traditional single attention model, the difference between the words and the sentiment orientation can be identified by using multi-polarity attention. This difference can be regarded as a significant feature for implicit sentiment analysis. Moreover, an orthogonal restriction mechanism is adopted to ensure that the discriminatory performance can be maintained during optimization. The experimental results on the SMP2019 implicit sentiment analysis dataset and two explicit sentiment analysis datasets demonstrate that our model more accurately captures the characteristic differences among sentiment polarities.

About data processing #5

Open gm199956 opened 1 year ago

gm199956 commented 1 year ago

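Hello! The dataset website you provided no longer opens, so I found some SMP2019 data online myself. However, when I point the training and test paths at the data I found, the code keeps raising errors, so I don't know what data format it expects. Hitting such a big problem this early, I'm not sure how to proceed. Could you please share the data, code, and other files you used when you got it running? Many thanks!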

cjj-sunshine commented 1 year ago

Did you manage to reproduce the results? What does the input data format look like? Could you let me know?

JiyaoWei commented 1 year ago

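Hello, and thank you for your interest in this work! This project does sentiment analysis and is already three years old. Since then I have moved on to further studies and changed research direction, and I'm sorry to say that the data-construction files for the project have been lost. The inputs needed to run the code are the text, the labels, the vocabulary, the query vectors, the ELMo representation of the text, and the w2v representation of the text:

- Text and labels: the paths are args.train/test/dev_path. The input files are csv files with a text column and a label column.
- Vocabulary: the path is args.vocab_path. The input file is a txt file with one word per line.
- Query vectors: the path is args.query_matrix_path. The input file is a json file containing a dictionary whose keys are "0", "1", and "2", and whose values are the sentiment query vectors obtained by averaging the word vectors of the sentiment words in a sentiment lexicon.
- ELMo representation of the text: the path is args.elmo_path. The input file is an h5 file containing a dictionary whose keys are the tokenized texts and whose values are the ELMo representations of those texts.
- w2v representation of the text: the path is args.pretrained_w2v_model_path. The input file is a txt file in which each line contains a word and its word vector, separated by "\t".

You can build the data above by following the description in the paper. If you run into any problems along the way, feel free to contact me at any time. Thank you!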
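The plain-text inputs described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's lost preprocessing code: the file names, the helper names (`mean_vector`, `build_query_matrix`, `write_inputs`), the layout of the sentiment lexicon, and the use of spaces between vector components after the tab are all hypothetical. The ELMo h5 file is left out because it needs an HDF5 library and an ELMo model.

```python
import csv
import json
import os
import tempfile


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_query_matrix(lexicon, word_vectors):
    """Average the word vectors of the sentiment words for each polarity.

    lexicon: {"0": [word, ...], "1": [...], "2": [...]}  (assumed layout)
    word_vectors: {word: [float, ...]}
    """
    query = {}
    for label, words in lexicon.items():
        known = [word_vectors[w] for w in words if w in word_vectors]
        if not known:
            raise ValueError(f"no word vectors found for polarity {label}")
        query[label] = mean_vector(known)
    return query


def write_inputs(out_dir, samples, word_vectors, lexicon):
    """Write the csv, vocabulary, w2v, and query-vector files into out_dir."""
    paths = {
        "train": os.path.join(out_dir, "train.csv"),        # hypothetical names
        "vocab": os.path.join(out_dir, "vocab.txt"),
        "w2v": os.path.join(out_dir, "w2v.txt"),
        "query": os.path.join(out_dir, "query_matrix.json"),
    }
    # Text and labels: csv with a `text` column and a `label` column.
    with open(paths["train"], "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "label"])
        writer.writeheader()
        writer.writerows(samples)
    # Vocabulary: one word per line.
    with open(paths["vocab"], "w", encoding="utf-8") as f:
        for word in word_vectors:
            f.write(word + "\n")
    # w2v: word, a tab, then the vector (space-separated here by assumption).
    with open(paths["w2v"], "w", encoding="utf-8") as f:
        for word, vec in word_vectors.items():
            f.write(word + "\t" + " ".join(str(x) for x in vec) + "\n")
    # Query vectors: json dictionary keyed by "0", "1", "2".
    with open(paths["query"], "w", encoding="utf-8") as f:
        json.dump(build_query_matrix(lexicon, word_vectors), f, ensure_ascii=False)
    return paths


if __name__ == "__main__":
    toy_vectors = {"好": [1.0, 0.0], "棒": [0.0, 1.0], "差": [-1.0, 0.0]}
    toy_lexicon = {"0": ["差"], "1": ["好", "棒"], "2": ["差"]}  # toy data only
    with tempfile.TemporaryDirectory() as tmp:
        print(write_inputs(tmp, [{"text": "好 棒", "label": 1}], toy_vectors, toy_lexicon))
```

Which polarity each of the keys "0", "1", and "2" stands for is not stated in the thread, so the toy lexicon above does not imply any particular mapping.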

gm199956 commented 1 year ago

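Hello again, and sorry to bother you once more. Following your instructions, I have finished most of the preprocessing. I am skipping a few steps for now (for example, I set the word-vector option to none) just to get the code running. I have one question: the test set in the data I found has no labels. Can the model generate labels for the test set directly? I would also like to generate sentiment-polarity labels for other unlabeled text. I'm not sure whether this is possible. If it is, I would be grateful for your guidance. Thank you!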

JiyaoWei commented 1 year ago

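Hello. The test-set labels in the dataset are used only to evaluate the model, so their absence does not prevent you from generating sentiment-polarity labels for unlabeled text. You need to 1) modify the file-reading code so that it handles a test set without labels; 2) comment out the code that computes the test metrics, because it uses the test-set labels; and 3) output the model's predictions (predicted_ids_batch in main_vue.py) together with the corresponding text (input_ids_batch in main_vue.py; note that input_ids_batch holds the ids of the input text, so these ids need to be converted back to characters). Best wishes!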
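Steps 1 and 3 above might look roughly like the sketch below. It is not taken from main_vue.py: the helper names (`read_texts`, `load_vocab`, `ids_to_text`, `write_predictions`) are hypothetical, and it assumes that each id is the line index of a word in the vocabulary file, that id 0 is padding, and that the batches have already been converted to plain Python lists of integers (for example with `.tolist()` if they are tensors).

```python
import csv


def read_texts(path):
    """Read the `text` column of a csv file; the `label` column may be absent."""
    with open(path, encoding="utf-8", newline="") as f:
        rows = list(csv.DictReader(f))
    texts = [row["text"] for row in rows]
    labels = [row.get("label") for row in rows]  # None when there is no label column
    return texts, labels


def load_vocab(path):
    """Load the vocabulary file (one word per line) as an id -> word list."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def ids_to_text(ids, id2word, pad_id=0):
    """Convert token ids back to words, skipping padding (pad_id is an assumption)."""
    return " ".join(id2word[i] for i in ids if i != pad_id)


def write_predictions(path, input_ids_batch, predicted_ids_batch, id2word):
    """Write one (text, predicted label) row per sample."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "label"])
        for ids, pred in zip(input_ids_batch, predicted_ids_batch):
            writer.writerow([ids_to_text(ids, id2word), pred])
```

In an evaluation loop, `write_predictions` would be called with the collected `input_ids_batch` and `predicted_ids_batch` values in place of the metric computation that step 2 comments out.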

gm199956 commented 1 year ago

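Thank you very much for your reply and guidance. I have tried it, and labels can indeed be generated for a test set without labels. Thank you so much for providing such a good model for a beginner like me to learn from. Next, I am going to study ELMo word vectors, because the results are poor without dynamic word vectors. Thanks again.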
