dhruvkhattar / MVAE

This repository contains the code for the paper "MVAE: Multimodal Variational Autoencoder for Fake News Detection"

only 0.71 acc on Weibo Dataset #9

Open Tangnameless opened 2 years ago

Tangnameless commented 2 years ago

On the Weibo dataset, I only got an accuracy of 0.71. I didn't change your model or training parameters. Because I don't have image_embed.pkl and xx_content_segmented.txt, I could only preprocess the data based on my own guesses.

gymbeijing commented 2 years ago

Hi @Tangnameless, I was trying to reproduce the model on the Twitter dataset, but I found that some files are missing. How did you handle the missing files on the Weibo dataset?

Tangnameless commented 2 years ago

> Hi @Tangnameless, I was trying to reproduce the model on the Twitter dataset, but I found that some files are missing. How did you handle the missing files on the Weibo dataset?

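I did not run any experiments on the Twitter dataset. For the Weibo dataset:

  1. For the missing text segmentation, I used jieba for Chinese word segmentation and then, as the paper describes, trained 32-dimensional word2vec embeddings myself on the training set. (Intuitively, translating the Weibo text into English before embedding it seemed like an unnecessary extra step.)
  2. For the missing image embeddings, I used the pretrained VGG-19 provided by PyTorch directly and took the output of the second-to-last layer, which gives a 4096-dimensional vector.

Since I don't know the exact preprocessing steps, the reproduced results are not ideal.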
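
As a rough illustration of step 1, the sketch below segments posts with jieba and trains 32-dimensional word2vec vectors. It is not the repository's preprocessing. The thread only mentions jieba, word2vec, 32 dimensions, and training-set text, so the helper names `segment_posts` and `train_word2vec`, the choice of gensim, and the `window`, `min_count`, and `epochs` values are all assumptions.

```python
from typing import Callable, Iterable, List, Optional


def segment_posts(
    posts: Iterable[str],
    cut: Optional[Callable[[str], List[str]]] = None,
) -> List[List[str]]:
    """Split each post into tokens, dropping blank tokens and empty posts.

    `cut` defaults to jieba.lcut; it is a parameter only so that another
    segmenter can be swapped in.
    """
    if cut is None:
        import jieba  # third-party: pip install jieba
        cut = jieba.lcut

    segmented = []
    for post in posts:
        tokens = [tok for tok in cut(post.strip()) if tok.strip()]
        if tokens:
            segmented.append(tokens)
    return segmented


def train_word2vec(sentences: List[List[str]], dim: int = 32):
    """Train word2vec on the segmented training posts only."""
    from gensim.models import Word2Vec  # third-party: pip install gensim

    # `vector_size` is the gensim >= 4.0 name (older releases call it `size`).
    # window, min_count and epochs are guesses, not values from the paper.
    return Word2Vec(
        sentences=sentences,
        vector_size=dim,
        window=5,
        min_count=1,
        epochs=10,
        seed=0,
    )
```

With a list of training posts `train_posts`, `model = train_word2vec(segment_posts(train_posts))` trains the embeddings, and `model.wv[token]` looks up the vector for a token seen during training. Fitting only on the training split keeps test-set vocabulary out of the embeddings.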