entitize / Fakeddit

r/Fakeddit New Multimodal Benchmark Dataset for Fine-grained Fake News Detection
https://fakeddit.netlify.app/

[Info-Question]-Preprocessing step and broken image links #7

Closed naman-32 closed 4 years ago

naman-32 commented 4 years ago

Dear @entitize, Thanks for a great dataset!

May I ask about some specific details in relation to your reply to Meywether:

Best regards, Naman

Hi Meywether,

  • We extracted fixed-size feature vectors using image and text models (VGG, BERT, etc.). The whole dataset was processed. The images were resized to match the input size of the image network. For example, VGG16 takes 224x224 input, so we scaled the images to 224x224. For the text, we applied the filtering described in the paper, such as removing punctuation and revealing words. These features were then used for classification.
  • We used separate image and text models for feature extraction (VGG, BERT, etc.). Once all features had been extracted, we trained a separate model to combine them and classify the input.

Does this answer your questions?
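The "combine these extracted features" step in the quoted reply can be illustrated with a minimal sketch. The reply does not say how the features were combined, so the function name `fuse_features` and the three modes below (concatenation, element-wise maximum, element-wise average) are assumptions for illustration only, not the authors' confirmed method. The fused vector would then be passed to whatever classifier is trained on top.

```python
from typing import List, Sequence


def fuse_features(
    image_feat: Sequence[float],
    text_feat: Sequence[float],
    mode: str = "concat",
) -> List[float]:
    """Combine one image feature vector and one text feature vector.

    Hypothetical helper: the modes here are illustrative assumptions.
    """
    if mode == "concat":
        # Concatenation works for vectors of different lengths.
        return list(image_feat) + list(text_feat)

    # Element-wise modes need both vectors to have the same length.
    if len(image_feat) != len(text_feat):
        raise ValueError("element-wise fusion needs equal-length vectors")

    if mode == "max":
        return [max(a, b) for a, b in zip(image_feat, text_feat)]
    if mode == "avg":
        return [(a + b) / 2 for a, b in zip(image_feat, text_feat)]

    raise ValueError(f"unknown fusion mode: {mode!r}")


# Toy vectors standing in for real VGG / BERT features.
fused = fuse_features([1.0, 5.0], [3.0, 2.0], mode="max")
print(fused)  # [3.0, 5.0]
```

Concatenation keeps both modalities intact at the cost of a larger input. The element-wise modes assume the two extractors were projected to the same dimension beforehand.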

entitize commented 4 years ago

The revealing words removed were 'psbattle', 'colourised', 'colorized', 'propaganda', and the names of months.
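A text filter along these lines might look like the sketch below. It assumes the word list above is the set of revealing words mentioned in the quoted reply, spells out the English month names explicitly, and lowercases the text before matching. The name `clean_title` and the exact tokenization are hypothetical and are not taken from the Fakeddit code.

```python
import string

# Words from the reply above.
REVEALING_WORDS = {"psbattle", "colourised", "colorized", "propaganda"}

# "names of months", spelled out here as an assumption (full English names only).
MONTHS = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
}

BLOCKLIST = REVEALING_WORDS | MONTHS

# Map every ASCII punctuation character to a space.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_title(title: str) -> str:
    """Lowercase, strip punctuation, and drop blocklisted words."""
    text = title.lower().translate(_PUNCT_TABLE)
    tokens = [tok for tok in text.split() if tok not in BLOCKLIST]
    return " ".join(tokens)


print(clean_title("PsBattle: A cat in March, colorized!"))  # a cat in
```

Filtering on whole tokens like this also removes ordinary uses of "may" and "march"; whether the original preprocessing did the same is not stated in the thread.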

The results in the paper are based solely on the images in the Google Drive dataset. The image-only results therefore use the images in the Google Drive directory, not the image links.