[Info-Question]-Preprocessing step and broken image links

Dear @entitize, Thanks for a great dataset!

May I ask about some specific details in relation to your reply to Meywether:

Can the list of revealing words for each subreddit, which was used to preprocess the text part of the dataset, be made public?
Were the image-only results reported in the updated paper measured after removing the broken image links present in the current multimodal_only_samples gdrive directory?

Best regards, Naman

Hi Meywether,

We extracted fixed-size feature vectors using models such as VGG and BERT. The whole dataset was processed. The images were resized to match the input size of the image network. For example, VGG16 takes 224x224 inputs, so we scaled the images to 224x224. For the text, we applied the filtering described in the paper, such as removing punctuation and revealing words. These features were then used for classification.
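As a rough illustration of the text filtering mentioned above, here is a minimal sketch using only the Python standard library. The function name `clean_title`, the lowercasing step, and the contents of `REVEALING_WORDS` are assumptions made for the example; the actual per-subreddit word list is not given in this thread.

```python
import string

# Hypothetical placeholder entries; the real per-subreddit list is not given in this thread.
REVEALING_WORDS = {"psbattle", "photoshopbattles"}

def clean_title(text, revealing_words=REVEALING_WORDS):
    """Lowercase the text, strip ASCII punctuation, and drop revealing words."""
    lowered = text.lower()  # lowercasing is an assumption, not stated in the reply
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = [tok for tok in no_punct.split() if tok not in revealing_words]
    return " ".join(tokens)
```

For instance, `clean_title("PsBattle: A cat, wearing a hat!")` would give `"a cat wearing a hat"` with the placeholder list above.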

We used separate image and text models for feature extraction (VGG, BERT, etc.). Once all features had been extracted, we trained a separate model to combine them and classify the input.
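The two-stage setup described above (extract features separately, then train a model on the combination) could look roughly like the sketch below. The reply does not say how the features were combined or which classifier was used, so concatenation and a small logistic regression are assumptions here, and the toy vectors merely stand in for real VGG/BERT features. The helper names `fuse`, `train_logreg`, and `predict` are hypothetical.

```python
import math

def fuse(image_vec, text_vec):
    """Concatenate one sample's image and text features (an assumed fusion choice)."""
    return list(image_vec) + list(text_vec)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(features, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression classifier with plain stochastic gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = pred - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def predict(weights, bias, x):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5 else 0

# Toy stand-ins for extracted features; real vectors would be much larger.
image_feats = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
text_feats = [[1.0], [0.9], [0.0], [0.1]]
labels = [1, 1, 0, 0]

fused = [fuse(img, txt) for img, txt in zip(image_feats, text_feats)]
weights, bias = train_logreg(fused, labels)
```

Concatenation keeps the extractors independent of the classifier, so either feature source can be swapped out without retraining the other.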

Does this answer your questions?

entitize / Fakeddit

[Info-Question]-Preprocessing step and broken image links #7