KaiDMML / FakeNewsNet

This is a dataset for fake news detection research

Some news articles contain tweets that aren't related #32

Open YaGiNA opened 5 years ago

YaGiNA commented 5 years ago

Dear @KaiDMML

I'm trying to use this dataset for my research. I inspected some of the tweets and found that some are not related to the news articles at all.

For example, in the real category of politifact, articles from CQ.com have many Japanese tweets with https://t.co/XXXXXX links. politifact8005 is one of CQ.com's articles; it has many tweets, but most of them are just entries for promotional marketing campaigns (example tweet id: 1021190359525847040). Other tweets also refer to completely unrelated topics.

Also, I believe that news content.json reflects a login error. Instead of the article data, the file contains only the text of the login page:

Need help? Contact the CQ Hotline at (800) 678-8511 or hotline@cqrollcall.com
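
A check along these lines could flag the affected files. This is only a rough sketch: it assumes the article body is stored under a "text" key in news content.json, and the marker string and the helper names (`is_login_page`, `load_article`) are made up for illustration.

```python
import json

# Marker taken from the login-page text quoted above.
LOGIN_MARKER = "Contact the CQ Hotline"


def is_login_page(article: dict, marker: str = LOGIN_MARKER) -> bool:
    """Return True if the article body looks like the CQ login page.

    Assumes the body is stored under the "text" key (an assumption
    about the layout of news content.json).
    """
    text = article.get("text") or ""
    return marker in text


def load_article(path: str) -> dict:
    """Load a single news content.json file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Other publishers would presumably need their own marker strings, so this only covers the CQ.com case.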

I can confirm a similar phenomenon in all the other categories. Is this intended? I am currently filtering out those cases using unicodedata.east_asian_width().
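
A minimal sketch of this kind of filter, for anyone who wants to reproduce it. The function names and the 0.3 threshold are arbitrary choices for illustration, not values tuned on the dataset; only `unicodedata.east_asian_width()` comes from the standard library.

```python
import unicodedata

# East Asian Width classes treated as "East Asian" here:
# "W" (Wide) and "F" (Fullwidth). Halfwidth forms ("H") and
# ambiguous characters ("A") are deliberately not counted.
WIDE_CLASSES = {"W", "F"}


def east_asian_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are Wide or Fullwidth."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    wide = sum(1 for c in chars if unicodedata.east_asian_width(c) in WIDE_CLASSES)
    return wide / len(chars)


def looks_east_asian(text: str, threshold: float = 0.3) -> bool:
    """Flag a tweet whose share of wide characters reaches the threshold.

    The threshold is a hypothetical default and may need adjusting.
    """
    return east_asian_ratio(text) >= threshold
```

Tweets for which `looks_east_asian(tweet_text)` is true would then be dropped. Note that this is a character-width heuristic, not language detection, so it will miss unrelated tweets written in English.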