I just realized that I was using the wrong IDs for the news articles. Below are the updated results, in case anyone is interested:
> Fake PolitiFact: 114
> Real PolitiFact: 120
> Sum PolitiFact: 234
> ---------------------
> Fake Buzzfeed: 89
> Real Buzzfeed: 91
> Sum Buzzfeed: 180
> ---------------------
> Sum all: 414
> Fake spread count: 40416
> Real spread count: 20683
> ---------------------
> Fake affected count: 1049276
> Real affected count: 639982
P.S. The following news articles were removed because they do not have any content/text:
BuzzFeed_Fake_13-Webpage, BuzzFeed_Fake_39-Webpage, PolitiFact_Fake_24-Webpage, PolitiFact_Fake_29-Webpage, PolitiFact_Fake_37-Webpage, PolitiFact_Fake_47-Webpage, PolitiFact_Fake_70-Webpage, PolitiFact_Fake_90-Webpage
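For anyone who wants to reproduce the spread and affected counts above, here is a minimal sketch of how they could be computed. It assumes that each line of NewsUser.txt is a whitespace-separated triple `news_id user_id times_shared`, that each line of UserUser.txt is a `follower_id followee_id` pair, and that "affected" means a sharer plus everyone who follows that sharer; all three are my reading of the readme, not something the repo guarantees.

```python
from collections import defaultdict

def spread_and_sharers(news_user_path):
    """Per-article total shares and the set of users who shared each article.

    Assumes NewsUser.txt lines look like: news_id user_id times_shared
    (whitespace-separated). This format is an assumption from the readme.
    """
    spread = defaultdict(int)   # news_id -> total number of shares
    sharers = defaultdict(set)  # news_id -> users who shared it
    with open(news_user_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 3:
                continue  # skip malformed lines
            news_id, user_id, times = parts[0], parts[1], int(parts[2])
            spread[news_id] += times
            sharers[news_id].add(user_id)
    return spread, sharers

def followers_by_user(user_user_path):
    """Map each user to the set of users following them.

    Assumes UserUser.txt lines look like: follower_id followee_id.
    The edge direction is an assumption and may need to be flipped.
    """
    followers = defaultdict(set)
    with open(user_user_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2:
                continue
            follower, followee = parts
            followers[followee].add(follower)
    return followers

def affected_count(news_ids, sharers, followers):
    """Count users 'affected' by a set of articles: the sharers themselves
    plus everyone who follows a sharer (one possible reading of the term)."""
    affected = set()
    for news_id in news_ids:
        for user in sharers.get(news_id, ()):
            affected.add(user)
            affected.update(followers.get(user, ()))
    return len(affected)
```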
> Fake Buzzfeed: 89

Shouldn't there be 91 fake BuzzFeed news articles? How did you get the number 89?
As I mentioned, in my case I removed a handful of news articles (listed in my post above) because they did not have any content/text.
Based on the explanations in the readme file, and using the NewsUser.txt and UserUser.txt files, I computed some statistics about the documents in the dataset: specifically, the number of times that news articles, whether fake or real, were shared. Here are the results:
All the numbers are exactly the same for both fake and real news documents, for both PolitiFact and BuzzFeed. I am wondering whether I did something wrong or whether there is an issue with the share counts in the dataset.
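For what it's worth, using the helper functions sketched earlier in this thread, the fake-versus-real comparison described here could be reproduced along these lines. The `fake_ids` and `real_ids` sets are hypothetical placeholders for whatever ID lists you build from the dataset's news index; they are not defined by the repo itself.

```python
spread, sharers = spread_and_sharers("NewsUser.txt")
followers = followers_by_user("UserUser.txt")

# fake_ids / real_ids: hypothetical sets of news IDs built from the
# dataset's news index files; construct them to match your copy.
fake_spread = sum(spread[i] for i in fake_ids)
real_spread = sum(spread[i] for i in real_ids)
print("fake spread:", fake_spread, "real spread:", real_spread)
print("fake affected:", affected_count(fake_ids, sharers, followers))
print("real affected:", affected_count(real_ids, sharers, followers))
```

If the fake and real totals come out identical, a likely culprit is that the IDs used to split the articles do not actually match the IDs in NewsUser.txt, which is exactly the wrong-ID problem described at the top of this thread.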