KaiDMML / FakeNewsNet

This is a dataset for fake news detection research

some tweet objects are duplicated in fake and real (at the same time) #39

Open kosty4 opened 4 years ago

kosty4 commented 4 years ago

I was doing some preprocessing and found matching tweet objects (same tweet_id, creation time, user_id, etc.) where the only difference is the label, even though the news pieces are about two completely different things.
Please let me know if I'm wrong.

kosty4 commented 4 years ago

I have an incomplete copy of the politifact dataset, and I have found about 8000 duplicates in it.

SaschaStenger commented 4 years ago

Interesting. Can you provide an example? Something like a tweet id, and the news items this tweet id appears in?

kosty4 commented 4 years ago

@SaschaStenger, sure. For example, 1020828899750449152 appears in both politifact13905 (fake) and politifact12751 (real). A few more: 1020831839085387777 in politifact13905 and politifact12751... So, for me, it looks like news piece 12751 is corrupt or was downloaded incorrectly. Again, I'm not sure; it might have happened only to me. I will remove that piece completely. I also wrote you an email, please check it. Thanks.
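
For anyone who wants to run the same check on their own copy, here is a minimal sketch of one way to list tweet IDs attached to news items with different labels. It is not the code used above. The function name `find_cross_label_tweets` and the `(news_id, label, tweet_ids)` input shape are assumptions made for illustration, so the step that loads your local files into that shape is left out. The item `example_item` and its tweet ID are made up.

```python
from collections import defaultdict


def find_cross_label_tweets(news_items):
    """Return tweet IDs that are attached to news items with more than one label.

    news_items: iterable of (news_id, label, tweet_ids) tuples.
    This input shape is an assumption for illustration, not the dataset's format.
    """
    # tweet_id -> label -> set of news IDs that reference the tweet
    seen = defaultdict(lambda: defaultdict(set))
    for news_id, label, tweet_ids in news_items:
        for tweet_id in tweet_ids:
            seen[tweet_id][label].add(news_id)

    # Keep only tweets that occur under at least two different labels.
    return {
        tweet_id: {label: sorted(ids) for label, ids in by_label.items()}
        for tweet_id, by_label in seen.items()
        if len(by_label) > 1
    }


# The first two items use the IDs reported in this thread;
# the third item is hypothetical and only shows a non-conflicting case.
items = [
    ("politifact13905", "fake", ["1020828899750449152", "1020831839085387777"]),
    ("politifact12751", "real", ["1020828899750449152", "1020831839085387777"]),
    ("example_item", "real", ["1"]),
]

conflicts = find_cross_label_tweets(items)
for tweet_id, by_label in sorted(conflicts.items()):
    print(tweet_id, by_label)
```

Tweet IDs are kept as strings here because they exceed 2^53, so tools that parse them as floating-point numbers can silently change them and produce false matches.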