Open kosty4 opened 4 years ago
I have an incomplete copy of the politifact dataset, and I have found about 8000 duplicates in it.
Interesting. Can you provide an example? Something like a tweet id, and the news items this tweet id appears in?
@SaschaStenger, sure. For example, 1020828899750449152 appears in both politifact13905 (fake) and politifact12751 (real). A few more: 1020831839085387777 is also in politifact13905 and politifact12751... So, to me, it looks like news piece 12751 is corrupt or was downloaded incorrectly. Again, I'm not sure; it might have happened only to me. I will remove that piece completely. I also wrote you an email, please check it. Thanks.
I was doing some preprocessing and found matching tweet objects (same tweet_id, creation time, user_id, etc.) where the only difference is the label, even though the two news pieces are about completely different things.
Please let me know if I'm wrong.
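For anyone who wants to run the same check, here is a minimal sketch of how such cross-label duplicates could be detected. It assumes the tweets have already been loaded into `(tweet_id, news_id, label)` tuples; the function name `find_cross_label_tweets` and that record layout are hypothetical and not part of the dataset's tooling. The first four sample records are the ids mentioned above, and the last one is a made-up placeholder.

```python
from collections import defaultdict


def find_cross_label_tweets(records):
    """Return tweet ids that appear under more than one label.

    `records` is an iterable of (tweet_id, news_id, label) tuples.
    This layout is an assumption for illustration only.
    """
    seen = defaultdict(set)
    for tweet_id, news_id, label in records:
        seen[tweet_id].add((news_id, label))

    conflicts = {}
    for tweet_id, items in seen.items():
        labels = {label for _, label in items}
        if len(labels) > 1:
            # Keep every (news_id, label) pair so the suspect news item is visible.
            conflicts[tweet_id] = sorted(items)
    return conflicts


# Tweet ids are kept as strings so they are never rounded as floats.
records = [
    ("1020828899750449152", "politifact13905", "fake"),
    ("1020828899750449152", "politifact12751", "real"),
    ("1020831839085387777", "politifact13905", "fake"),
    ("1020831839085387777", "politifact12751", "real"),
    ("1", "politifact00001", "real"),  # hypothetical non-duplicated tweet
]

conflicts = find_cross_label_tweets(records)
for tweet_id, items in conflicts.items():
    print(tweet_id, items)
```

Counting how often each news id shows up in `conflicts` should make it easier to see whether a single item, such as politifact12751, accounts for most of the duplicates.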