fivethirtyeight / russian-troll-tweets

external_author_id and author relationship #32

Open EvanCarroll opened 6 years ago

EvanCarroll commented 6 years ago

This field is confusing to me: some external_author_id values map to multiple authors.

SELECT distinct external_author_id , author
FROM rustweets.tweets WHERE external_author_id = 753000000000000000;
 external_author_id |     author      
--------------------+-----------------
 753000000000000000 | ANGELABACH991
 753000000000000000 | ANGELA_LATTKE
 753000000000000000 | BECKRALFBECK265
 753000000000000000 | CHRISTINAPOOL61
 753000000000000000 | DARRELL_H_HUNT
 753000000000000000 | DOMINIKKELLER22
 753000000000000000 | EHERMANN66
 753000000000000000 | ERIKADIXONLOVE
 753000000000000000 | JOACHIMBUCHWITZ
 753000000000000000 | LARSWOLFLARS
 753000000000000000 | LGBTUNI
 753000000000000000 | LUISSTOCKBERG
 753000000000000000 | MALTE_ROSS
 753000000000000000 | MANUELKROSSS
 753000000000000000 | MARGARETHKURZ
 753000000000000000 | MARMARSCH1
 753000000000000000 | PETERSCHULZ541
(17 rows)

gsmith-to commented 6 years ago

I was trying to split this into an 'author' table and a 'tweets' table, and found that none of the fields below is consistent with alt_external_id. That is, for each field 'f' in the list, you can find a pair of records that have the same alt_external_id but different values of 'f':

external_author_id, author, account_type, account_category, new_june_2018

Likewise, there seem to be no fields consistent with external_author_id.
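
For anyone who wants to repeat this kind of check, here is a minimal sketch of how it could be done. It uses Python's built-in sqlite3 module and a tiny in-memory table so it runs on its own. The rows are made-up toy data, not rows from the dataset, and `inconsistent_keys` is a hypothetical helper name. A field counts as consistent with a key when the helper returns an empty list.

```python
import sqlite3

# Toy rows for illustration only; these are NOT taken from the real dataset.
ROWS = [
    # (alt_external_id, author, account_type)
    ("1001", "ALICE", "Right"),
    ("1001", "ALICE", "Left"),   # same key, different account_type
    ("1002", "BOB", "Russian"),
    ("1002", "BOB", "Russian"),
]


def inconsistent_keys(conn, key, field, table="tweets"):
    """Return (key value, distinct count) for keys with more than one value of field.

    key, field, and table are interpolated as identifiers, so pass only
    trusted column names. NULLs are ignored by count(DISTINCT ...).
    """
    sql = (
        f'SELECT "{key}", count(DISTINCT "{field}") AS n '
        f'FROM "{table}" GROUP BY "{key}" '
        f'HAVING count(DISTINCT "{field}") > 1 ORDER BY "{key}"'
    )
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets (alt_external_id TEXT, author TEXT, account_type TEXT)"
)
conn.executemany("INSERT INTO tweets VALUES (?, ?, ?)", ROWS)

# Check each candidate field against the key.
for field in ("author", "account_type"):
    print(field, inconsistent_keys(conn, "alt_external_id", field))
```

On the toy rows, author comes back empty (consistent) and account_type reports key 1001 with two distinct values. The same GROUP BY / HAVING query should work against the real table in PostgreSQL with the column names swapped in.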

EvanCarroll commented 6 years ago

So the conclusion is that the external_author_id is trash.

EvanCarroll commented 6 years ago

BTW, new_june_2018 is not unique per author either:

SELECT author, count(distinct new_june_2018)
FROM rustweets.tweets
GROUP BY author
having count(distinct new_june_2018) > 1;
  author   | count 
-----------+-------
 MONEYFORM |     2
(1 row)