fivethirtyeight / russian-troll-tweets


Save the community some typing #21

Open buddha314 opened 6 years ago

buddha314 commented 6 years ago

I know this is an abuse of "issues", but it doesn't warrant a full repo. Here is some Python code you can cut and paste:

class Rutweet:
    def __init__(self, external_author_id, author, content,
                region, language, publish_date, harvested_date,
                following, followers, updates, post_type, account_type,
                retweet, account_category, new_june_2018):

        self.external_author_id = external_author_id
        self.author = author
        self.content = content
        self.region = region
        self.language = language
        self.publish_date = publish_date
        self.harvested_date = harvested_date
        self.following = following
        self.followers = followers
        self.updates = updates
        self.post_type = post_type
        self.account_type = account_type
        self.retweet = retweet
        self.account_category = account_category
        self.new_june_2018 = new_june_2018

And a quick loader.

import csv

def load_tweets(fn):
    """Yield one Rutweet per row of the CSV file fn."""
    with open(fn, 'r', newline='') as f:
        # csv.reader keeps commas inside quoted tweet text in one field,
        # which a plain line.split(',') would break apart.
        for fields in csv.reader(f):
            # If the file starts with a header row, the first object
            # yielded holds the column names rather than a tweet.
            yield Rutweet(*fields[:15])
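
If the CSV files start with a header row (an assumption here, not something stated in this thread), csv.DictReader can key each row by column name instead of by position. The sketch below is only an illustration: iter_tweets is a hypothetical helper, and the three columns and the sample row are made up for the demo.

```python
import csv
import io

def iter_tweets(f):
    """Yield each data row of an open CSV file as a dict keyed by column name."""
    # Assumption: the first row of the file is a header naming the columns.
    yield from csv.DictReader(f)

# Made-up sample data standing in for a real file.
sample = io.StringIO(
    'author,content,account_category\n'
    'someone,"Hello, world",Example\n'
)
rows = list(iter_tweets(sample))
print(rows[0]["content"])  # Hello, world
```

With a real file, pass the object returned by open(fn, 'r', newline='') instead of the StringIO sample.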

Meeds122 commented 6 years ago

If you're using Python 3, emojis will sometimes break Unicode decoding of the text files. If you get a UnicodeDecodeError, use open(fn, 'r', encoding="latin-1").
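
A minimal sketch of that workaround, with one addition that is an assumption rather than part of the suggestion above: it tries UTF-8 first and only falls back to latin-1 when decoding fails. latin-1 maps every byte to a character, so it never raises, but it can turn emoji into garbage characters. read_rows is a hypothetical helper name, not something from the repo.

```python
import csv

def read_rows(fn, encodings=("utf-8", "latin-1")):
    """Return all CSV rows from fn, trying each encoding in turn."""
    last_error = None
    for enc in encodings:
        try:
            with open(fn, "r", encoding=enc, newline="") as f:
                # Decoding happens while reading, so read inside the try.
                return list(csv.reader(f))
        except UnicodeDecodeError as err:
            last_error = err  # fall through to the next encoding
    raise last_error
```

Reading the whole file into a list keeps the fallback simple; for very large files you may prefer to pick the encoding once and then stream the rows.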

EvanCarroll commented 6 years ago

If you want to make this work with my schema, I'll make you a contributor and we can develop on it instead.

I think putting the Python code in a subdirectory named python would be a great idea for Python users. But this code is for the older v1 dataset, not the v2 dataset. I've done the same thing for PostgreSQL; you can find my scripts under ./PostgreSQL.

EvanCarroll commented 6 years ago

@Meeds122 See my note at the bottom of #20:

https://github.com/fivethirtyeight/russian-troll-tweets/issues/20#issuecomment-416716716

buddha314 commented 6 years ago

@EvanCarroll Great, could you create an issue and assign it to me? I'll try to contribute this week. I have some other things I could add as well.