abjer / sds2019

Social Data Science 2019 - a summer school course
https://abjer.github.io/sds2019

About the Connector Class #34

Open Choptdei opened 4 years ago

Choptdei commented 4 years ago

Hi!

Several people on my team are using the requests module to collect tweets from the Twitter API. We might use several computers to make our requests to the API, so as of right now we will end up with several log files. Is that a problem? And should we hand in our log files?

Best regards

snorreralund commented 4 years ago

That is fine; you should just merge the files for the analysis. It would be good to hand in the logs as well, but the most important thing is to include an analysis of the log in the project.
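Merging the per-machine log files can be done with pandas. A minimal sketch, assuming the Connector writes CSV logs with a header row (the file names and column names `id`, `url`, `dt` here are hypothetical; substitute whatever your Connector actually writes):

```python
import glob

import pandas as pd

# Two hypothetical log files, one per machine, standing in for the
# real Connector output (column names are assumptions).
pd.DataFrame({'id': [1, 2], 'url': ['a', 'b'], 'dt': [0.1, 0.2]}) \
    .to_csv('log_machine1.csv', index=False)
pd.DataFrame({'id': [3], 'url': ['c'], 'dt': [0.3]}) \
    .to_csv('log_machine2.csv', index=False)

# Read every log file and stack them into one DataFrame for the analysis.
paths = sorted(glob.glob('log_machine*.csv'))
merged = pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)

# Save the combined log so the whole team analyzes the same file.
merged.to_csv('merged_log.csv', index=False)
```

`ignore_index=True` re-numbers the rows so the per-machine indices don't collide.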

Choptdei commented 4 years ago

Hello Snorre

Thank you. We are having another problem: the Twitter API demands a special format when requesting data. It looks like this:

response = requests.post(endpoint,data=data,headers=headers)

How do we use the Connector class with a POST request? We get an error when using it with the Twitter API.

Best regards

BjornCilleborg commented 4 years ago

@snorreralund regarding the log file: should we include an analysis of the log with all the connections we have made during our project, or should we, at the end of the project, reset the log and rerun the whole code so the log covers only the final data collection, and analyse that log?

snorreralund commented 4 years ago

Okay, I just added a post method to the Connector; see the following version: https://github.com/snorreralund/scraping_seminar/blob/master/logging_requests.py

The API has changed slightly: instead of providing a url, provide a dictionary of arguments to the requests.get or requests.post method.

e.g.

```python
import pickle

from requests_oauthlib import OAuth1  # missing import added

# load keys and secrets
consumer_key, consumer_secret, oauth_token, oauth_token_secret = \
    pickle.load(open('twitter_credentials.pkl', 'rb'))

# define auth method
auth = OAuth1(consumer_key, consumer_secret, oauth_token, oauth_token_secret)

# define query
q = ('https://api.twitter.com/1.1/statuses/user_timeline.json'
    '?screen_name=realdonaldtrump&count=200&tweet_mode=extended')

connector.get({'url': q, 'auth': auth})
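The same dictionary-of-arguments pattern covers POST calls. A minimal sketch of how such a wrapper can unpack the dictionary (the `call_with_kwargs` helper and the stub below are illustrations, not the Connector's actual internals):

```python
def call_with_kwargs(func, kwargs):
    """Unpack a dictionary of request arguments into a call to func
    (e.g. requests.get or requests.post). The 'url' key is required;
    every other key is passed through as a keyword argument."""
    kwargs = dict(kwargs)  # copy so the caller's dict is not mutated
    url = kwargs.pop('url')
    return func(url, **kwargs)

# With requests this would be used as (not executed here):
#   call_with_kwargs(requests.post,
#                    {'url': endpoint, 'data': data, 'headers': headers})

# A stub stands in for requests.post so the sketch runs without network access.
def fake_post(url, data=None, headers=None):
    return (url, data, headers)

result = call_with_kwargs(fake_post, {'url': 'http://x', 'data': {'q': 1}})
```

Any extra keys (`auth`, `headers`, `params`, ...) simply flow through to the underlying requests call.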

snorreralund commented 4 years ago

Regarding the log: you should report the log that generated the dataset you analyze, not necessarily your whole process, test calls, etc.
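For the analysis itself, a minimal sketch of the kind of summary you might report, assuming hypothetical log columns `url`, `dt` (seconds per call) and `response_size` (bytes); use whatever columns the Connector actually writes:

```python
import pandas as pd

# Hypothetical merged log of the final data collection.
log = pd.DataFrame({
    'url': ['api/a', 'api/a', 'api/b'],
    'dt': [0.4, 0.6, 1.0],            # seconds per call
    'response_size': [100, 120, 80],  # bytes returned
})

# Calls per endpoint, mean call duration, and total download volume.
summary = log.groupby('url').agg(
    n_calls=('dt', 'size'),
    mean_dt=('dt', 'mean'),
    total_bytes=('response_size', 'sum'),
)
print(summary)
```

A table like this documents how much you queried each endpoint and how the collection behaved, which is what the project hand-in asks you to discuss.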

Kristianuruplarsen commented 4 years ago

@snorreralund can you make your changes in this repo? https://github.com/elben10/ScrapingClass That would make it easy for Jakob to push an updated version to PyPI.

snorreralund commented 4 years ago

I just asked Jakob @elben10 to do it.

snorreralund commented 4 years ago

See issue #41.