aviade5 / measurement-of-online-discussion-authenticity


OldTweetsCrawler hangs or gives error #14

Open layercake1 opened 4 years ago

layercake1 commented 4 years ago

I'm trying to run the OldTweetsCrawler module. The database is empty except for the claims table, which contains claims with keywords.

Every time I run the code, I either get this error:

tweets retrieved: 0, skipped: 0
Twitter weird response. Try to see on browser: https://twitter.com/search?q=%20drew%20curtis%20time%20traveler%202020%20until%3A2021-08-24&src=typd

or the code hangs (for days...)

I have tried using a VPN, in case our IP has been blocked for scraping Twitter too heavily, but this did not help at all.
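For anyone debugging the same message, it can help to decode the search URL from the error and see exactly which query was built. The snippet below is only a sketch using the Python 3 standard library; it is not part of the project, and the split into keywords and operators is an assumption about how the query is laid out.

```python
from urllib.parse import urlsplit, parse_qs

# URL copied verbatim from the error message above.
url = ("https://twitter.com/search?q=%20drew%20curtis%20time%20traveler"
       "%202020%20until%3A2021-08-24&src=typd")

# parse_qs percent-decodes the query string into a dict of lists.
params = parse_qs(urlsplit(url).query)
query = params["q"][0]

# Assumption: whitespace-separated terms, where terms containing ":" are
# search operators (such as until:) and the rest are keywords.
terms = query.split()
operators = [t for t in terms if ":" in t]
keywords = [t for t in terms if ":" not in t]

print(repr(query))
print("keywords: ", keywords)
print("operators:", operators)
```

In this URL the only operator is `until:2021-08-24`, and the keywords are joined by spaces, which may be worth comparing against how the keywords are stored in the claims table.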

I am using this config file:


[DEFAULT]
logger_name = root
logger_conf_file = configuration/logging.conf
start_date = date('2000-06-07 00:00:00')
end_date = date('2019-07-23 23:59:59')
step_size_in_sec = 691200
#step_size_in_sec = 12960000
#five days in sec = 432000
window_analyze_size_in_sec = 691200
keep_results_for = 2246400
max_concurrent_jobs = 1
domain = Microblog
;domain=Claim
#domain=Blog
#domain=News
#domain=Article
targeted_classes = ['author_type']
#social_network_name = Clickbait_Challenge
#social_network_name = Fake_News
#social_network_name = SBP-BRiMS_2017
social_network_name = Twitter
#social_network_name = PolitiFact
#social_network_url = "https://Clickbait_Challenge.com/"
#social_network_url = "https://SBP-BRiMS_2017.com/"
#social_network_url = "https://politifact.com/"
social_network_url = "https://twitter.com/"

[Logger]
logger_conf_file = configuration/logging.conf
logger_name = root
file_name = log/bad_actors.log
level = logging.INFO

[OperatingSystem]
linux = False
windows = True
mac = False

[DB]
DB_path = data/input/
DB_name_prefix = bad_actors_
DB_name_suffix = .db
DB_path_to_extension = lib/extension-functions
dialect_name = sqlalchemy.dialects.sqlite

remove_on_setup = False
remove_on_teardown = False
dropall_on_setup = False
dropall_on_teardown = False
start_date = date('2010-01-01 00:00:00')

######################################################################################################
# Impoteres
######################################################################################################

# ************** DATASET BUILDER MODULE **********************
[DatasetBuilderConfig]
clean_authors_features_table = False
;
#

[OldTweetsCrawler]
;month_interval = 36
month_interval = 12
limit_start_date = False
limit_end_date = False
;limit_end_date = '2020-01-01'
;max_num_tweets = 30
max_num_tweets = 10000
max_num_of_objects_without_saving = 100
output_folder_full_path = "data/output/OldTweetsCrawlerStatistics/"
#actions = ['get_old_tweets_by_claims_content', 'get_old_tweets_by_claims_keywords']
actions = ['get_old_tweets_by_claims_keywords']

;[MissingDataComplementor]
;;actions = ['fill_tweet_retweet_connection','fill_data_for_sources','fill_data_for_followers','fill_data_for_friends','fill_authors_time_line'
;;                   ,'assign_manually_labeled_authors','assign_acquired_and_crowdturfer_profiles','delete_acquired_authors','delete_manually_labeled_authors']
;actions = ['fill_author_guid_to_posts', 'fill_data_for_sources']
;max_users_without_saving = 10000
;minimal_num_of_posts = 10000
;limit_friend_follower_number = 5
;#maximal_tweets_count_in_timeline maximal value is 200 according to Twitter API
;maximal_tweets_count_in_timeline = 5
;output_path = "data/output/MissingDataComplementor/"

;[Twitter_Crawler]
;num_of_top_terms = 10
;actions = ['get_most_popular_posts_by_google_trends']
;retrieve_news_by_keywords = []

##################################################################################
###########################Graph Builder##########################################
##################################################################################

#############################################################################
######################### FEATURE EXTRACTOR MODULES #########################
#############################################################################

;############################################################################
;#################Transfer Learning##########################################
;#############################################################################

;[TwitterApiRequester]
;consumer_key = "<you'r consumer key>"
;consumer_secret = "<you'r consumer secret>"
;access_token_key = "<you'r consumer token>"
;access_token_secret = "<you'r access token secret>"
;user_id = <you'r user id>
;screen_name = "<you'r screen name>"

;[Twitter_Rest_Api]
;#can be 1, 2, or 3
;working_app_number = 2
;maximal_get_friend_ids_requests_in_window = 15
;maximal_get_follower_ids_requests_in_window = 15
;maximal_get_user_requests_in_window = 180
;maximal_user_ids_allowed_in_single_get_user_request = 100
;num_of_requests_without_checking = 9999999999
;num_of_twitter_status_id_requests_without_checking = 9999999999
;num_of_twitter_timeline_requests_without_checking = 9999999999
;maximal_number_of_retrieved_users = 1000
;max_tweet_ids_allowed_in_single_get_tweets_by_tweet_ids_request = 100
;max_num_of_tweet_ids_requests_without_checking = 900

maor63 commented 4 years ago

I updated the OldCrawlerTweet package. Are the claims' keywords separated by commas? If not, change the separator in the "_get_tweets_for_claim_by_keywords" method of the "OldTweetsCrawler" class.
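To illustrate why the separator matters, here is a minimal sketch of splitting a keywords string. `split_keywords` is a hypothetical helper written for this example, not the project's actual code, and the sample strings are made up from the keywords visible in the error URL.

```python
def split_keywords(raw, separator=","):
    """Split a keywords string on `separator`, dropping empty pieces.

    Hypothetical helper for illustration only; the real logic lives in
    the "_get_tweets_for_claim_by_keywords" method.
    """
    return [part.strip() for part in raw.split(separator) if part.strip()]


# Comma-separated keywords split into one search term per entry.
comma_separated = split_keywords("drew curtis, time traveler, 2020")

# Space-separated keywords contain no comma, so the default separator
# leaves the whole string as a single term.
unsplit = split_keywords("drew curtis time traveler 2020")

# Passing the separator that matches the data gives one term per word.
space_separated = split_keywords("drew curtis time traveler 2020", separator=" ")

print(comma_separated)
print(unsplit)
print(space_separated)
```

If the claims table stores keywords with a different delimiter than the crawler expects, every claim ends up as one long term, so checking a few rows of the table against the separator used in the method is a reasonable first step.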