Open layercake1 opened 4 years ago
I'm trying to run the OldTweetsCrawler module. The database is empty except for the claims table, which contains claims with keywords.
Every time I run the code, I either get this error:
tweets retrieved: 0, skipped: 0
Twitter weird response. Try to see on browser: https://twitter.com/search?q=%20drew%20curtis%20time%20traveler%202020%20until%3A2021-08-24&src=typd
or the code hangs (for days...)
I have tried using a VPN, in case our IP has been blocked for scraping Twitter too heavily, but this did not help at all.
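For anyone debugging the same message, decoding the search URL from the error shows exactly which query the crawler sent. The snippet below is a minimal sketch that uses only the Python standard library; it is not part of the crawler, and the variable names are illustrative.

```python
from urllib.parse import parse_qs, urlsplit

# URL copied verbatim from the error message in this issue.
url = (
    "https://twitter.com/search?q=%20drew%20curtis%20time%20traveler%202020"
    "%20until%3A2021-08-24&src=typd"
)

# parse_qs percent-decodes the values, so %20 becomes a space and %3A a colon.
params = parse_qs(urlsplit(url).query)
query = params["q"][0]

# Separate the plain search terms from the "until:" operator.
tokens = query.split()
terms = [token for token in tokens if not token.startswith("until:")]
until = [token.split(":", 1)[1] for token in tokens if token.startswith("until:")]

print(repr(query))
print(terms)
print(until)
```

The decoded query starts with a space and carries all keywords as one space-separated phrase followed by an `until:` date, which is worth comparing with how the keywords are stored in the claims table.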
I am using this config file:
```
[DEFAULT]
logger_name = root
logger_conf_file = configuration/logging.conf
start_date = date('2000-06-07 00:00:00')
end_date = date('2019-07-23 23:59:59')
step_size_in_sec = 691200
#step_size_in_sec = 12960000
#five days in sec = 432000
window_analyze_size_in_sec = 691200
keep_results_for = 2246400
max_concurrent_jobs = 1
domain = Microblog
;domain=Claim
#domain=Blog
#domain=News
#domain=Article
targeted_classes = ['author_type']
#social_network_name = Clickbait_Challenge
#social_network_name = Fake_News
#social_network_name = SBP-BRiMS_2017
social_network_name = Twitter
#social_network_name = PolitiFact
#social_network_url = "https://Clickbait_Challenge.com/"
#social_network_url = "https://SBP-BRiMS_2017.com/"
#social_network_url = "https://politifact.com/"
social_network_url = "https://twitter.com/"

[Logger]
logger_conf_file = configuration/logging.conf
logger_name = root
file_name = log/bad_actors.log
level = logging.INFO

[OperatingSystem]
linux = False
windows = True
mac = False

[DB]
DB_path = data/input/
DB_name_prefix = bad_actors_
DB_name_suffix = .db
DB_path_to_extension = lib/extension-functions
dialect_name = sqlalchemy.dialects.sqlite
remove_on_setup = False
remove_on_teardown = False
dropall_on_setup = False
dropall_on_teardown = False
start_date = date('2010-01-01 00:00:00')

######################################################################################################
# Impoteres
######################################################################################################
# ************** DATASET BUILDER MODULE **********************
[DatasetBuilderConfig]
clean_authors_features_table = False
;
#
[OldTweetsCrawler]
;month_interval = 36
month_interval = 12
limit_start_date = False
limit_end_date = False
;limit_end_date = '2020-01-01'
;max_num_tweets = 30
max_num_tweets = 10000
max_num_of_objects_without_saving = 100
output_folder_full_path = "data/output/OldTweetsCrawlerStatistics/"
#actions = ['get_old_tweets_by_claims_content', 'get_old_tweets_by_claims_keywords']
actions = ['get_old_tweets_by_claims_keywords']

;[MissingDataComplementor]
;;actions = ['fill_tweet_retweet_connection','fill_data_for_sources','fill_data_for_followers','fill_data_for_friends','fill_authors_time_line'
;; ,'assign_manually_labeled_authors','assign_acquired_and_crowdturfer_profiles','delete_acquired_authors','delete_manually_labeled_authors']
;actions = ['fill_author_guid_to_posts', 'fill_data_for_sources']
;max_users_without_saving = 10000
;minimal_num_of_posts = 10000
;limit_friend_follower_number = 5
;#maximal_tweets_count_in_timeline maximal value is 200 according to Twitter API
;maximal_tweets_count_in_timeline = 5
;output_path = "data/output/MissingDataComplementor/"

;[Twitter_Crawler]
;num_of_top_terms = 10
;actions = ['get_most_popular_posts_by_google_trends']
;retrieve_news_by_keywords = []

##################################################################################
###########################Graph Builder##########################################
##################################################################################

#############################################################################
######################### FEATURE EXTRACTOR MODULES #########################
#############################################################################

;############################################################################
;#################Transfer Learning##########################################
;#############################################################################

;[TwitterApiRequester]
;consumer_key = "<you'r consumer key>"
;consumer_secret = "<you'r consumer secret>"
;access_token_key = "<you'r consumer token>"
;access_token_secret = "<you'r access token secret>"
;user_id = <you'r user id>
;screen_name = "<you'r screen name>"

;[Twitter_Rest_Api]
;#can be 1, 2, or 3
;working_app_number = 2
;maximal_get_friend_ids_requests_in_window = 15
;maximal_get_follower_ids_requests_in_window = 15
;maximal_get_user_requests_in_window = 180
;maximal_user_ids_allowed_in_single_get_user_request = 100
;num_of_requests_without_checking = 9999999999
;num_of_twitter_status_id_requests_without_checking = 9999999999
;num_of_twitter_timeline_requests_without_checking = 9999999999
;maximal_number_of_retrieved_users = 1000
;max_tweet_ids_allowed_in_single_get_tweets_by_tweet_ids_request = 100
;max_num_of_tweet_ids_requests_without_checking = 900
```
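To check which `[OldTweetsCrawler]` settings are actually active once the commented-out lines are ignored, the section can be loaded with Python's standard `configparser`. This is a minimal sketch under the assumption that the file is plain INI; how the project itself reads its configuration is not shown in this thread, and the excerpt below copies only a few keys from the config above.

```python
import ast
import configparser

# Excerpt of the [OldTweetsCrawler] section from the config in this issue.
CONFIG_EXCERPT = """
[OldTweetsCrawler]
;month_interval = 36
month_interval = 12
limit_start_date = False
limit_end_date = False
;max_num_tweets = 30
max_num_tweets = 10000
max_num_of_objects_without_saving = 100
#actions = ['get_old_tweets_by_claims_content', 'get_old_tweets_by_claims_keywords']
actions = ['get_old_tweets_by_claims_keywords']
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_EXCERPT)
section = parser["OldTweetsCrawler"]

# Lines starting with ';' or '#' are comments, so only the active values remain.
month_interval = section.getint("month_interval")
max_num_tweets = section.getint("max_num_tweets")
limit_end_date = section.getboolean("limit_end_date")

# The list is stored as a Python literal, so parse it safely with literal_eval.
actions = ast.literal_eval(section["actions"])

print(month_interval, max_num_tweets, limit_end_date, actions)
```

With these values, only the `get_old_tweets_by_claims_keywords` action is enabled and neither date limit is switched on.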
I updated the OldCrawlerTweet package. Are the claims' keywords separated by commas? If not, change the separator in the "_get_tweets_for_claim_by_keywords" method of the "OldTweetsCrawler" class.
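To illustrate why the separator matters, here is a minimal sketch of splitting a keyword string. `split_keywords` is a hypothetical helper, not the code inside `_get_tweets_for_claim_by_keywords`, and the sample strings are made up from the words in the search URL above; the real format in the claims table may differ.

```python
def split_keywords(raw_keywords, separator=","):
    """Split a claim's keyword string and drop empty or whitespace-only items."""
    parts = (part.strip() for part in raw_keywords.split(separator))
    return [part for part in parts if part]


# Hypothetical keyword strings in two possible storage formats.
comma_separated = "drew curtis, time traveler, 2020"
space_separated = "drew curtis time traveler 2020"

by_comma = split_keywords(comma_separated)
unsplit = split_keywords(space_separated)
by_space = split_keywords(space_separated, separator=" ")

print(by_comma)
print(unsplit)
print(by_space)
```

If the keywords are stored without commas, splitting on a comma returns the whole string as a single item, so the crawler would search for all the words as one long query.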