EU-ECDC / epitweetr

ECDC Early warning tool using Twitter data
European Union Public License 1.2

Aggregate error keeps happening #31

Closed: Abdouzster closed this issue 3 years ago

Abdouzster commented 3 years ago

Max number of retries reached failed while serie country_counts for 2021-07-05 languages
simpleWarning in shell(tcmd): ' call "C:\Program Files\Java\jre1.8.0_281\bin\java" -cp "C:/Users/azaghlool/Documents/R/win-library/4.0/epitweetr/java/ecdc-twitter-bundle_2.12-1.0.jar;E:/Twitter/jars/" -Dfile.encoding=UTF8 -Xmx2g org.ecdc.twitter.Tweets getTweets tweetPath "E:/Twitter/tweets/search" geoPath "E:/Twitter/tweets/geolocated" pathFilter ".2021.07.04.,.2021.07.05.,.2021.07.06.,.2021.07.07.,.2021.07.10.,.2021.07.11.,.2021.07.12.,.2021.07.13.*" columns "cast(sum(case when is_retweet then 1 else 0 end) as Integer) as retweets||cast(sum(case when is_retweet then 0 else 1 end) as Integer) as tweets||@known_retweets||@known_original" groupBy "topic||date_format(created_at, 'yyyy-MM-dd') as created_date||date_format(created_at, 'HH') as created_hour||coalesce(text_loc.geo_country_code, linked_text_loc.geo_country_code) as tweet_geo_country_code||coalesce(place_full_name_loc.geo_country_code, linked_place_full_name_loc.geo_country_code, user_location_loc.geo_country_code, user_description_loc.geo_country_code) as user_geo_country_code" sortBy "" filterBy "date_format(created_at, 'yyyy-MM-dd') = '2021-07-05'" sourceExpressions "tweet||created_at||is_retweet||screen_name||linked_screen_name||tweet_longitude||tweet_latitude||place_longitude||place_latitude||linked_place_longitude||linked_place_latitude|||geo||text_loc||linked_text_loc||user_location_loc||user_description_loc||place_full_name_loc||linked_place_full_name_loc||text_loc||linked_text_loc||place_full_name_loc||linked_place_full_name_loc" langCodes "en,fr,es,pt" langNames "English,French,Spanish,Portuguese" langPaths "E:/Twitter/languages/en.txt.gz,E:/Twitter/languages/fr.txt.gz,E:/Twitter/languages/es.txt.gz,E:/Twitter/languages/pt.txt.gz" parallelism 2 params "C:\Users\AZAGHL~1\AppData\Local\Temp\RtmpUVlQgH\repl142c57952724.txt" >"C:\Users\AZAGHL~1\AppData\Local\Temp\RtmpUVlQgH\file142c289e6005"' execution failed with error code 1

Abdouzster commented 3 years ago

epitweetr:::get_aggregated_serie("country_counts", "2021-04-10", list("2021.07.05", "2021.07.06", "2021.07.07"))
Aggregating series country_counts ( 2021-04-10 ) by looking on tweets collected between ( 2021.07.05, 2021.07.06, 2021.07.07 )
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

epitweetr:::get_aggregated_serie("country_counts", "2021-07-05", list("2021.07.05", "2021.07.06", "2021.07.07"))
Aggregating series country_counts ( 2021-07-05 ) by looking on tweets collected between ( 2021.07.05, 2021.07.06, 2021.07.07 )
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    topic           created_date created_hour tweet_geo_country_code user_geo_country_code retweets tweets known_retweets known_original
  1 COVID-19        2021-07-05   05           BJ                     YE                           1      0              0              0
  2 COVID-19        2021-07-05   08           MX                     UG                          31      0              0              0
  3 Cocaine         2021-07-05   16           IR                     US                           1      1              0              0
  4 Newest%20Drug   2021-07-05   18           IT                     BE                           1      0              0              0
  5 COVID-19        2021-07-05   19           CO                     CO                          13     19              0              0
  6 Psilocybin      2021-07-05   00                                  US                           0      9              0              0
  7 Strange%20Names 2021-07-05   08                                  IN                           5      0              0              0
  8 Marijuana       2021-07-05   12                                  CA                          53     33              0              0
  9 Addict          2021-07-05   12                                  IN                          58     22              0              0
 10 Addict          2021-07-05   12                                  HT                           2      5              0              0
 11 Cocaine         2021-07-05   16           TR                                                  6      6              0              0
 12 Marijuana       2021-07-05   16           US                                                 15     10              0              0
 13 Marijuana       2021-07-05   16                                  US                         102     43              0              0
 14 Dengue          2021-07-05   18                                  VN                           2      0              0              0
 15 Cocaine         2021-07-05   19           GY                                                  0      1              0              0
 16 Heroin          2021-07-05   19           RU                                                  2      0              0              0
 17 COVID-19        2021-07-05   02           IN                     IT                           3      0              0              0
 18 Dengue          2021-07-05   08           PH                     RU                          70      7              0              0
 19 Unknown-Disease 2021-07-05   09           BO                     FR                           2      0              0              0
 20 COVID-19        2021-07-05   18           AU                     VE                           1      0              0              0
 21 Cocaine         2021-07-05   20           MX                     ES                           1      3              0              0
 22 Addict          2021-07-05   23           HR                     SK                         146      6              0              0
 23 Cocaine         2021-07-05   03                                  CA                          13     20              0              0
 24 Newest%20Drug   2021-07-05   14           IT                                                  9      1              0              0
 25 Cocaine         2021-07-05   17           TH                                                  3      8              0              0
 26 Dengue          2021-07-05   04           US                     CN                           1      0              0              0
 27 Dilaudid        2021-07-05   13           CN                     CA                           0      1              0              0
 28 Marijuana       2021-07-05   20           JP                     CO                           1      0              0              0
 29 Non-Opioid      2021-07-05   03                                  MX                           6      3              0              0
 30 Dengue          2021-07-05   16                                  BH                           1      0              0              0
 31 COVID-19        2021-07-05   16           BS                     US                           0      1              0              0
 32 COVID-19        2021-07-05   18           PH                     IE                           0      1              0              0
 33 Dengue          2021-07-05   00                                  CO                          39      0              0              0
 34 Cocaine         2021-07-05   04                                  AU                          11     17              0              0
 35 Alcohol         2021-07-05   07                                  FR                           1      0              0              0
 36 COVID-19        2021-07-05   15                                  ID                           1      5              0              0
 37 Dengue          2021-07-05   19                                  FR                          12      2              0              0
 38 Dilaudid        2021-07-05   20           CA                                                  9      9              0              0
 39 Dilaudid        2021-07-05   05           NG                     NG                           1      2              0              0
 40 Fentanyl        2021-07-05   07           CN                     CG                          10      0              0              0
 41 Dilaudid        2021-07-05   09           NG                     AU                           2      0              0              0
 42 Dengue          2021-07-05   16           RU                     RU                           7      2              0              0
 43 Dilaudid        2021-07-05   00                                  CR                          10      6              0              0
 44 Cocaine         2021-07-05   04           AU                                                 12     16              0              0
 45 SARS            2021-07-05   06           CA                                                 11      0              0              0
 46 Addict          2021-07-05   12           IN                                                 10      7              0              0
 47 Non-Opioid      2021-07-05   15           TH                                                  1      8              0              0
 48 COVID-19        2021-07-05   20           CL                                                 17      0              0              0
 49 Strange%20Names 2021-07-05   11           XK                     IN                          10      0              0              0
 50 Cocaine         2021-07-05   14           AT                     UY                           1      0              0              0
 51 COVID-19        2021-07-05   14           ID                     NZ                           0      1              0              0
 52 Marijuana       2021-07-05   15           ML                     MX                           0      2              0              0
 53 Cocaine         2021-07-05   21           IT                     IT                           7      2              0              0
 54 COVID-19        2021-07-05   22           AR                     PH                           0      1              0              0
 55 COVID-19        2021-07-05   00           GR                                                 13      1              0              0
 56 SARS            2021-07-05   02           CN                                                  5      0              0              0
 57 Heroin          2021-07-05   10                                  CH                           0      1              0              0
 58 PCP             2021-07-05   12           ET                                                  0      1              0              0
 59 Addict          2021-07-05   12           HT                                                  0      7              0              0
 60 Addict          2021-07-05   13                                  FI                           1      1              0              0
 61 Heroin          2021-07-05   18           TH                                                  1      0              0              0
 62 SARS            2021-07-05   02           CA                     GE                           1      0              0              0
 63 COVID-19        2021-07-05   17           PL                     US                           1      1              0              0
 64 SARS            2021-07-05   22           SA                     UY                           2      0              0              0
 65 Dilaudid        2021-07-05   00           CR                                                  2      2              0              0
 66 Addict          2021-07-05   02           EG                                                  0      2              0              0
 67 Marijuana       2021-07-05   12           CA                                                  1      9              0              0
 68 Overdose        2021-07-05   07           CN                     SS                           0      1              0              0
 69 SARS            2021-07-05   13           CR                     CO                           0      1              0              0
 70 Non-Opioid      2021-07-05   15           IT                     US                          20      2              0              0
 71 Marijuana       2021-07-05   15           TH                     HN                           0      1              0              0
 72 Cocaine         2021-07-05   19           RO                     IT                           0      1              0              0
 73 Dengue          2021-07-05   21           PH                     US                          16      1              0              0
 74 Dilaudid        2021-07-05   22           DE                     DE                           0      1              0              0
 75 Naloxone        2021-07-05   01           CA                                                  9      0              0              0
 76 COVID-19        2021-07-05   11           SC                                                  3      0              0              0
 77 Dilaudid        2021-07-05   18           SI                                                  2      0              0              0
 78 Dilaudid        2021-07-05   20                                  CA                          21     23              0              0
 79 COVID-19        2021-07-05   06           CA                     ES                           9      0              0              0
 80 Dilaudid        2021-07-05   07           TJ                     AU                           0      1              0              0
 81 Addict          2021-07-05   10           PE                     CF                           3      0              0              0
 82 Marijuana       2021-07-05   12           NG                     TW                           1      0              0              0
 83 Marijuana       2021-07-05   16           PH                     CO                           1      0              0              0
 84 Phenethylamine  2021-07-05   20           BR                     MX                           4      1              0              0
 85 Cocaine         2021-07-05   19                                  GY                           1      1              0              0
 86 Dengue          2021-07-05   19           FR                                                 22      2              0              0
 87 COVID-19        2021-07-05   20                                  CL                          14      7              0              0
 88 Addict          2021-07-05   13           IN                     TZ                           0      1              0              0
 89 Non-Opioid      2021-07-05   15                                  TH                           2      3              0              0
 90 Cocaine         2021-07-05   18           AZ                                                  0      3              0              0
 91 Non-Opioid      2021-07-05   03           AR                     AR                           0      2              0              0
 92 Addict          2021-07-05   09           CN                     NL                           1      0              0              0
 93 Psilocybin      2021-07-05   02                                                               7     26              0              0
 94 Non-Opioid      2021-07-05   03           MX                                                 23      1              0              0
 95 Dilaudid        2021-07-05   08                                  PL                           1      0              0              0
 96 COVID-19        2021-07-05   15           ID                                                 18      2              0              0
 97 Cocaine         2021-07-05   06           MX                     LC                           0      1              0              0
 98 Cocaine         2021-07-05   19           IT                     VN                           0      1              0              0
 99 COVID-19        2021-07-05   01                                  PA                           5      2              0              0
100 Pain%20Killer   2021-07-05   04           DE                                                  0      2              0              0
101 Addict          2021-07-05   06                                  ZA                           7      8              0              0
102 COVID-19        2021-07-05   02           FJ                     FJ                           8      1              0              0
103 COVID-19        2021-07-05   10           FR                     MX                           3      2              0              0
104 Dilaudid        2021-07-05   13           TD                     GE                           0      1              0              0
105 Strange%20Names 2021-07-05   15           ES                     MX                           1      2              0              0
106 SARS            2021-07-05   17           AT                     EC                           2      0              0              0
107 COVID-19        2021-07-05   17           FR                     IT                           2      0              0              0
108 COVID-19        2021-07-05   18           US                     FR                           3      2              0              0
109 Marijuana       2021-07-05   23           MY                     MX                           5      0              0              0
110 Addict          2021-07-05   16                                  AR                          11      9              0              0
111 Overdose        2021-07-05   18                                  ES                           0      7              0              0
[ reached 'max' / getOption("max.print") -- omitted 157291 rows ]
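The columns, groupBy and filterBy arguments in the failing command show how the country_counts series is built: tweets from one day are grouped by topic, creation date, creation hour, a tweet-level country (text_loc, else linked_text_loc) and a user-level country (the first available of place_full_name_loc, linked_place_full_name_loc, user_location_loc and user_description_loc), and retweets and original tweets are counted separately. The Python sketch below reproduces that grouping for illustration only; the real computation runs in Spark inside the epitweetr JAR. It assumes a hypothetical flattened input where each tweet is a dict whose *_loc fields already hold a country code or None, and it leaves out the @known_retweets and @known_original columns.

```python
from collections import defaultdict
from datetime import datetime


def coalesce(*values):
    """Return the first value that is not None, like SQL COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


def country_counts(tweets, day):
    """Count (retweets, tweets) per topic, date, hour, tweet country and user country.

    `tweets` is an iterable of dicts in a hypothetical flattened shape:
    topic, created_at (datetime), is_retweet (bool) and *_loc country codes.
    """
    counts = defaultdict(lambda: [0, 0])  # [retweets, original tweets]
    for t in tweets:
        created = t["created_at"]
        created_date = created.strftime("%Y-%m-%d")
        if created_date != day:  # mirrors filterBy on the creation date
            continue
        key = (
            t["topic"],
            created_date,
            created.strftime("%H"),  # created_hour
            # tweet_geo_country_code: text_loc, else linked_text_loc
            coalesce(t.get("text_loc"), t.get("linked_text_loc")),
            # user_geo_country_code: place fields first, then user profile fields
            coalesce(
                t.get("place_full_name_loc"),
                t.get("linked_place_full_name_loc"),
                t.get("user_location_loc"),
                t.get("user_description_loc"),
            ),
        )
        counts[key][0 if t["is_retweet"] else 1] += 1
    return {key: tuple(value) for key, value in counts.items()}


if __name__ == "__main__":
    sample = [  # hypothetical tweets, not taken from the output above
        {"topic": "Dengue", "created_at": datetime(2021, 7, 5, 16, 3), "is_retweet": True,
         "linked_text_loc": "RU", "user_location_loc": "RU"},
        {"topic": "Dengue", "created_at": datetime(2021, 7, 5, 16, 40), "is_retweet": False,
         "text_loc": "RU", "user_location_loc": "RU"},
        {"topic": "Dengue", "created_at": datetime(2021, 7, 6, 1, 0), "is_retweet": False},
    ]
    print(country_counts(sample, "2021-07-05"))
    # {('Dengue', '2021-07-05', '16', 'RU', 'RU'): (1, 1)}
```

A group whose country cannot be resolved gets None as its key, which corresponds to the blank country cells in the R output.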

Abdouzster commented 3 years ago

I'm not sure whether you found anything about this bug, but I finally got rid of it. The C drive (where the program seems to run) was almost full. After I freed some space on the drive, the next run completed without an error. I'm not sure whether that makes sense, but I thought I would share it with you. On the other hand, the program is running now, but the two black windows are not popping up, and I'm not sure why.
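The failing command redirects its output to a file under C:\Users\AZAGHL~1\AppData\Local\Temp, so a nearly full C: drive is a plausible cause. A quick check of the free space on the drive that holds the temporary directory can be sketched in Python as below; this is not an epitweetr feature, and the 10 GiB threshold and function names are hypothetical.

```python
import shutil
import tempfile


def free_gib(path=None):
    """Free space, in GiB, on the drive that holds `path` (default: the temp directory)."""
    path = path or tempfile.gettempdir()
    return shutil.disk_usage(path).free / 2**30


def check_space(min_free_gib=10.0, path=None):
    """Return True if at least `min_free_gib` GiB are free, else print a warning and return False."""
    path = path or tempfile.gettempdir()
    free = free_gib(path)
    if free < min_free_gib:
        print(f"Warning: only {free:.1f} GiB free on the drive holding {path}")
        return False
    return True


if __name__ == "__main__":
    print(f"{free_gib():.1f} GiB free for {tempfile.gettempdir()}")
    check_space()
```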

lauespinosa commented 3 years ago

Thank you Abdelhamid for the feedback!

This out-of-space problem is related to the temporary files. Epitweetr writes temporary files during aggregation so that the output files are not corrupted if the machine is turned off.
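Writing to a temporary file and only then moving it into place is a common way to get this kind of protection: if the machine stops mid-write, the previous file stays intact instead of being replaced by a half-written one. A minimal Python sketch of the general pattern follows; it is not epitweetr's actual implementation, and the function name is hypothetical. It also shows why free space matters: the complete new copy has to be written before the old one is replaced.

```python
import os
import tempfile


def atomic_write_text(path, text):
    """Replace `path` with `text` so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target: os.replace is only
    # guaranteed to be atomic when both paths are on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_path, path)  # swap the new file in as one step
    except BaseException:
        # On failure (for example a full disk) drop the partial temp file;
        # the previous version of `path`, if any, is left untouched.
        os.remove(tmp_path)
        raise


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "series.json")
    atomic_write_text(target, '{"version": 1}')
    atomic_write_text(target, '{"version": 2}')
    with open(target, encoding="utf-8") as f:
        print(f.read())  # {"version": 2}
```

This sketch keeps the temporary file in the destination directory because that is what makes the rename atomic; epitweetr may place its own temporary files elsewhere, such as the TEMP/TMP location.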

You can also change the environment variables TEMP and TMP to point to another location or drive with more space. Please keep in mind that other programs on your machine may also use these variables.
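To keep the change from affecting other programs, the variables can be overridden for a single process instead of system-wide. The Python sketch below uses a hypothetical helper, not part of epitweetr, that launches a command with TEMP, TMP and TMPDIR pointing at another folder. R's tempdir() checks TMPDIR, TMP and TEMP in turn at startup, so setting all three should also cover an R session. In practice the command would start the R session that runs epitweetr, for example Rscript with a script of your own, and the folder would sit on a drive with more space.

```python
import os
import subprocess
import sys
import tempfile


def run_with_temp_dir(cmd, temp_dir):
    """Run `cmd` with its temporary directory redirected to `temp_dir`.

    Only the child's environment changes; this process and the
    system-wide settings are left untouched.
    """
    os.makedirs(temp_dir, exist_ok=True)
    env = os.environ.copy()
    for name in ("TEMP", "TMP", "TMPDIR"):
        env[name] = temp_dir
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    target = tempfile.mkdtemp()  # stand-in for a folder on a roomier drive
    probe = [sys.executable, "-c", "import tempfile; print(tempfile.gettempdir())"]
    result = run_with_temp_dir(probe, target)
    print("child temp dir:", result.stdout.strip())
```

The demo starts a Python child that prints its own temporary directory, which confirms that the override applies to the child only.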

Abdouzster commented 3 years ago

You're welcome, Laura! I have a few questions about something else. The file that can be downloaded from the dashboard for a given topic in a specific country contains a column named "limit" and another called "baseline". This file holds the data shown in the curve graph (an example file is attached). Is "limit" the alert threshold? Is there a definition of these two fields, and what equation is used to calculate each of them? I went through the vignette, but it was not clear to me. Would you please help me with this? Thank you, Abdelhamid

(Sent by email in reply to Laura Espinosa's comment of Fri, Aug 20, 2021, 3:50 AM.)

-- Abdelhamid Zaghlool