JBGruber / rwhatsapp

An R package for working with WhatsApp data 💬

I have a problem reading datetime with the read_ function #26

Closed by ghost 2 years ago

ghost commented 3 years ago

miChat <- rwa_read("miChat.txt", tz = "America/Lima") %>%
  filter(!is.na(author)) %>% # remove messages without author
  filter(text != "") # select text messages

dim(miChat)
[1] 25523     6

The datetime column loses the a. m. / p. m. marker. Once the file is read, the time column holds the 12-hour clock values as if they were 24-hour times: the a. m. / p. m. indicator is dropped (so 8:04 p. m. becomes 08:04 instead of 20:04), and the seconds are set to 15 in every row.

x1 <- head(miChat$time, 150)
x1
  [1] "2020-05-09 08:04:15 -05" "2020-05-09 08:07:15 -05" "2020-05-09 08:08:15 -05"
  [4] "2020-05-09 08:11:15 -05" "2020-05-09 08:13:15 -05" "2020-05-09 08:17:15 -05"
  [7] "2020-05-09 09:44:15 -05" "2020-05-09 10:02:15 -05" "2020-05-09 10:13:15 -05"
 [10] "2020-05-09 10:17:15 -05" "2020-05-10 01:04:15 -05" "2020-05-10 07:25:15 -05"
 [13] "2020-05-10 07:38:15 -05" "2020-05-10 08:06:15 -05" "2020-05-10 08:12:15 -05"
 [16] "2020-05-10 08:14:15 -05" "2020-05-10 09:11:15 -05" "2020-05-10 09:34:15 -05"
 [19] "2020-05-10 09:41:15 -05" "2020-05-10 09:42:15 -05" "2020-05-10 09:45:15 -05"
 [22] "2020-05-10 09:50:15 -05" "2020-05-10 09:50:15 -05" "2020-05-10 10:15:15 -05"
 [25] "2020-05-10 10:41:15 -05" "2020-05-10 10:53:15 -05" "2020-05-10 11:15:15 -05"
 [28] "2020-05-10 11:27:15 -05" "2020-05-10 12:02:15 -05" "2020-05-10 12:11:15 -05"
 [31] "2020-05-10 02:10:15 -05" "2020-05-10 03:01:15 -05" "2020-05-10 03:52:15 -05"
 [34] "2020-05-10 05:01:15 -05" "2020-05-10 05:03:15 -05" "2020-05-10 05:05:15 -05"
 [37] "2020-05-10 05:41:15 -05" "2020-05-10 06:25:15 -05" "2020-05-10 06:33:15 -05"
 [40] "2020-05-10 06:41:15 -05" "2020-05-10 06:47:15 -05" "2020-05-10 06:48:15 -05"
 [43] "2020-05-10 06:52:15 -05" "2020-05-10 08:08:15 -05" "2020-05-10 08:10:15 -05"
 [46] "2020-05-10 09:15:15 -05" "2020-05-10 09:17:15 -05" "2020-05-10 09:43:15 -05"
 [49] "2020-05-10 09:44:15 -05" "2020-05-10 09:46:15 -05" "2020-05-10 09:50:15 -05"
 [52] "2020-05-10 09:52:15 -05" "2020-05-10 09:52:15 -05" "2020-05-10 09:54:15 -05"
 [55] "2020-05-10 09:54:15 -05" "2020-05-10 09:55:15 -05" "2020-05-10 09:56:15 -05"
 [58] "2020-05-10 09:56:15 -05" "2020-05-10 09:58:15 -05" "2020-05-10 09:59:15 -05"
 [61] "2020-05-10 10:01:15 -05" "2020-05-10 10:05:15 -05" "2020-05-10 10:06:15 -05"
 [64] "2020-05-10 10:08:15 -05" "2020-05-10 10:09:15 -05" "2020-05-10 10:09:15 -05"
 [67] "2020-05-10 10:11:15 -05" "2020-05-10 10:12:15 -05" "2020-05-10 10:13:15 -05"
 [70] "2020-05-10 10:14:15 -05" "2020-05-10 10:16:15 -05" "2020-05-10 10:19:15 -05"
 [73] "2020-05-10 10:19:15 -05" "2020-05-10 10:20:15 -05" "2020-05-10 10:20:15 -05"
 [76] "2020-05-10 10:20:15 -05" "2020-05-10 10:23:15 -05" "2020-05-10 10:23:15 -05"
 [79] "2020-05-10 10:25:15 -05" "2020-05-10 10:26:15 -05" "2020-05-10 10:27:15 -05"
 [82] "2020-05-10 10:27:15 -05" "2020-05-10 10:28:15 -05" "2020-05-10 10:28:15 -05"
 [85] "2020-05-10 10:30:15 -05" "2020-05-10 10:33:15 -05" "2020-05-10 10:36:15 -05"
 [88] "2020-05-10 10:36:15 -05" "2020-05-10 10:37:15 -05" "2020-05-10 10:37:15 -05"
 [91] "2020-05-10 10:38:15 -05" "2020-05-10 10:38:15 -05" "2020-05-10 10:39:15 -05"
 [94] "2020-05-10 10:39:15 -05" "2020-05-10 10:40:15 -05" "2020-05-10 10:40:15 -05"
 [97] "2020-05-10 10:40:15 -05" "2020-05-10 10:41:15 -05" "2020-05-10 10:41:15 -05"
[100] "2020-05-10 10:42:15 -05" "2020-05-10 10:44:15 -05" "2020-05-10 10:45:15 -05"
[103] "2020-05-10 10:46:15 -05" "2020-05-10 10:48:15 -05" "2020-05-10 10:49:15 -05"
[106] "2020-05-10 10:51:15 -05" "2020-05-10 10:52:15 -05" "2020-05-10 10:52:15 -05"
[109] "2020-05-10 10:52:15 -05" "2020-05-10 10:52:15 -05" "2020-05-10 10:53:15 -05"
[112] "2020-05-10 10:53:15 -05" "2020-05-10 10:53:15 -05" "2020-05-10 10:54:15 -05"
[115] "2020-05-10 10:54:15 -05" "2020-05-10 10:55:15 -05" "2020-05-10 10:55:15 -05"
[118] "2020-05-10 10:56:15 -05" "2020-05-10 10:57:15 -05" "2020-05-11 08:50:15 -05"
[121] "2020-05-11 08:56:15 -05" "2020-05-11 08:59:15 -05" "2020-05-11 10:23:15 -05"
[124] "2020-05-11 10:24:15 -05" "2020-05-11 10:25:15 -05" "2020-05-11 10:31:15 -05"
[127] "2020-05-11 10:32:15 -05" "2020-05-11 10:32:15 -05" "2020-05-11 10:45:15 -05"
[130] "2020-05-11 10:49:15 -05" "2020-05-11 10:55:15 -05" "2020-05-11 10:59:15 -05"
[133] "2020-05-11 11:00:15 -05" "2020-05-11 11:03:15 -05" "2020-05-11 11:07:15 -05"
[136] "2020-05-11 11:09:15 -05" "2020-05-11 11:10:15 -05" "2020-05-11 11:12:15 -05"
[139] "2020-05-11 11:16:15 -05" "2020-05-11 11:33:15 -05" "2020-05-11 12:47:15 -05"
[142] "2020-05-11 12:48:15 -05" "2020-05-11 12:59:15 -05" "2020-05-11 12:59:15 -05"
[145] "2020-05-11 01:13:15 -05" "2020-05-11 04:48:15 -05" "2020-05-11 04:58:15 -05"
[148] "2020-05-11 05:24:15 -05" "2020-05-11 07:47:15 -05" "2020-05-11 09:21:15 -05"

This is the original miChat.txt file, opened in Notepad:

9/5/2020 8:04 p. m. - Eduardo Camargo: https://youtu.be/Y6XCrVOUXN4
Al final un Congreso Populista??? Again???
9/5/2020 8:07 p. m. - Jose Luis Olivas: Totalmente populista
9/5/2020 8:08 p. m. - Veronica Leon: oh no!🤦‍♀️
9/5/2020 8:11 p. m. - Eduardo Camargo: Hay que buscar a Richard Rubio Gariza para hacerlo reaccionar..!?
9/5/2020 8:13 p. m. - Carlos Sabogal Marmanillo: Richard ya es un reaccionario de los israelitas
9/5/2020 8:17 p. m. - Héctor Cortez F: Se eliminó este mensaje
9/5/2020 8:54 p. m. - Jose Luis Olivas:
9/5/2020 9:44 p. m. - Edita: Disculpen...pero quiénes son???🤦🏻‍♀🤦🏻‍♀
9/5/2020 9:51 p. m. - Antonio Magino:
9/5/2020 10:02 p. m. - Jose Luis Olivas: En la foto son El Niño Terrible de la Bombonera e icono de la U, Roberto Challe y Perico Leon icono aliancista que fallecio hoy
9/5/2020 10:13 p. m. - Edita: Ohh..q penita...QDP...gracias José Luis...😔
9/5/2020 10:17 p. m. - Jose Luis Olivas: 😉👍
9/5/2020 11:38 p. m. - Antonio Magino:
10/5/2020 1:04 a. m. - Carlos Sabogal Marmanillo: 👏🏽👏🏽👏🏽👏🏽buenazas 👍🏾👍🏾👍🏾👍🏾🙂
10/5/2020 7:04 a. m. - Lourdes Rodas Entel:
10/5/2020 7:24 a. m. - Veronica Leon:
10/5/2020 7:25 a. m. - Gino Garibotto Sandoval: Feliz día de la madre a todas las compañeras del grupo, que pásenlo bonito, dentro de lo posible. 💓🌷🌹💐🌻🌼🌸
10/5/2020 7:38 a. m. - Carlos Sabogal Marmanillo: Soy Carlos Sabogal

I don't know how to solve this problem....
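For reference, the timestamps in the file above can be converted to 24-hour values by hand. The following is only a minimal base-R sketch, not how rwhatsapp parses dates: `parse_stamp` is a hypothetical helper, and it assumes the day/month/year order, an ordinary space before the marker, and the "a. m." / "p. m." spelling shown in the export.

```r
# Hypothetical helper (not part of rwhatsapp): parse stamps such as
# "9/5/2020 8:04 p. m." at the start of each line into POSIXct values.
parse_stamp <- function(x, tz = "America/Lima") {
  # day/month/year hour:minute, then "a. m." or "p. m." (space optional)
  pattern <- "^([0-9]{1,2})/([0-9]{1,2})/([0-9]{4}) ([0-9]{1,2}):([0-9]{2}) ([ap])\\. ?m\\."
  m <- regmatches(x, regexec(pattern, x))
  # One row per line; lines without a stamp become NA
  parts <- do.call(rbind, lapply(m, function(p) {
    if (length(p) == 7) p[-1] else rep(NA_character_, 6)
  }))
  # 12-hour to 24-hour: 12 a. m. is hour 0, 12 p. m. is hour 12
  hour <- as.integer(parts[, 4]) %% 12L + ifelse(parts[, 6] == "p", 12L, 0L)
  ISOdatetime(
    year = as.integer(parts[, 3]), month = as.integer(parts[, 2]),
    day = as.integer(parts[, 1]), hour = hour,
    min = as.integer(parts[, 5]), sec = 0, tz = tz
  )
}

stamps <- parse_stamp(c(
  "9/5/2020 8:04 p. m. - Eduardo Camargo: https://youtu.be/Y6XCrVOUXN4",
  "10/5/2020 1:04 a. m. - Carlos Sabogal Marmanillo: buenazas",
  "10/5/2020 12:02 p. m. - Jose Luis Olivas: Totalmente populista"
))
format(stamps, "%Y-%m-%d %H:%M:%S")
#> [1] "2020-05-09 20:04:00" "2020-05-10 01:04:00" "2020-05-10 12:02:00"
```

The first value, 20:04, is what the `time` column would be expected to show for the 8:04 p. m. message, rather than the 08:04 seen in the output above.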

ghost commented 3 years ago

camchat <- rwa_read("miChat.txt", format = "yyyy-mm-dd HH:mm:ss") %>% # evaluate multimedia messages

JBGruber commented 2 years ago

Thanks for letting me know! The problem was that I did not expect formats with a whitespace between "a." and "m.". I still get a weird output, but I assume you removed some messages and I don't think the package needs to be able to deal with empty messages, as it's impossible to actually send something like that.

df <- rwhatsapp::rwa_read(x = c(
  "9/5/2020 8:04 p. m. - Eduardo Camargo: https://youtu.be/Y6XCrVOUXN4",
  "Al final un Congreso Populista??? Again???",
  "9/5/2020 8:07 p. m. - Jose Luis Olivas: Totalmente populista",
  "9/5/2020 8:08 p. m. - Veronica Leon: oh no!woman_facepalming",
  "9/5/2020 8:11 p. m. - Eduardo Camargo: Hay que buscar a Richard Rubio Gariza para hacerlo reaccionar..!?",
  "9/5/2020 8:13 p. m. - Carlos Sabogal Marmanillo: Richard ya es un reaccionario de los israelitas",
  "9/5/2020 8:17 p. m. - Héctor Cortez F: Se eliminó este mensaje",
  "9/5/2020 8:54 p. m. - Jose Luis Olivas:",
  "9/5/2020 9:44 p. m. - Edita: Disculpen...pero quiénes son???🤦🏻‍♀🤦🏻‍♀",
  "9/5/2020 9:51 p. m. - Antonio Magino:",
  "9/5/2020 10:02 p. m. - Jose Luis Olivas: En la foto son El Niño Terrible de la Bombonera e icono de la U, Roberto Challe y Perico Leon icono aliancista que fallecio hoy",
  "9/5/2020 10:13 p. m. - Edita: Ohh..q penita...QDP...gracias José Luis...pensive",
  "9/5/2020 10:17 p. m. - Jose Luis Olivas: wink+1",
  "9/5/2020 11:38 p. m. - Antonio Magino:",
  "10/5/2020 1:04 a. m. - Carlos Sabogal Marmanillo: 👏🏽👏🏽👏🏽👏🏽buenazas 👍🏾👍🏾👍🏾👍🏾slightly_smiling_face",
  "10/5/2020 7:04 a. m. - Lourdes Rodas Entel:",
  "10/5/2020 7:24 a. m. - Veronica Leon:",
  "10/5/2020 7:25 a. m. - Gino Garibotto Sandoval: Feliz día de la madre a todas las compañeras del grupo, que pásenlo bonito, dentro de lo posible. heartbeattuliprosebouquetsunflowerblossomcherry_blossom",
  "10/5/2020 7:38 a. m. - Carlos Sabogal Marmanillo: Soy *Carlos Sabogal *"
))
df
#> # A tibble: 18 × 6
#>    time                author                    text    source emoji emoji_name
#>    <dttm>              <fct>                     <chr>   <chr>  <lis> <list>    
#>  1 2020-05-09 20:04:18 Eduardo Camargo           "https… text … <NUL… <NULL>    
#>  2 2020-05-09 20:07:18 Jose Luis Olivas          "Total… text … <NUL… <NULL>    
#>  3 2020-05-09 20:08:18 Veronica Leon             "oh no… text … <NUL… <NULL>    
#>  4 2020-05-09 20:11:18 Eduardo Camargo           "Hay q… text … <NUL… <NULL>    
#>  5 2020-05-09 20:13:18 Carlos Sabogal Marmanillo "Richa… text … <NUL… <NULL>    
#>  6 2020-05-09 20:17:18 Héctor Cortez F           "Se el… text … <NUL… <NULL>    
#>  7 2020-05-09 20:54:18 <NA>                      "Jose … text … <NUL… <NULL>    
#>  8 2020-05-09 21:44:18 Edita                     "Discu… text … <chr… <chr [2]> 
#>  9 2020-05-09 21:51:18 <NA>                      "Anton… text … <NUL… <NULL>    
#> 10 2020-05-09 22:02:18 Jose Luis Olivas          "En la… text … <NUL… <NULL>    
#> 11 2020-05-09 22:13:18 Edita                     "Ohh..… text … <NUL… <NULL>    
#> 12 2020-05-09 22:17:18 Jose Luis Olivas          "wink+… text … <NUL… <NULL>    
#> 13 2020-05-09 23:38:18 <NA>                      "Anton… text … <NUL… <NULL>    
#> 14 2020-05-10 01:04:18 Carlos Sabogal Marmanillo "👏🏽…    text … <chr… <chr [8]> 
#> 15 2020-05-10 07:04:18 <NA>                      "Lourd… text … <NUL… <NULL>    
#> 16 2020-05-10 07:24:18 <NA>                      "Veron… text … <NUL… <NULL>    
#> 17 2020-05-10 07:25:18 Gino Garibotto Sandoval   "Feliz… text … <NUL… <NULL>    
#> 18 2020-05-10 07:38:18 Carlos Sabogal Marmanillo "Soy *… text … <NUL… <NULL>

Created on 2022-01-05 by the reprex package (v2.0.1)
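On a package version that predates this fix, one possible workaround for the whitespace between "a." and "m." is to rewrite the marker before the lines reach `rwa_read()`. The sketch below is only an illustration: `normalise_ampm` is a hypothetical helper, not part of rwhatsapp, and whether `rwa_read()` then detects the resulting "AM"/"PM" format automatically is an assumption that should be checked on the actual data.

```r
# Hypothetical pre-processing step (not part of rwhatsapp): collapse
# "a. m." / "p. m." (with or without the inner space) to "AM" / "PM".
normalise_ampm <- function(lines) {
  # Anchored to a leading "d/m/yyyy h:mm " stamp so message text is untouched
  stamp <- "^([0-9]{1,2}/[0-9]{1,2}/[0-9]{4} [0-9]{1,2}:[0-9]{2}) "
  lines <- sub(paste0(stamp, "a\\. ?m\\."), "\\1 AM", lines)
  sub(paste0(stamp, "p\\. ?m\\."), "\\1 PM", lines)
}

normalise_ampm(c(
  "9/5/2020 8:07 p. m. - Jose Luis Olivas: Totalmente populista",
  "Al final un Congreso Populista??? Again???",
  "10/5/2020 7:38 a. m. - Carlos Sabogal Marmanillo: Soy *Carlos Sabogal *"
))

# Possible usage (assumes rwhatsapp is installed and accepts the cleaned lines):
# df <- rwhatsapp::rwa_read(x = normalise_ampm(readLines("miChat.txt", encoding = "UTF-8")))
```

Continuation lines, such as the second element above, do not start with a timestamp and are returned unchanged.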