pcontrerasnun closed this issue 4 years ago
Thanks Pablo! It's amazing how many different formats are out there. I added "yyyy-MM-dd, HH:mm:ss" to the supported formats (use remotes::install_github("JBGruber/rwhatsapp") to get the new version).
I'm not really sure about the second one. "21/07/18 5:27 p. m." doesn't look quite normal. Are you sure that is exactly how WhatsApp provides the times? "21/07/18 5:27 PM" is already supported. I don't think there is a format that supports "p. m." so the only option would be to replace that with "PM". Could you send me an (extract of) the file you encountered this in for testing?
The lazy-load database error can usually be solved by restarting RStudio.
Thanks for the quick response and fix Johannes! It's working!
Here is an extract of the file:
21/07/18 5:27 p. m. - Los mensajes y llamadas en este chat ahora están protegidos con cifrado de extremo a extremo. Toca para más información.
21/07/18 5:27 p. m. - Pablo Contreras: Este es el movil aus?
21/07/18 10:40 p. m. - Andres Garza: <Multimedia omitido>
22/07/18 3:13 a. m. - Andres Garza: Eres tonto o que mostroooo
I've asked my friend to send me another chat just to make sure that this format is consistent across his chats. I'll let you know, because I agree with you that this is a very weird format and, as I said before, I couldn't find a format that parses it correctly.
Hi Johannes!
It used to work fine before, but now it fails when the date format is "yyyy/M/dd HH:mm:ss".
Example: 31/1/15 14:10:59 gets parsed to 0015-01-31 (should be 2015-01-31)
Hmm. I don't have any problems with that last format:
rwhatsapp::rwa_read(x = c(
  "31/1/15 04:10:59 - Johannes Gruber: Was it good?",
  "31/1/15 14:10:59 - R: Yes, it was"
))
#> # A tibble: 2 x 6
#> time author text source emoji emoji_name
#> <dttm> <fct> <chr> <chr> <list> <list>
#> 1 2015-01-31 04:10:59 Johannes Gruber Was it good? text input <NULL> <NULL>
#> 2 2015-01-31 14:10:59 R Yes, it was text input <NULL> <NULL>
Created on 2020-01-25 by the reprex package (v0.3.0)
Are you sure you are using the newest version?
I looked into the other problem as well. You can include something like the following into your code to work with that format. I'm a little hesitant to include it in the package at this point since I'm not sure if it has any other implications but I'll think about it.
Let's say you read the txt file into R with readLines() and called the vector lines. Then this should work:
lines <- c(
  "21/07/18 5:27 p. m. - Los mensajes y llamadas en este chat ahora están protegidos con cifrado de extremo a extremo. Toca para más información.",
  "21/07/18 5:27 p. m. - Pablo Contreras: Este es el movil aus?",
  "21/07/18 10:40 p. m. - Andres Garza: <Multimedia omitido>",
  "22/07/18 3:13 a. m. - Andres Garza: Eres tonto o que mostroooo"
)
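# Replace "p. m." / "a. m." after the minutes with "PM" / "AM".
# The dots are escaped so they only match literal full stops.
lines <- stringi::stri_replace_all_regex(
  str = lines,
  pattern = c("(?<=\\d{2}) p\\. m\\. -", "(?<=\\d{2}) a\\. m\\. -"),
  replacement = c(" PM -", " AM -"),
  vectorize_all = FALSE
)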
rwhatsapp::rwa_read(lines)
#> # A tibble: 4 x 6
#> time author text source emoji emoji_name
#> <dttm> <fct> <chr> <chr> <lis> <list>
#> 1 2018-07-21 17:27:56 <NA> Los mensajes y llamada… text i… <NUL… <NULL>
#> 2 2018-07-21 17:27:56 Pablo Co… Este es el movil aus? text i… <NUL… <NULL>
#> 3 2018-07-21 22:40:56 Andres G… <Multimedia omitido> text i… <NUL… <NULL>
#> 4 2018-07-22 03:13:56 Andres G… Eres tonto o que mostr… text i… <NUL… <NULL>
Created on 2020-01-25 by the reprex package (v0.3.0)
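For anyone who would rather not depend on stringi, the same replacement can be sketched in base R. This is only a minimal sketch: normalise_meridiem() is a hypothetical helper name and not part of rwhatsapp, and it assumes every line starts with a date, a space, and the time, followed by " - " as in the extract above.

```r
# Hypothetical helper (not part of rwhatsapp): rewrite the "a. m." / "p. m."
# markers to "AM" / "PM" using base R only.
normalise_meridiem <- function(lines) {
  # Anchor at the start of the line (date, space, h:mm) so that the same
  # characters inside a message body are left untouched. sub() replaces
  # only the first match in each element.
  lines <- sub("^(\\S+ \\d{1,2}:\\d{2}) p\\. m\\. -", "\\1 PM -", lines, perl = TRUE)
  lines <- sub("^(\\S+ \\d{1,2}:\\d{2}) a\\. m\\. -", "\\1 AM -", lines, perl = TRUE)
  lines
}

chat <- c(
  "21/07/18 5:27 p. m. - Pablo Contreras: Este es el movil aus?",
  "22/07/18 3:13 a. m. - Andres Garza: Eres tonto o que mostroooo"
)
fixed <- normalise_meridiem(chat)
print(fixed)
```

The cleaned vector can then be passed to rwhatsapp::rwa_read() in the same way as lines in the example above.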
Thanks for the tip for the a. m. and p. m. issue!
Yup, I'm using the latest version:
remotes::install_github("JBGruber/rwhatsapp")
Skipping install of 'rwhatsapp' from a github remote, the SHA1 (2eb5abe0) has not changed since last install.
Use `force = TRUE` to force installation
Hmm, can you try with [31/1/15 04:10:59]? Because that's the actual format (with the brackets).
Example:
[25/1/15 12:34:23] Silvia Fernández: Va habÃa pensado o que comiésemos esta semana
[25/1/15 12:34:37] Silvia Fernández: O también tengo tres entradas gratis para los bolos
Still works for me.
rwhatsapp::rwa_read(x = c(
  "[25/1/15 12:34:23] Silvia Fernández: Va habÃa pensado o que comiésemos esta semana",
  "[25/1/15 12:34:37] Silvia Fernández: O también tengo tres entradas gratis para los bolos"
))
#> # A tibble: 2 x 6
#> time author text source emoji emoji_name
#> <dttm> <fct> <chr> <chr> <lis> <list>
#> 1 2015-01-25 12:34:23 Silvia Fe… Va habÃa pensado o qu… text i… <NUL… <NULL>
#> 2 2015-01-25 12:34:37 Silvia Fe… O también tengo tres … text i… <NUL… <NULL>
I noticed though that the date format in your comment doesn't match the example. If you want to use a custom format, it would need to be this:
rwhatsapp::rwa_read(x = c(
  "[25/1/15 12:34:23] Silvia Fernández: Va habÃa pensado o que comiésemos esta semana",
  "[25/1/15 12:34:37] Silvia Fernández: O también tengo tres entradas gratis para los bolos"
), format = "dd/MM/yy HH:mm:ss")
#> # A tibble: 2 x 6
#> time author text source emoji emoji_name
#> <dttm> <fct> <chr> <chr> <lis> <list>
#> 1 2015-01-25 12:34:23 Silvia Fe… Va habÃa pensado o qu… text i… <NUL… <NULL>
#> 2 2015-01-25 12:34:37 Silvia Fe… O también tengo tres … text i… <NUL… <NULL>
Created on 2020-01-25 by the reprex package (v0.3.0)
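As a rough illustration of why a four-digit year pattern and a two-digit year do not mix, here is a base-R analogue. It uses strptime-style codes (%Y, %y) rather than the "yyyy" / "yy" style patterns that the format argument takes, so treat it as an analogy under that assumption and not as a description of what rwhatsapp does internally.

```r
# Base-R analogy only: %Y / %y stand in for the "yyyy" / "yy" patterns.
stamp <- "31/1/15 14:10:59"

# A four-digit year code reads "15" literally as the year 15.
four_digit <- as.POSIXlt(stamp, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# A two-digit year code maps "15" into the current century.
two_digit <- as.POSIXlt(stamp, format = "%d/%m/%y %H:%M:%S", tz = "UTC")

# POSIXlt stores the year as an offset from 1900.
year_four <- four_digit$year + 1900
year_two <- two_digit$year + 1900
print(c(four_digit_code = year_four, two_digit_code = year_two))
```

With the four-digit code the year comes out as 15, which mirrors the 0015-01-31 symptom described earlier; with the two-digit code it comes out as 2015.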
After checking your sample file I found that the issue was with single-digit dates. These should work now as well:
rwhatsapp::rwa_read(c(
  "[7/5/15, 22:35:22] Johannes Gruber: Was it good?",
  "[8/5/15, 09:12:44] R: Yes, it was"
))
#> # A tibble: 2 x 6
#> time author text source emoji emoji_name
#> <dttm> <fct> <chr> <chr> <list> <list>
#> 1 2015-05-07 22:35:22 Johannes Gruber Was it good? text input <NULL> <NULL>
#> 2 2015-05-08 09:12:44 R Yes, it was text input <NULL> <NULL>
Created on 2020-01-26 by the reprex package (v0.3.0)
Works like a charm now!
Many thanks again for your time!
Hi Johannes!
It's Pablo, we exchanged emails some time ago regarding an issue I was facing with numbers I didn't have saved on my phone! I've come across two time formats which are not included in the package and had to be specified with the format parameter, and which you may consider adding:
format = "yyyy-MM-dd, HH:mm:ss"
Example: "2016-11-25, 18:50:48"
format = "dd/MM/yy, HH:mm"
Example: "21/07/18 5:27 p. m." I didn't manage to read the 'a. m.' and 'p. m.' properly for this one.
On the other hand, I've just updated to version 0.2.1 and I'm getting
Error in fetch(key) : lazy-load database '/Library/Frameworks/R.framework/Versions/3.6/Resources/library/rwhatsapp/help/rwhatsapp.rdb' is corrupt
when running ?rwa_read
Congrats again on this great package! Good job!