JBGruber / rwhatsapp

An R package for working with WhatsApp data 💬

Add new time formats #12

Closed pcontrerasnun closed 4 years ago

pcontrerasnun commented 4 years ago

Hi Johannes!

It's Pablo, we exchanged emails some time ago regarding an issue I was facing with numbers I didn't have saved on my phone! I've come across two time formats that are not included in the package, so I had to specify them with the format parameter. You may want to consider adding them.

format = "yyyy-MM-dd, HH:mm:ss" Example: "2016-11-25, 18:50:48"
format = "dd/MM/yy, HH:mm" Example: "21/07/18 5:27 p. m."

I didn't manage to read the 'a. m.' and 'p. m.' properly for the second one.

On another note, I've just updated to version 0.2.1 and I'm getting the following error when running ?rwa_read:

Error in fetch(key) : lazy-load database '/Library/Frameworks/R.framework/Versions/3.6/Resources/library/rwhatsapp/help/rwhatsapp.rdb' is corrupt

Congrats again on this great package! Good job!

JBGruber commented 4 years ago

Thanks Pablo! It's amazing how many different formats are out there. I added "yyyy-MM-dd, HH:mm:ss" to the supported formats (use remotes::install_github("JBGruber/rwhatsapp") to get the new version).

I'm not really sure about the second one. "21/07/18 5:27 p. m." doesn't look quite normal. Are you sure that is exactly how WhatsApp provides the times? "21/07/18 5:27 PM" is already supported. I don't think there is a format that supports "p. m.", so the only option would be to replace that with "PM". Could you send me (an extract of) the file you encountered this in for testing?

The lazy-load database error can usually be solved by restarting RStudio.

pcontrerasnun commented 4 years ago

Thanks for the quick response and fix Johannes! It's working!

Here is an extract of the file:

21/07/18 5:27 p. m. - Los mensajes y llamadas en este chat ahora están protegidos con cifrado de extremo a extremo. Toca para más información.
21/07/18 5:27 p. m. - Pablo Contreras: Este es el movil aus?
21/07/18 10:40 p. m. - Andres Garza: <Multimedia omitido>
22/07/18 3:13 a. m. - Andres Garza: Eres tonto o que mostroooo

I've asked my friend to send me another chat just to make sure that this format is consistent across his chats. I'll let you know, cause I agree with you that this is a very weird format and, as I said before, I couldn't find a format that parses it correctly.

pcontrerasnun commented 4 years ago

Hi Johannes!

It used to work fine before, but now when the date format is "yyyy/M/dd HH:mm:ss" (example: 31/1/15 14:10:59), the date gets parsed as 0015-01-31 (it should be 2015-01-31).

JBGruber commented 4 years ago

Hmm. I don't have any problems with that last format:

rwhatsapp::rwa_read(x = c(
  "31/1/15 04:10:59 - Johannes Gruber: Was it good?",
  "31/1/15 14:10:59 - R: Yes, it was"
))
#> # A tibble: 2 x 6
#>   time                author          text         source     emoji  emoji_name
#>   <dttm>              <fct>           <chr>        <chr>      <list> <list>    
#> 1 2015-01-31 04:10:59 Johannes Gruber Was it good? text input <NULL> <NULL>    
#> 2 2015-01-31 14:10:59 R               Yes, it was  text input <NULL> <NULL>

Created on 2020-01-25 by the reprex package (v0.3.0)

Are you sure you are using the newest version?

I looked into the other problem as well. You can include something like the following into your code to work with that format. I'm a little hesitant to include it in the package at this point since I'm not sure if it has any other implications but I'll think about it.

Let's say you read the txt file into R with readLines() and called the vector lines. Then this should work:

lines <- c(
"21/07/18 5:27 p. m. - Los mensajes y llamadas en este chat ahora están protegidos con cifrado de extremo a extremo. Toca para más información.",
"21/07/18 5:27 p. m. - Pablo Contreras: Este es el movil aus?",
"21/07/18 10:40 p. m. - Andres Garza: <Multimedia omitido>",
"22/07/18 3:13 a. m. - Andres Garza: Eres tonto o que mostroooo"
)

lines <- stringi::stri_replace_all_regex(
  str = lines,
  pattern = c("(?<=\\d{2}) p\\. m\\. -", "(?<=\\d{2}) a\\. m\\. -"),  # escape the dots so they only match literal periods
  replacement = c(" PM -", " AM -"),
  vectorize_all = FALSE
)

rwhatsapp::rwa_read(lines)
#> # A tibble: 4 x 6
#>   time                author    text                    source  emoji emoji_name
#>   <dttm>              <fct>     <chr>                   <chr>   <lis> <list>    
#> 1 2018-07-21 17:27:56 <NA>      Los mensajes y llamada… text i… <NUL… <NULL>    
#> 2 2018-07-21 17:27:56 Pablo Co… Este es el movil aus?   text i… <NUL… <NULL>    
#> 3 2018-07-21 22:40:56 Andres G… <Multimedia omitido>    text i… <NUL… <NULL>    
#> 4 2018-07-22 03:13:56 Andres G… Eres tonto o que mostr… text i… <NUL… <NULL>

Created on 2020-01-25 by the reprex package (v0.3.0)
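
If you would rather not depend on stringi for this step, the same replacement can be sketched in base R. This is only an illustration: normalise_ampm() is a hypothetical helper name, not part of rwhatsapp, and it assumes the markers are written with plain spaces exactly as in the extract above (a real export may use non-breaking spaces, in which case the pattern would need adjusting).

```r
# Hypothetical helper (not part of rwhatsapp): rewrite Spanish-style
# "a. m." / "p. m." markers as "AM" / "PM" using base R only.
normalise_ampm <- function(lines) {
  # Only touch markers that directly follow two digits (the minutes)
  # and precede the " -" separator, so message text is left alone.
  lines <- gsub("(?<=\\d{2}) p\\. m\\. -", " PM -", lines, perl = TRUE)
  lines <- gsub("(?<=\\d{2}) a\\. m\\. -", " AM -", lines, perl = TRUE)
  lines
}

lines <- c(
  "21/07/18 5:27 p. m. - Pablo Contreras: Este es el movil aus?",
  "22/07/18 3:13 a. m. - Andres Garza: Eres tonto o que mostroooo"
)

normalised <- normalise_ampm(lines)
print(normalised)
```

The normalised vector can then be passed on to rwhatsapp::rwa_read() in the same way as in the stringi version.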

pcontrerasnun commented 4 years ago

Thanks for the tip for the a. m. and p. m. issue!

Yup, I'm using the latest version:

remotes::install_github("JBGruber/rwhatsapp")
Skipping install of 'rwhatsapp' from a github remote, the SHA1 (2eb5abe0) has not changed since last install.
Use force = TRUE to force installation

Hmm, can you try with [31/1/15 04:10:59]? Cause that's the actual format (with the brackets). Example:

[25/1/15 12:34:23] Silvia Fernández: Va había pensado o que comiésemos esta semana
[25/1/15 12:34:37] Silvia Fernández: O también tengo tres entradas gratis para los bolos

JBGruber commented 4 years ago

Still works for me.

rwhatsapp::rwa_read(x = c(
  "[25/1/15 12:34:23] Silvia Fernández: Va había pensado o que comiésemos esta semana",
  "[25/1/15 12:34:37] Silvia Fernández: O también tengo tres entradas gratis para los bolos"
))
#> # A tibble: 2 x 6
#>   time                author     text                   source  emoji emoji_name
#>   <dttm>              <fct>      <chr>                  <chr>   <lis> <list>    
#> 1 2015-01-25 12:34:23 Silvia Fe… Va había pensado o qu… text i… <NUL… <NULL>    
#> 2 2015-01-25 12:34:37 Silvia Fe… O también tengo tres … text i… <NUL… <NULL>

I noticed though that the date format in your comment doesn't match the example. If you want to use a custom format, it would need to be this:

rwhatsapp::rwa_read(x = c(
  "[25/1/15 12:34:23] Silvia Fernández: Va había pensado o que comiésemos esta semana",
  "[25/1/15 12:34:37] Silvia Fernández: O también tengo tres entradas gratis para los bolos"
), format = "dd/MM/yy HH:mm:ss")
#> # A tibble: 2 x 6
#>   time                author     text                   source  emoji emoji_name
#>   <dttm>              <fct>      <chr>                  <chr>   <lis> <list>    
#> 1 2015-01-25 12:34:23 Silvia Fe… Va había pensado o qu… text i… <NUL… <NULL>    
#> 2 2015-01-25 12:34:37 Silvia Fe… O también tengo tres … text i… <NUL… <NULL>

Created on 2020-01-25 by the reprex package (v0.3.0)
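
For what it's worth, a two-digit year ending up as year 15 is the typical symptom of a four-digit year pattern meeting two-digit input. The sketch below reproduces that effect in base R. Note that it is only an analogy: it uses strptime() codes (%Y, %y) rather than the ICU-style patterns (yyyy, yy) that the format argument above takes, and it does not show how rwhatsapp parses dates internally.

```r
# Base R analogy only: %Y expects a full year, %y a two-digit year.
x <- "31/1/15 14:10:59"

# Four-digit year code applied to a two-digit year: "15" is read literally
wrong <- as.POSIXlt(x, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Two-digit year code: "15" is expanded to 2015
right <- as.POSIXlt(x, format = "%d/%m/%y %H:%M:%S", tz = "UTC")

# POSIXlt stores the year as an offset from 1900
wrong_year <- wrong$year + 1900
right_year <- right$year + 1900
print(c(wrong = wrong_year, right = right_year))
```

Comparing the numeric year avoids relying on how a given platform prints years below 1000.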

JBGruber commented 4 years ago

After checking your sample file I found that the issue was with single-digit dates. These should work now as well:

rwhatsapp::rwa_read(c(
  "[7/5/15, 22:35:22] Johannes Gruber: Was it good?",
  "[8/5/15, 09:12:44] R: Yes, it was"
))
#> # A tibble: 2 x 6
#>   time                author          text         source     emoji  emoji_name
#>   <dttm>              <fct>           <chr>        <chr>      <list> <list>    
#> 1 2015-05-07 22:35:22 Johannes Gruber Was it good? text input <NULL> <NULL>    
#> 2 2015-05-08 09:12:44 R               Yes, it was  text input <NULL> <NULL>

Created on 2020-01-26 by the reprex package (v0.3.0)
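
If you want to check in advance which lines of an export start with a bracketed timestamp that has one- or two-digit days and months, a rough base R check could look like the following. This is a sketch under assumptions: starts_with_timestamp() is a hypothetical helper, the regular expression is not the one rwhatsapp uses, and it only covers the bracketed d/M/yy layouts shown in this thread.

```r
# Hypothetical helper (not part of rwhatsapp): TRUE for lines that open
# with "[d/M/yy, HH:mm:ss]" or "[d/M/yy HH:mm:ss]".
starts_with_timestamp <- function(lines) {
  grepl(
    "^\\[\\d{1,2}/\\d{1,2}/\\d{2},? \\d{2}:\\d{2}:\\d{2}\\]",
    lines,
    perl = TRUE
  )
}

lines <- c(
  "[7/5/15, 22:35:22] Johannes Gruber: Was it good?",
  "[25/1/15 12:34:23] R: Yes, it was",
  "a continuation line without a timestamp"
)

has_timestamp <- starts_with_timestamp(lines)
print(has_timestamp)
```

Lines that return FALSE are either continuation lines of a multi-line message or use a layout the pattern does not cover.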

pcontrerasnun commented 4 years ago

Works like a charm now!

Many thanks again for your time!