JBGruber / rwhatsapp

An R package for working with WhatsApp data 💬

Datetime doesn't parse #13

Closed: sdrakulich closed this issue 4 years ago

sdrakulich commented 4 years ago

Hi there,

myfile='myfile.txt'
Timestamp.Format <- "%Y-%m-%d, %I:%M %p"
chat <- rwa_read(myfile, tz="Canada/Eastern", format=Timestamp.Format)

I'm feeding in the raw WhatsApp text file, of course, and the time column ends up as NA. I've verified that my timestamp format is correct, because other parsers read the timestamps successfully with it. However, those parsers produce a different number of lines than yours, so I can't simply join the results.

I'm pretty sure that of all the parsers, yours is the most accurate at extracting the message text and sender, and it doesn't break on messages that contain carriage returns. I just need the time to work!

Thanks!

JBGruber commented 4 years ago

Hi @sdrakulich,

The problem you encounter is because rwa_read uses stringi::stri_datetime_parse under the hood. This function doesn't use R's standard strptime-style format codes (such as %Y or %I) but the date/time pattern syntax of the ICU library, which is written in C++ (see ?stringi::stri_datetime_parse). In your case, I think the format should be "yyyy-MM-dd, hh:mm a":

rwhatsapp::rwa_read(c(
  "2018-07-12, 11:35 AM - Johannes Gruber: Was it good?",
  "2018-07-12, 11:35 PM - R: Yes, it was"
), tz = "Canada/Eastern", format = "yyyy-MM-dd, hh:mm a")
#> # A tibble: 2 x 6
#>   time                author          text         source     emoji  emoji_name
#>   <dttm>              <fct>           <chr>        <chr>      <list> <list>    
#> 1 2018-07-12 11:35:38 Johannes Gruber Was it good? text input <NULL> <NULL>    
#> 2 2018-07-12 23:35:38 R               Yes, it was  text input <NULL> <NULL>

Created on 2020-02-05 by the reprex package (v0.3.0)
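Before running rwa_read on a whole export, it can help to test the ICU pattern against a couple of raw timestamps. The sketch below is illustrative and not part of rwhatsapp: it calls stringi::stri_datetime_parse directly (the function rwa_read is described above as using) and stringi::stri_datetime_fstr, which translates a strptime-style format into ICU syntax. The locale = "en_US" argument is an assumption added here so that the AM/PM markers parse on systems with a non-English locale.

```r
library(stringi)

# strptime-style format from the original question (base R convention)
r_format <- "%Y-%m-%d, %I:%M %p"

# stringi can translate a strptime-style format into ICU syntax;
# how literal characters are quoted may differ between stringi versions
icu_format <- stri_datetime_fstr(r_format)
print(icu_format)

# Timestamps as they appear at the start of each exported chat line
stamps <- c("2018-07-12, 11:35 AM", "2018-07-12, 11:35 PM")

# Parse with the hand-written ICU pattern; the English locale is an
# assumption so that "AM"/"PM" are recognised regardless of system locale
parsed <- stri_datetime_parse(stamps, format = "yyyy-MM-dd, hh:mm a",
                              tz = "Canada/Eastern", locale = "en_US")

# Any NA here means the pattern does not match the timestamp text
stopifnot(!anyNA(parsed))

# Show the parsed times in the same time zone, to the minute
out <- format(parsed, "%Y-%m-%d %H:%M", tz = "Canada/Eastern")
print(out)
```

If both timestamps parse without NA, the same pattern string can be passed as format = to rwa_read, as in the reprex above.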

sdrakulich commented 4 years ago

Yup, my apologies for not paying closer attention! Thanks a ton :)