JBGruber / rwhatsapp

An R package for working with WhatsApp data 💬

Time Conversion not working properly #22

Closed thatrix closed 3 years ago

thatrix commented 3 years ago

When I read a text file using rwa_read(), I get a warning telling me that the time conversion did not work properly. I have looked at the text file in detail and it doesn't seem different from the other text files I am working with. When imported, the file has 39882 observations of 9 variables. I'm not sure what I am doing incorrectly. A snip of the text file showing the dates is attached. Thank you. (Attachment: Text Snip)

JBGruber commented 3 years ago

The lines I can see should be parsed correctly. However, that warning is shown when more than 10% of the supplied times come out as NA (which you probably want to fix). You could have a look at the messages where time conversion fails with:

library("rwhatsapp")
chat <- rwa_read(history)  # history: path to your exported chat file (or the chat text itself)
chat[is.na(chat$time),]

# Or
library(dplyr)
chat %>% 
  filter(is.na(time))
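To see how far a chat is from the 10% threshold mentioned above, the share of NA times can be computed directly. This is only a minimal sketch in base R: `na_time_share` is a hypothetical helper name, and the small data frame merely stands in for the result of rwa_read().

```r
# Hypothetical helper: fraction of rows whose `time` could not be parsed.
na_time_share <- function(chat) {
  mean(is.na(chat$time))
}

# Toy stand-in for the data frame returned by rwa_read(); the values are invented.
chat_demo <- data.frame(
  time = as.POSIXct(c("2020-09-22 07:11:00", NA, "2020-09-22 07:12:00", NA),
                    tz = "UTC"),
  text = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

share <- na_time_share(chat_demo)
share        # 0.5 for this toy data
share > 0.1  # TRUE, i.e. above the 10% mentioned above
```

With a real chat you would call `na_time_share(chat)` on the object created above.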

Maybe you can search for the messages with NA time in your chat file, extract a few lines before and after the failing messages in your source text and post them here? I'm not interested in the text itself, so you can replace every word if you are privacy conscious. Just make sure everything that is out of the norm about these messages is preserved.
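Pulling out the surrounding lines and masking the words could be done roughly like this. It is a sketch under assumptions, not part of rwhatsapp: `context_lines` and `mask_text` are hypothetical helpers, the file path in the comment is a placeholder, and the masking assumes that the first " - " on a line separates the timestamp from the rest.

```r
# Hypothetical helper: return every line containing `needle` plus `n` lines
# of context on each side.
context_lines <- function(lines, needle, n = 2) {
  hits <- grep(needle, lines, fixed = TRUE)
  idx <- unlist(lapply(hits, function(i) {
    seq(max(1, i - n), min(length(lines), i + n))
  }))
  lines[sort(unique(as.integer(idx)))]
}

# Hypothetical helper: replace letters and digits with "x" in everything after
# the first " - ", so the timestamp in front of it stays exactly as it was.
# Lines without " - " (continuation lines) are masked completely.
mask_text <- function(lines) {
  pos <- as.integer(regexpr(" - ", lines, fixed = TRUE))
  has_stamp <- pos > 0
  head <- ifelse(has_stamp, substr(lines, 1, pos + 2), "")
  body <- ifelse(has_stamp, substr(lines, pos + 3, nchar(lines)), lines)
  paste0(head, gsub("[[:alnum:]]", "x", body))
}

# In practice: raw <- readLines("path/to/exported_chat.txt", encoding = "UTF-8")
raw <- c(
  "2020-09-22, 7:11 a.m. - Alice: first message",
  "2020-09-22, 7:12 a.m. - Bob: second message",
  "continues on a new line",
  "2020-09-22, 7:13 a.m. - Alice: third message"
)

snippet <- context_lines(raw, "second message", n = 1)
mask_text(snippet)
```

Punctuation, emoji and line breaks are left alone on purpose, since those are the kind of out-of-the-norm details that might matter here.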

thatrix commented 3 years ago

Thanks very much. In the txt file, there isn't a single message that has the NA time. All messages have the correct time in them. It is only when I bring them into Whatsapp that I get that error. A sample of messages (all of which show up as NA) is pasted below:

2020-09-22, 7:11 a.m. - +X (xxx) xxx-xxxx: the quick brown fox jumps over the lazy dog
2020-09-22, 7:11 a.m. - XXXXXX XXXXXXX:
2020-09-22, 7:11 a.m. - XXXXXX XXXXXXX: @xxxxxxxxxx
2020-09-22, 7:12 a.m. - +X (xxx) xxx-xxxx: the quick brown fox jumps over the lazy dog
2020-09-22, 7:12 a.m. - XXXXXX XXXXXXX: the quick brown fox jumps over the lazy dog
2020-09-22, 7:12 a.m. - +X (xxx) xxx-xxxx: 👏🏾👏🏾
2020-09-22, 7:12 a.m. - +X (xxx) xxx-xxxx: 😂
2020-09-22, 7:13 a.m. - +X (xxx) xxx-xxxx: Yeah. Lol
2020-09-22, 7:14 a.m. - +xx xxxx xxxxxx: the quick brown fox jumps over the lazy dog
the quick brown fox jumps over the lazy dogthe quick brown fox jumps over the lazy dogthe quick brown fox jumps over the lazy dogthe quick brown fox jumps over the lazy dogthe quick brown fox jumps over the lazy dogthe quick brown fox jumps over the lazy dogthe quick brown fox jumps over the lazy dog
2020-09-22, 7:14 a.m. - +X (xxx) xxx-xxxx: 🤣👌🏾
2020-09-22, 7:15 a.m. - +xx xxxx xxxxxx: the quick brown fox jumps over the lazy dog
the quick brown fox jumps over the lazy dog 🙃🙃
2020-09-22, 7:15 a.m. - +X (xxx) xxx-xxxx: the quick brown fox jumps over the lazy dog
2020-09-22, 7:15 a.m. - +xx xxxx xxxxxx: the quick brown fox jumps over the lazy dog
2020-09-22, 7:16 a.m. - +X (xxx) xxx-xxxx: the quick brown fox jumps over the lazy dog
2020-09-22, 7:16 a.m. - +X (xxx) xxx-xxxx: the quick brown fox jumps over the lazy dogthe quick brown fox jumps over the lazy dog
2020-09-22, 7:16 a.m. - +X (xxx) xxx-xxxx: Ok.
2020-09-22, 7:16 a.m. - +X (xxx) xxx-xxxx: Where?
2020-09-22, 7:16 a.m. - +xx xxxx xxxxxx: You've lost me.
2020-09-22, 7:17 a.m. - +xx xxxx xxxxxx: the quick brown fox jumps over the lazy dog
2020-09-22, 7:17 a.m. - +X (xxx) xxx-xxxx: the quick brown fox jumps over the lazy dog

I have also attached a portion of what I get when I run the chat[is.na(chat$time),] command in R (attachment: R Extract.txt).

Thanks!

JBGruber commented 3 years ago

What you sent runs fine for me:

rwhatsapp::rwa_read(x = c(
  "2020-09-22, 7:11 a.m. - +X (xxx) xxx-xxxx: the quick brown fox jumps over the lazy dog",
  "2020-09-22, 7:11 a.m. - XXXXXX XXXXXXX: ",
  "2020-09-22, 7:11 a.m. - XXXXXX XXXXXXX: @xxxxxxxxxx",
  "2020-09-22, 7:12 a.m. - +X (xxx) xxx-xxxx: the quick brown fox jumps over the lazy dog",
  "2020-09-22, 7:12 a.m. - XXXXXX XXXXXXX: the quick brown fox jumps over the lazy dog",
  "2020-09-22, 7:12 a.m. - +X (xxx) xxx-xxxx: 👏🏾👏🏾",
  "2020-09-22, 7:12 a.m. - +X (xxx) xxx-xxxx: 😂",
  "2020-09-22, 7:13 a.m. - +X (xxx) xxx-xxxx: Yeah. Lol",
  "2020-09-22, 7:14 a.m. - +xx xxxx xxxxxx: the quick brown fox jumps over the lazy dog",
  "the quick brown fox jumps over the lazy dogthe quick brown fox jumps over the lazy dogthe quick brown fox jumps over the lazy dogthe quick brown fox jumps over the lazy dogthe quick brown fox jumps over the lazy dogthe quick brown fox jumps over the lazy dogthe quick brown fox jumps over the lazy dog",
  "2020-09-22, 7:14 a.m. - +X (xxx) xxx-xxxx: 🤣👌🏾",
  "2020-09-22, 7:15 a.m. - +xx xxxx xxxxxx: the quick brown fox jumps over the lazy dog",
  "the quick brown fox jumps over the lazy dog 🙃🙃",
  "2020-09-22, 7:15 a.m. - +X (xxx) xxx-xxxx: the quick brown fox jumps over the lazy dog",
  "2020-09-22, 7:15 a.m. - +xx xxxx xxxxxx: the quick brown fox jumps over the lazy dog",
  "2020-09-22, 7:16 a.m. - +X (xxx) xxx-xxxx: the quick brown fox jumps over the lazy dog",
  "2020-09-22, 7:16 a.m. - +X (xxx) xxx-xxxx: the quick brown fox jumps over the lazy dogthe quick brown fox jumps over the lazy dog",
  "2020-09-22, 7:16 a.m. - +X (xxx) xxx-xxxx: Ok.",
  "2020-09-22, 7:16 a.m. - +X (xxx) xxx-xxxx: Where?",
  "2020-09-22, 7:16 a.m. - +xx xxxx xxxxxx: You've lost me.",
  "2020-09-22, 7:17 a.m. - +xx xxxx xxxxxx: the quick brown fox jumps over the lazy dog",
  "2020-09-22, 7:17 a.m. - +X (xxx) xxx-xxxx: the quick brown fox jumps over the lazy dog"
))
#> # A tibble: 20 x 6
#>    time                author    text                   source  emoji emoji_name
#>    <dttm>              <fct>     <chr>                  <chr>   <lis> <list>    
#>  1 2020-09-22 07:11:28 +X (xxx)… "the quick brown fox … text i… <NUL… <NULL>    
#>  2 2020-09-22 07:11:28 XXXXXX X… ""                     text i… <NUL… <NULL>    
#>  3 2020-09-22 07:11:28 XXXXXX X… "@xxxxxxxxxx"          text i… <NUL… <NULL>    
#>  4 2020-09-22 07:12:28 +X (xxx)… "the quick brown fox … text i… <NUL… <NULL>    
#>  5 2020-09-22 07:12:28 XXXXXX X… "the quick brown fox … text i… <NUL… <NULL>    
#>  6 2020-09-22 07:12:28 +X (xxx)… "👏🏾👏🏾"                 text i… <chr… <chr [2]> 
#>  7 2020-09-22 07:12:28 +X (xxx)… "😂"                   text i… <chr… <chr [1]> 
#>  8 2020-09-22 07:13:28 +X (xxx)… "Yeah. Lol"            text i… <NUL… <NULL>    
#>  9 2020-09-22 07:14:28 +xx xxxx… "the quick brown fox … text i… <NUL… <NULL>    
#> 10 2020-09-22 07:14:28 +X (xxx)… "🤣👌🏾"                 text i… <chr… <chr [2]> 
#> 11 2020-09-22 07:15:28 +xx xxxx… "the quick brown fox … text i… <chr… <chr [2]> 
#> 12 2020-09-22 07:15:28 +X (xxx)… "the quick brown fox … text i… <NUL… <NULL>    
#> 13 2020-09-22 07:15:28 +xx xxxx… "the quick brown fox … text i… <NUL… <NULL>    
#> 14 2020-09-22 07:16:28 +X (xxx)… "the quick brown fox … text i… <NUL… <NULL>    
#> 15 2020-09-22 07:16:28 +X (xxx)… "the quick brown fox … text i… <NUL… <NULL>    
#> 16 2020-09-22 07:16:28 +X (xxx)… "Ok."                  text i… <NUL… <NULL>    
#> 17 2020-09-22 07:16:28 +X (xxx)… "Where?"               text i… <NUL… <NULL>    
#> 18 2020-09-22 07:16:28 +xx xxxx… "You've lost me."      text i… <NUL… <NULL>    
#> 19 2020-09-22 07:17:28 +xx xxxx… "the quick brown fox … text i… <NUL… <NULL>    
#> 20 2020-09-22 07:17:28 +X (xxx)… "the quick brown fox … text i… <NUL… <NULL>

So it seems there is only an issue with some specific messages.

> Thanks very much. In the txt file, there isn't a single message that has the NA time. All messages have the correct time in them. It is only when I bring them into Whatsapp that I get that error.

Yes, I got that. What I meant is that you should check the output from chat[is.na(chat$time),] and see which messages cause a problem. Then you should open the source file, look for those messages and copy them over.

So, say that chat[is.na(chat$time),] returns the message "- +X (xxx) xxx-xxxx: the quick brown fox jumps over the lazy dog". Then you could look for that portion in your txt file, copy the line before and the line after it, and paste them here. Ideally, you would also check in R whether that message really causes the problem.
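That check in R could look roughly like the following. It is only a sketch: the three strings in `candidate` are placeholders for the line before, the suspect line and the line after from your own file, and the call simply mirrors the rwa_read(x = c(...)) call shown earlier in this thread. It does nothing if rwhatsapp is not installed.

```r
# Placeholder lines; replace them with the lines copied from your txt file.
candidate <- c(
  "2020-09-22, 7:11 a.m. - XXXXXX XXXXXXX: line before the suspect message",
  "2020-09-22, 7:12 a.m. - XXXXXX XXXXXXX: the suspect message",
  "2020-09-22, 7:13 a.m. - XXXXXX XXXXXXX: line after the suspect message"
)

if (requireNamespace("rwhatsapp", quietly = TRUE)) {
  res <- rwhatsapp::rwa_read(x = candidate)
  # Rows printed here are the ones whose time could not be converted
  print(res[is.na(res$time), ])
}
```

If no rows are printed for the copied lines, the problem is probably somewhere else in the file.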

JBGruber commented 3 years ago

Since I haven't heard anything in a while, I'm closing this. Let me know if you need further help.