JBGruber / rwhatsapp

An R package for working with WhatsApp data 💬

Problem with emoji #30

Closed by armelsoubeiga 2 years ago

armelsoubeiga commented 2 years ago

Hello @JBGruber

I have a problem when I try to read my WhatsApp data. The same problem was already reported in issue #29, but without a solution.

``` r
x <- "D:/Dev/newwhat/data/Chat - Stat-Inf_ Job&Scholarship.txt"
rwa_read(x)
```

```
Error in split.default(lookup$emoji, lookup$id) : first argument must be a vector
In addition: Warning messages:
1: Unknown or uninitialised column: emoji.
2: Unknown or uninitialised column: emoji.
```
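The error text itself is ordinary base R behaviour and can be reproduced without rwhatsapp. The sketch below assumes, based only on the warnings, that the package's internal `lookup` table ends up without an `emoji` column; the `lookup` built here is a hypothetical toy stand-in, not the package's actual object. Extracting a missing column with `$` yields `NULL`, and `split()` rejects `NULL` as its first argument.

```r
# Hypothetical toy table standing in for rwhatsapp's internal `lookup`
# object: it has an `id` column but, crucially, no `emoji` column.
lookup <- data.frame(id = c(1L, 1L, 2L))

# `$` on a missing column returns NULL (a tibble would additionally warn
# "Unknown or uninitialised column").
missing_col <- lookup$emoji
stopifnot(is.null(missing_col))

# split() needs a vector as its first argument; NULL is not one,
# so the call fails with the same message as in the report.
msg <- tryCatch(
  split(missing_col, lookup$id),
  error = function(e) conditionMessage(e)
)
print(msg)  # in an English locale: "first argument must be a vector"
```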

This is the data: Chat - Stat-Inf_ Job&Scholarship.txt

Thanks

JBGruber commented 2 years ago

Thanks for bringing this up! I never heard back from @msulyok, so I don't know what caused the other bug. I just fixed your problem in 85563a4 though.

This chat completely broke my assumption about how people use WhatsApp! I had a rule in the code to re-evaluate the timestamps if more than 50% of the lines do not contain a valid time. However, more than half the messages in this chat span multiple lines, and those continuation lines have no timestamp! I increased the threshold to 90%. Hope this doesn't cause trouble elsewhere.
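The threshold rule described above can be sketched roughly as follows. This is a hypothetical illustration, not rwhatsapp's actual code: the function names `has_timestamp()` and `needs_time_reparse()` and the timestamp regex are assumptions (real exports differ by phone, OS and locale). It shows why a chat full of multi-line messages trips a 50% rule: every continuation line counts as a line without a valid time.

```r
# Hypothetical sketch of a "share of lines without a valid time" rule.
# Not rwhatsapp's code: the function names and the regex are assumptions.
# The regex only covers two common export styles, e.g.
#   "09/09/2023 13:08 - Name: text"  and  "[09.09.23, 13:08:12] Name: text"
has_timestamp <- function(lines) {
  grepl("^\\[?\\d{1,4}[./-]\\d{1,2}[./-]\\d{1,4},? \\d{1,2}:\\d{2}",
        lines, perl = TRUE)
}

# TRUE when the share of lines lacking a timestamp exceeds `threshold`,
# i.e. when the time format should be re-evaluated.
needs_time_reparse <- function(lines, threshold = 0.9) {
  mean(!has_timestamp(lines)) > threshold
}

# Toy chat: two messages, spanning three and two lines respectively.
chat <- c(
  "09/09/2023 13:08 - Ana: Job posting:",
  "Data analyst, remote",
  "Apply by Friday",
  "09/09/2023 13:10 - Bruno: Thanks!",
  "Will share"
)

mean(!has_timestamp(chat))                 # 0.6 (3 of 5 lines)
needs_time_reparse(chat, threshold = 0.5)  # TRUE: a 50% rule fires
needs_time_reparse(chat, threshold = 0.9)  # FALSE: a 90% rule does not
```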

You can install the development version and read in that chat without issues.

armelsoubeiga commented 2 years ago

Hi @JBGruber

I tested with the development version and it works fine now. I'm not having any problems elsewhere either. Thanks

GiuPolli commented 10 months ago

The same error is showing up now:

```
Error in split.default(lookup$emoji, lookup$id) : first argument must be a vector
In addition: Warning messages:
1: Unknown or uninitialised column: emoji.
2: Unknown or uninitialised column: emoji.
```

It happens with these two chats: Arena Divulgacao.txt and Mentoria Imperium.txt

JBGruber commented 10 months ago

Hmm. I can't reproduce your issue.

``` r
curl::curl_download("https://github.com/JBGruber/rwhatsapp/files/13863974/Arena.Divulgacao.txt", "hist.txt")
df <- rwhatsapp::rwa_read(x = "hist.txt")
df
#> # A tibble: 1,235 × 6
#>    time                author                     text  source emoji  emoji_name
#>    <dttm>              <fct>                      <chr> <chr>  <list> <list>    
#>  1 2023-09-09 13:08:12 📣 Arena Divulgação - Imp… "‎As … hist.… <NULL> <NULL>    
#>  2 2023-09-09 13:08:12 ~ Matheus Mendonça         "‎~ M… hist.… <NULL> <NULL>    
#>  3 2023-10-24 20:08:07 📣 Arena Divulgação - Imp… "‎Voc… hist.… <NULL> <NULL>    
#>  4 2023-10-25 10:27:19 ~ Aguinaldo Santos         "‎~ E… hist.… <NULL> <NULL>    
#>  5 2023-10-25 10:32:52 ~ Priscilla Oliveira       "‎~ E… hist.… <NULL> <NULL>    
#>  6 2023-10-25 12:17:52 ~ Morena                   "Cur… hist.… <chr>  <chr [1]> 
#>  7 2023-10-25 12:18:19 ~ Morena                   "Que… hist.… <chr>  <chr [1]> 
#>  8 2023-10-25 12:18:37 ~ Morena                   "‎Men… hist.… <NULL> <NULL>    
#>  9 2023-10-25 12:28:54 ~ Caio Teixeira            "‎<an… hist.… <NULL> <NULL>    
#> 10 2023-10-25 12:29:27 ~ Caio Teixeira            "Fei… hist.… <chr>  <chr [4]> 
#> # ℹ 1,225 more rows
```

``` r
curl::curl_download("https://github.com/JBGruber/rwhatsapp/files/13863975/Mentoria.Imperium.txt", "hist2.txt")
df2 <- rwhatsapp::rwa_read(x = "hist2.txt")
df2
#> # A tibble: 5,805 × 6
#>    time                author                  text     source emoji  emoji_name
#>    <dttm>              <fct>                   <chr>    <chr>  <list> <list>    
#>  1 2022-08-24 11:35:46 Mentoria Imperium       "‎As men… hist2… <NULL> <NULL>    
#>  2 2022-08-24 11:35:46 ~ Murilo V Marques      "‎~ Muri… hist2… <NULL> <NULL>    
#>  3 2023-09-11 19:26:33 ~ Matheus Mendonça      "‎~ Math… hist2… <NULL> <NULL>    
#>  4 2023-09-11 19:36:00 ~ Dra. Luciane Sippert  "‎figuri… hist2… <NULL> <NULL>    
#>  5 2023-09-11 19:36:01 ~ Dra. Luciane Sippert  "Olá,  … hist2… <NULL> <NULL>    
#>  6 2023-09-11 19:38:01 ~ Renata Calil          "Bemmmm… hist2… <NULL> <NULL>    
#>  7 2023-09-11 20:21:18 ~ Eliana Coco Psicóloga "‎imagem… hist2… <NULL> <NULL>    
#>  8 2023-09-11 20:41:06 ~ Joyce Scoto  Advogada "Hj não… hist2… <chr>  <chr [2]> 
#>  9 2023-09-11 21:39:42 ~ Clarice               "Obriga… hist2… <chr>  <chr [2]> 
#> 10 2023-09-11 21:50:09 ~ Carmen Lydia de Marco "Parabé… hist2… <NULL> <NULL>    
#> # ℹ 5,795 more rows
```

Created on 2024-01-08 with reprex v2.0.2

Could you use reprex to show the error, and also post your session_info() (see mine below)?

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.2 (2023-10-31)
#>  os       EndeavourOS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language
#>  collate  en_GB.UTF-8
#>  ctype    en_GB.UTF-8
#>  tz       Europe/Berlin
#>  date     2024-01-08
#>  pandoc   3.1.11 @ /usr/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  cli           3.6.2      2023-12-11 [1] CRAN (R 4.3.2)
#>  curl          5.2.0      2023-12-08 [1] CRAN (R 4.3.2)
#>  digest        0.6.33     2023-07-07 [1] CRAN (R 4.3.2)
#>  evaluate      0.23       2023-11-01 [1] CRAN (R 4.3.2)
#>  fansi         1.0.6      2023-12-08 [1] CRAN (R 4.3.2)
#>  fastmap       1.1.1      2023-02-24 [1] CRAN (R 4.3.2)
#>  fs            1.6.3      2023-07-20 [1] CRAN (R 4.3.2)
#>  glue          1.6.2      2022-02-24 [1] CRAN (R 4.3.2)
#>  htmltools     0.5.7      2023-11-03 [1] CRAN (R 4.3.2)
#>  knitr         1.45       2023-10-30 [1] CRAN (R 4.3.2)
#>  lifecycle     1.0.4      2023-11-07 [1] CRAN (R 4.3.2)
#>  magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.3.2)
#>  pillar        1.9.0      2023-03-22 [1] CRAN (R 4.3.2)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.3.2)
#>  purrr         1.0.2      2023-08-10 [1] CRAN (R 4.3.2)
#>  R.cache       0.16.0     2022-07-21 [1] CRAN (R 4.3.2)
#>  R.methodsS3   1.8.2      2022-06-13 [1] CRAN (R 4.3.2)
#>  R.oo          1.25.0     2022-06-12 [1] CRAN (R 4.3.2)
#>  R.utils       2.12.3     2023-11-18 [1] CRAN (R 4.3.2)
#>  reprex        2.0.2      2022-08-17 [1] CRAN (R 4.3.2)
#>  rlang         1.1.2      2023-11-04 [1] CRAN (R 4.3.2)
#>  rmarkdown     2.25       2023-09-18 [1] CRAN (R 4.3.2)
#>  rstudioapi    0.15.0     2023-07-07 [1] CRAN (R 4.3.2)
#>  rwhatsapp     0.2.4.9000 2024-01-08 [1] local
#>  sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.3.2)
#>  stringi       1.8.3      2023-12-11 [2] CRAN (R 4.3.2)
#>  styler        1.10.2     2023-08-29 [1] CRAN (R 4.3.2)
#>  tibble        3.2.1      2023-03-20 [1] CRAN (R 4.3.2)
#>  utf8          1.2.4      2023-10-22 [1] CRAN (R 4.3.2)
#>  vctrs         0.6.5      2023-12-01 [1] CRAN (R 4.3.2)
#>  withr         2.5.2      2023-10-30 [1] CRAN (R 4.3.2)
#>  xfun          0.41       2023-11-01 [1] CRAN (R 4.3.2)
#>  yaml          2.3.7      2023-01-23 [1] CRAN (R 4.3.2)
#>
#>  [1] /home/johannes/R/x86_64-pc-linux-gnu-library/4.3
#>  [2] /usr/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

GiuPolli commented 10 months ago

Sorry, but I don't know how to format my comment with code chunks like you do.

I managed to use reprex and get the preview onto my clipboard, but I couldn't paste it in that format, and I couldn't find out how to do it online either.

So I just pasted it here:

``` r
curl::curl_download("https://github.com/JBGruber/rwhatsapp/files/13863974/Arena.Divulgacao.txt", "hist.txt")
df <- rwhatsapp::rwa_read(x = "hist.txt")
#> Warning: Unknown or uninitialised column: emoji.
#> Unknown or uninitialised column: emoji.
#> Error in split.default(lookup$emoji, lookup$id): first argument must be a vector
```

Created on 2024-01-09 with reprex v2.0.2

Session info

``` r
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       Windows 10 x64 (build 22621)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  Portuguese_Brazil.1252
#>  ctype    Portuguese_Brazil.1252
#>  tz       America/Sao_Paulo
#>  date     2024-01-09
#>  pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> - Packages -------------------------------------------------------------------
#>  package     * version date (UTC) lib source
#>  cli           3.6.1   2023-03-23 [1] CRAN (R 4.1.3)
#>  curl          5.0.0   2023-01-12 [1] CRAN (R 4.1.3)
#>  digest        0.6.31  2022-12-11 [1] CRAN (R 4.1.3)
#>  evaluate      0.23    2023-11-01 [1] CRAN (R 4.1.2)
#>  fansi         1.0.6   2023-12-08 [1] CRAN (R 4.1.2)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.1.3)
#>  fs            1.5.2   2021-12-08 [1] CRAN (R 4.1.3)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.1.3)
#>  htmltools     0.5.5   2023-03-23 [1] CRAN (R 4.1.3)
#>  knitr         1.45    2023-10-30 [1] CRAN (R 4.1.2)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.1.2)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.1.3)
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.1.3)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.1.2)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.1.3)
#>  rlang         1.1.0   2023-03-14 [1] CRAN (R 4.1.3)
#>  rmarkdown     2.25    2023-09-18 [1] CRAN (R 4.1.2)
#>  rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.1.2)
#>  rwhatsapp     0.2.4   2022-01-05 [1] CRAN (R 4.1.3)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.1.2)
#>  stringi       1.8.3   2023-12-11 [1] CRAN (R 4.1.2)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.1.3)
#>  utf8          1.2.4   2023-10-22 [1] CRAN (R 4.1.2)
#>  vctrs         0.6.1   2023-03-22 [1] CRAN (R 4.1.3)
#>  withr         2.5.2   2023-10-30 [1] CRAN (R 4.1.2)
#>  xfun          0.39    2023-04-20 [1] CRAN (R 4.1.3)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.1.3)
#>
#>  [1] C:/Users/ACE.ACESP-103/Documents/R/win-library/4.1
#>  [2] C:/Program Files/R/R-4.1.2/library
#>
#> ------------------------------------------------------------------------------
```