JBGruber / rwhatsapp

An R package for working with WhatsApp data 💬

Importing Multiple Files #24

Closed dkxmen closed 3 years ago

dkxmen commented 3 years ago

I am not sure whether this is an issue, but I am trying to import multiple files with this package and can't seem to get it right. Is there a way to do that, even if it means using a for loop or lapply?

JBGruber commented 3 years ago

~Indeed, I did not think about that. In my mind, people want to keep analysis of chats separated so there is no need to read multiple files. Could you tell me a little about your use case to help me understand if this would be a valuable addition? In the meantime, this would work:~

Sorry, it seems I've finally forgotten how my own software works. This already works out of the box:

library(rwhatsapp)
# get some files as a simple example
history <- system.file("extdata", "sample.txt", package = "rwhatsapp")
files <- rep(history, 5)
files
#> [1] "/home/johannes/R/x86_64-pc-linux-gnu-library/4.1/rwhatsapp/extdata/sample.txt"
#> [2] "/home/johannes/R/x86_64-pc-linux-gnu-library/4.1/rwhatsapp/extdata/sample.txt"
#> [3] "/home/johannes/R/x86_64-pc-linux-gnu-library/4.1/rwhatsapp/extdata/sample.txt"
#> [4] "/home/johannes/R/x86_64-pc-linux-gnu-library/4.1/rwhatsapp/extdata/sample.txt"
#> [5] "/home/johannes/R/x86_64-pc-linux-gnu-library/4.1/rwhatsapp/extdata/sample.txt"

# just read in all the parts
df <- rwa_read(files)
head(df)
#> # A tibble: 6 × 6
#>   time                author          text         source       emoji emoji_name
#>   <dttm>              <fct>           <chr>        <chr>        <lis> <list>    
#> 1 2017-07-12 22:35:38 <NA>            "Messages t… /home/johan… <NUL… <NULL>    
#> 2 2017-07-12 22:35:38 <NA>            "You create… /home/johan… <NUL… <NULL>    
#> 3 2017-07-12 22:35:38 Johannes Gruber "<Media omi… /home/johan… <NUL… <NULL>    
#> 4 2017-07-12 22:35:38 Johannes Gruber "Fruit brea… /home/johan… <chr… <chr [2]> 
#> 5 2017-07-13 09:12:38 Test            "It's fun d… /home/johan… <NUL… <NULL>    
#> 6 2017-07-13 09:16:38 Johannes Gruber "Haha it su… /home/johan… <chr… <chr [1]>

Created on 2021-08-05 by the reprex package (v2.0.0)

If you experience problems, maybe you can share a bit more about what you are doing?
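
For real exports, the vector of paths would usually come from a folder rather than from `rep()`. The following is only a sketch under assumptions: the folder `chats` and the two export names are made up, and the bundled sample chat is copied under those names so the example runs anywhere.

```r
library(rwhatsapp)

# Assumption: all chat exports sit in one folder. To keep the sketch
# runnable, the bundled sample chat is copied into a temporary folder
# under two made-up export names.
sample_chat <- system.file("extdata", "sample.txt", package = "rwhatsapp")
chat_dir <- file.path(tempdir(), "chats")
dir.create(chat_dir, showWarnings = FALSE)
targets <- file.path(
  chat_dir,
  c("WhatsApp Chat with Cousins.txt", "WhatsApp Chat with Flatmates.txt")
)
file.copy(rep(sample_chat, length(targets)), targets, overwrite = TRUE)

# Collect every .txt file in the folder and read them in a single call
files <- list.files(chat_dir, pattern = "\\.txt$", full.names = TRUE)
df <- rwa_read(files)

# The source column records which file each message came from
table(basename(df$source))
```

No `for` loop or `lapply()` is needed here, because `rwa_read()` accepts the whole character vector of paths, as shown above.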

dkxmen commented 3 years ago

Thanks a lot. The above sample works very well.

My use case is this: say you have several different family WhatsApp groups, perhaps 10, but you need to analyse them all as one. I needed a way to upload the text files from multiple WhatsApp groups and read them all at once, with one data frame for the 10 groups as the result. The code above does that. The question is: does it work the same way in Shiny?

And now, really pushing it: since this is possible, can a message be identified with the group it came from? When you export data from WhatsApp, the text file name usually follows a very particular pattern, "WhatsApp Chat with Cousins", with "Cousins" being the group name. Would it be possible to add an option so that, when reading a WhatsApp text file, the group name is picked up and put in a column called, say, "source" or "group"?

JBGruber commented 3 years ago

Well, there already is a source column :smiley:.

So if you haven't renamed your files, this should work:

library(rwhatsapp)
df <- rwa_read("/home/johannes/WhatsApp Chat with flatmates.txt")

library(tidyverse)
df %>% 
  mutate(group = str_extract(source, "(?<=WhatsApp Chat with ).*?(?=\\.txt$)")) %>%
  select(group)
#> # A tibble: 16,817 × 1
#>    group    
#>    <chr>    
#>  1 flatmates
#>  2 flatmates
#>  3 flatmates
#>  4 flatmates
#>  5 flatmates
#>  6 flatmates
#>  7 flatmates
#>  8 flatmates
#>  9 flatmates
#> 10 flatmates
#> # … with 16,807 more rows

Created on 2021-08-07 by the reprex package (v2.0.0)

The regular expression extracts whatever sits between "WhatsApp Chat with " and ".txt" in the file path.
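
The same extraction can be sketched in base R without the tidyverse. `extract_group()` is a hypothetical helper, not part of rwhatsapp. It assumes the export kept its default "WhatsApp Chat with <group>.txt" name and returns `NA` for anything else.

```r
# Hypothetical helper: pull the group name out of an export file path.
# Returns NA when the file name does not follow the default export pattern.
extract_group <- function(path) {
  file <- basename(path)  # drop the directory part
  pattern <- "^WhatsApp Chat with (.*)\\.txt$"
  ifelse(grepl(pattern, file), sub(pattern, "\\1", file), NA_character_)
}

groups <- extract_group(c(
  "data/WhatsApp Chat with Cousins.txt",
  "/home/johannes/WhatsApp Chat with flatmates.txt",
  "0.txt"  # a renamed file, so no group name can be recovered
))
groups
```

Anchoring the pattern with `^` and `$` and escaping the dot keeps a group name that itself contains ".txt" or similar text from being cut short.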

dkxmen commented 3 years ago

Yes, that solves my query. My issue was with Shiny, since it renames files once they are uploaded. For example, in Shiny the source column names the file "C:\Users\deniw\AppData\Local\Temp\RtmpSOqUzv/4bd869ed648b265df4ca4b5b/0.txt", but in a normal R script the source column names the file "data/WhatsApp Chat with Cousins.txt". I will have to work around that in Shiny, but it works perfectly in a normal R script. Thanks for your great assistance. I think I can close the issue now. Thanks again.
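
One possible workaround, offered only as a sketch: Shiny's `fileInput()` returns a data frame that keeps the original file name in `name` next to the temporary path in `datapath`, so the two can be matched back up after reading. `restore_upload_names()` is a hypothetical helper, and the data below is mock data standing in for a real upload. It assumes the `source` column holds each path exactly as it was passed to `rwa_read()`.

```r
# Hypothetical helper: add the original upload names to a chat data frame.
# `chat`   : data frame with a `source` column (as returned by rwa_read())
# `upload` : data frame with `name` and `datapath` columns (as from fileInput())
restore_upload_names <- function(chat, upload) {
  idx <- match(chat$source, upload$datapath)  # row of `upload` for each message
  chat$original_name <- upload$name[idx]
  chat
}

# Mock data standing in for a real upload and a real rwa_read() result
upload <- data.frame(
  name = c("WhatsApp Chat with Cousins.txt", "WhatsApp Chat with Flatmates.txt"),
  datapath = c("/tmp/Rtmp/0.txt", "/tmp/Rtmp/1.txt"),
  stringsAsFactors = FALSE
)
chat <- data.frame(
  text = c("hi", "hello", "hey"),
  source = c("/tmp/Rtmp/0.txt", "/tmp/Rtmp/1.txt", "/tmp/Rtmp/0.txt"),
  stringsAsFactors = FALSE
)

chat <- restore_upload_names(chat, upload)
chat$original_name
```

Inside an app this might be called as `restore_upload_names(rwa_read(input$chats$datapath), input$chats)`, where `chats` is a made-up input ID. The group name could then be extracted from `original_name` instead of `source`.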