JBGruber / rwhatsapp

An R package for working with WhatsApp data 💬

Cuts off chat from 2020 for a specific group #19

Closed AmitLevinson closed 4 years ago

AmitLevinson commented 4 years ago

Hey, first of all: what an amazing package! One simple function that does a fantastic job of reading a chat file.

I have a text file with chat messages from 2016 to July 2020. The function reads and parses all the messages up to the last one from 2019 and then stops (it doesn't throw an error, it just stops at the end of 2019).

When I tried splitting the file up, moving all the unread 2020 messages into a separate file and reading that on its own, it still didn't work, and I got the following error:

> chat_2 <- rwa_read("chat2.txt")
Error: Input must be a vector, not NULL.
Run rlang::last_error() to see where the error occurred.
In addition: Warning message:
Unknown or uninitialised column: `emoji`. 
> rlang::last_error()
<error/vctrs_error_scalar_type>
Input must be a vector, not NULL.
Backtrace:
 1. rwhatsapp::rwa_read("chat2.txt")
 8. vctrs:::stop_scalar_type(.Primitive("quote")(NULL), "")
 9. vctrs:::stop_vctrs(msg, "vctrs_error_scalar_type", actual = x)

I also tried deleting the first few rows of 2020, thinking the problem was something there, but it's plain chat. I will say that it works perfectly for a different group chat I have that spans both before and after 2020. The problematic group also takes longer to parse than that second, working group, even though the two contain almost the same number of messages.

Any suggestions?

Thanks!

JBGruber commented 4 years ago

Thanks, I'm happy you find the package useful!

I can't say I've seen anything like this before. So the first step would be to get a reproducible example to me.

Since you probably don't want to send me the whole file, I would recommend you copy the last few lines of 2019 and the first few of 2020 and paste them here as code. You can probably just replace the text with gibberish, as long as you leave the timestamps etc. as they appear in the file.

rwa_read accepts text vectors so you could test if the problem still occurs with your example before posting it.
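A minimal sketch of that kind of test, assuming an Android-style export layout ("date, time - name: text"). The names, message text, and exact line layout below are invented for illustration; only the "2.1.2020, 9:51" timestamp style comes from this thread. The call is guarded so the snippet still runs if rwhatsapp is not installed.

```r
# Fabricated chat lines: the names, text, and line layout are assumptions,
# not taken from the real file. Keep the timestamps exactly as in your export.
sample_chat <- c(
  "31.12.2019, 23:58 - Person A: xxxx xxxx",
  "",
  "2.1.2020, 9:51 - Person B: yyyy yyyy",
  "2.1.2020, 9:53 - Person A: zzzz"
)

# Pass the character vector straight to rwa_read() instead of a file path.
if (requireNamespace("rwhatsapp", quietly = TRUE)) {
  parsed <- rwhatsapp::rwa_read(sample_chat)
  print(parsed)
}
```

If the problem still shows up, for example the 2020 lines being appended to the last 2019 message instead of getting rows of their own, the snippet is small enough to post as is.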

AmitLevinson commented 4 years ago

Yes, sorry about that, I should have posted one earlier.

Here's a reprex attached. After creating it, I noticed that parsing stops at 2020 and the remaining text is appended to the last row instead. This also occurred in the original file.

Here's a link to a reprex (fabricated information): chat2.txt

Edit: I'll also work on trying to get the same chat from a user with Whatsapp menus in English and see if that works.

AmitLevinson commented 4 years ago

Quick update: it worked for the same group when the chat was exported by someone who has his WhatsApp set to English. What changes is not necessarily the messages themselves but the system messages within the group, such as "XX added Y" or "XX joined via the invite group". In the above reprex these system messages are in Hebrew. That said, I wonder why it threw an error, since these system messages are common throughout the chat (before 2020), not only in the reprex I added.

So for now it works, I'm just wondering if there's a way to bypass it for future parsing.

Thanks again for an amazing and elegant package!

JBGruber commented 4 years ago

The problem is, in fact, the date format. Datetimes like "2.1.2020, 9:51" were not recognised properly (because I made the regex too narrow). So multiple empty lines followed by dates that were not recognised as dates caused this hiccup.
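To illustrate how a timestamp pattern can end up too narrow, here is a small base R sketch. Both patterns are hypothetical and written only for this example; they are not the regex the package uses. The narrow one insists on two digits for day, month, and hour, while the broader one accepts one or two.

```r
# Hypothetical patterns for illustration only; not the package's actual regex.
stamps <- c("31.12.2019, 23:58", "2.1.2020, 9:51", "12.11.2019, 8:05")

# Narrow: requires two-digit day, month, and hour.
narrow <- "^\\d{2}\\.\\d{2}\\.\\d{4}, \\d{2}:\\d{2}"
# Broader: allows one or two digits for day, month, and hour.
broad <- "^\\d{1,2}\\.\\d{1,2}\\.\\d{4}, \\d{1,2}:\\d{2}"

narrow_hits <- grepl(narrow, stamps, perl = TRUE)
broad_hits <- grepl(broad, stamps, perl = TRUE)

# Side-by-side view of which timestamps each pattern recognises.
print(data.frame(stamp = stamps, narrow = narrow_hits, broad = broad_hits))
```

Under these assumed patterns, only the first timestamp matches the narrow version, while all three match the broader one. A line whose timestamp is not matched looks like a continuation of the previous message to a parser of this kind.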

Should work now (with the newest GitHub version).

By the way, I'm glad this all works in Hebrew as I never had the chance to test it! Thanks for reporting the issue :)

Please close after you confirm this works, thanks.

AmitLevinson commented 4 years ago

Yes! It works perfectly and catches the system messages too. Thanks for the quick fix.