Ads97 / WhatsApp-Llama

Fine-tune an LLM to speak like you based on your WhatsApp conversations

better regex to cater to all texts #3

Open ParmuSingh opened 5 months ago

ParmuSingh commented 5 months ago

What does this PR do?

This PR changes the regex so that timestamps are removed more reliably while preprocessing data. The current file cannot handle the latest export format for WhatsApp chats.

Fixes # (issue)

This regex was written to match the way WhatsApp exports chats for Indian users. The format may be the same for everyone.
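For reviewers who want a feel for the kind of pattern involved, here is a rough sketch of timestamp stripping. It is not the regex from this PR. The two line layouts it targets (a bracketed form and a dash-separated form), the name `TIMESTAMP_RE`, and the helper `strip_timestamp` are all assumptions made for illustration, and real exports may differ by locale and app version.

```python
import re

# Hypothetical pattern, NOT the one changed in this PR. Assumed layouts:
#   [12/05/2023, 10:15:32 PM] Name: message
#   12/05/2023, 22:15 - Name: message
TIMESTAMP_RE = re.compile(
    r"^\[?"                        # optional opening bracket
    r"\d{1,2}/\d{1,2}/\d{2,4},\s"  # date followed by a comma, e.g. 12/05/2023,
    r"\d{1,2}:\d{2}(?::\d{2})?"    # time, seconds optional
    r"(?:\s?[AaPp][Mm])?"          # optional am/pm marker
    r"\]?\s(?:-\s)?"               # closing bracket and/or " - " separator
)


def strip_timestamp(line: str) -> str:
    """Remove a leading timestamp if present; otherwise return the line unchanged."""
    return TIMESTAMP_RE.sub("", line, count=1)


if __name__ == "__main__":
    samples = [
        "[12/05/2023, 10:15:32 PM] Alice: hi",
        "12/05/2023, 22:15 - Bob: hello",
        "a continuation line with no timestamp",
    ]
    for sample in samples:
        print(strip_timestamp(sample))
```

Lines without a leading timestamp, such as the continuation of a multi-line message, pass through untouched, so the helper can be applied to every line of an export.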