TelegramTools / TLImporter

📲 Telegram Chat Importer: Import chats from WhatsApp or other services into Telegram
GNU Affero General Public License v3.0

Broken parsing of messages from WhatsApp #14

Closed: cascading-jox closed this issue 4 years ago

cascading-jox commented 4 years ago

I have some problems using the Linux application. I tried it in 1:1 mode (not secret) and everything seemed fine, but apparently something went wrong with the parsing. At first the messages were structured correctly, but without a timestamp at the end: the [] brackets were empty, even though I had the option turned on. Then, after a couple of messages, I got the raw output of the txt file in a single message. I don't have an example file at the moment, but I could produce a test file if necessary. I don't see anything obviously wrong with the txt file: it is the standard WhatsApp format, and the text contains some emojis. I would love to help, but I am not sure what I could do in Python, since all my text-parsing skills are in awk.

ferferga commented 4 years ago

@cascading-jox Hey there! Sorry for the delay. I just finished my exams, and only now could I take a look at TLImporter's issues.

I have already had some reports of this. I pushed TLImporter 3.0.7, which has some improvements in this area. If you're using common formats for the text messages, it should work right away; otherwise, you must adapt your file or send me a sample so I can take a look.

Would you mind testing that version? If everything goes right, please close the issue.

cascading-jox commented 4 years ago

@ferferga No worries!

I tried it again, and the messages look correct now with 3.0.7. I noticed that some other things are not working as expected. The timestamp is in MM/DD/YY format (same as the original txt file); could that be made configurable to DD/MM/YY?

I used the options:

1. Timestamps settings: Add Timestamps
    - Position: End of the message
2. Add hashtags to each message: Yes
4. Backup of the database in 'Saved Messages': Yes

but looking at the imported messages, the timestamp, the hashtags and the name of the person are missing. One person in the conversation has an emoji 🍇 in their name; I don't know if that makes any difference. I tried to create an example file to test with. Let me know if you need anything else to test with.

ferferga commented 4 years ago

@cascading-jox As specified in the Importing chats section of the instructions, you must type the name of your partner as it appears in the text file.

I'm not going to add support for different formats: what you have in your file is what you get. Supporting every date format that exists is really difficult, and to support it properly I would probably need to include an additional library that handles dates. I'm not willing to add more dependencies for such a trivial thing that probably nobody else needs. What TLImporter does is split the message on every ": " and match the numbers before the name of the person (so if you didn't include the emoji in the name, this will fail). Then it removes all spaces and stores the date, name and message separately, in different columns of the database. So I don't do any proper date recognition; I simply match the numbers and non-alphabetic characters before the names.
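The matching described above can be sketched roughly as follows. This is a hypothetical reconstruction, not TLImporter's actual code: the function name `parse_line`, the sample line and the choice to split only at the first ": " (so colons inside the message text survive) are all assumptions. It shows why the partner's name must be typed exactly as it appears in the export, emoji included.

```python
import re


def parse_line(line, names):
    """Hypothetical sketch of the name-based matching described above.

    Returns (stamp, name, message), or None when the line cannot be matched
    (for example, a continuation line or a name typed without its emoji).
    """
    # Assumption: split only at the first ": " so the message body may contain colons.
    header, sep, message = line.partition(": ")
    if not sep:
        return None  # no separator: probably the continuation of a previous message
    for name in names:
        if header.endswith(name):
            stamp = header[: -len(name)]
            # The part before the name must contain only digits and non-letter
            # characters; no real date parsing is attempted.
            if stamp and not re.search(r"[^\W\d_]", stamp):
                # Remove all spaces, as described in the comment above.
                return stamp.replace(" ", ""), name, message
    return None


print(parse_line("12/31/20, 10:15 - Ana 🍇: Hello: world", ["Ana 🍇", "Bob"]))
print(parse_line("12/31/20, 10:15 - Ana 🍇: Hi", ["Ana", "Bob"]))
```

The first call returns `('12/31/20,10:15-', 'Ana 🍇', 'Hello: world')`; the second returns `None` because "Ana" alone does not match the end of the header. Since the date is kept as an opaque string, it stays in whatever order (MM/DD/YY or DD/MM/YY) the export used.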

Pull requests for this are welcome, though. However, I won't add more features to any of the Telegram Tools. I don't really need them anymore, so from now on I will only do maintenance updates, nothing new.

Also, what you describe is a completely unrelated problem that should go into its own issue. If you want, please open a new one and attach screenshots of the resulting messages, a sample of the export and your settings. You provided the latter two, but not the resulting messages, so I don't understand what's wrong. First, try typing your partner's name together with its emoji.

I'm going to close this issue, as the original problem is solved.