TelegramTools / TLImporter

📲 Telegram Chat Importer: Import chats from WhatsApp or other services into Telegram
GNU Affero General Public License v3.0

The app crashes as soon as I type the name of the other partner #2

Closed expl0r3rgu1 closed 4 years ago

expl0r3rgu1 commented 5 years ago

TLImporter-log.log

ferferga commented 5 years ago

@expl0r3rgu1 The log is useless. What error are you getting on the screen?

expl0r3rgu1 commented 5 years ago

I understood the problem: the txt file of the chat is too big. What's the maximum size the file can have?

expl0r3rgu1 commented 5 years ago

My chat file is 82 KB. If I try with just a few lines of the chat, it works. So I need to know the maximum file size in order to split the chat into multiple smaller files.

expl0r3rgu1 commented 5 years ago

I can't see the specific error because the app closes instantly.

ferferga commented 5 years ago

@expl0r3rgu1 The size is not the problem; it's probably a strange non-UTF-8 character somewhere. I tried with 20 MB files, as far as I can recall.

To see the error, open cmd.exe (Win+R and type cmd.exe) and drag and drop the TLImporter executable into it, or type its full path. That will leave the window open after the program closes. Please attach a screenshot.

expl0r3rgu1 commented 5 years ago

`This file is valid to be imported. It has 37998 lines in total.

Giulia has 19617 messages. Toto has 18381 messages.

Processing and saving messages in the database...
N/A% (0 of 37998) | | Elapsed Time: 0:00:00 ETA: --:--:--
Traceback (most recent call last):
  File "TLImporter.py", line 1077, in <module>
  File "TLImporter.py", line 644, in DumpDB
ValueError: not enough values to unpack (expected 2, got 1)
[11168] Failed to execute script TLImporter`

expl0r3rgu1 commented 5 years ago

This is what I got

expl0r3rgu1 commented 5 years ago

@ferferga If it can be useful, in this chat there are emoticons and Chinese characters

ferferga commented 5 years ago

@expl0r3rgu1 Emoji should not be a problem, as they are part of Unicode and I tested them. But I can't say the same for Chinese characters (I tried most strange characters except those).

I'm away from home right now, so I can't take a deeper look at it. It seems, though, that this might be an issue both on my end and in your file. You just reminded me that I did not add any encoding check (just a name check) where the app says "The file is valid". Thus, you might be importing an ASCII file instead of a UTF-8 one, which is the encoding used by Python and the one with the broadest character compatibility.

In the meantime, could you please open your file in Notepad, go to File > Save As, and save the contents into a new file, making sure you choose UTF-8 as the encoding?
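As a rough way to check the encoding without re-saving the file, the bytes can be decoded as UTF-8 directly. This is only a sketch: the function name `first_invalid_utf8_offset` is made up for illustration and is not part of TLImporter.

```python
def first_invalid_utf8_offset(data):
    """Return None if `data` (bytes) is valid UTF-8,
    otherwise the offset of the first byte that fails to decode."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first offending byte.
        return exc.start
    return None
```

It could be called on the raw contents of the exported chat, for example `first_invalid_utf8_offset(Path("chat.txt").read_bytes())` with `pathlib.Path`, where `chat.txt` is a placeholder file name. A result of `None` would suggest that the encoding is not the cause.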

expl0r3rgu1 commented 5 years ago

Yes, I'll save the file with UTF-8 encoding

expl0r3rgu1 commented 5 years ago

I did a test with the new UTF-8 encoded file and the result is exactly the same. I've also checked that the file of the previous test was UTF-8 encoded

expl0r3rgu1 commented 5 years ago

The real mystery is why with just 5 lines of the chat it works and with the whole chat it doesn't

ferferga commented 5 years ago

@expl0r3rgu1 Before I investigate this any further: do you know exactly which line it stops at? Can you check whether that line contains a strange character or is a WhatsApp system message (like "your partner's encryption keys changed" and so on)? Those WhatsApp system messages were only tested in Spanish and a little bit (not much, and before the rewrite to Python) in English. It's worth checking so we find out exactly what's causing the problem here.

If you already know which line is causing the issue, most of the work is already done.

expl0r3rgu1 commented 5 years ago

I noticed that there are some messages regarding the decryption keys. Do I have to delete these messages? How can I see where the app stops working?

expl0r3rgu1 commented 5 years ago

I've just tried removing the WhatsApp system messages, but the result is the same.

ferferga commented 5 years ago

@expl0r3rgu1 The app should handle those. I will add proper error logging to see where the problem is, take a deep look at how the text analysis is done, and send you a test version with the changes. Do you have Python installed on your system and know how to use virtualenvs and packages? Or do I need to build an executable for you?

expl0r3rgu1 commented 5 years ago

"N/A% (0 of 37998) | | Elapsed Time: 0:00:00 ETA: --:--:-- Traceback (most recent call last):"

From this line you can see that it stops at the first line of the chat, with the first message.

expl0r3rgu1 commented 5 years ago

I have Python installed. I'm a programmer in the mobile world, so I can probably manage this task. If I have problems, I'll write to you here.

ferferga commented 5 years ago

@expl0r3rgu1 Well, it's simpler and it doesn't take me that much time. I really don't mind. But you are using Windows, right?

expl0r3rgu1 commented 5 years ago

Yes, I am

ferferga commented 5 years ago

@expl0r3rgu1 It seems that the problem is in the header of each message. Can you, please, attach the header of some messages (don't attach the whole message, just the timestamp and name of the user who sent the message)?

If you see https://github.com/TelegramTools/TLImporter/blob/python/samples/WhatsApp_Chat_Diego_Vel%C3%A1zquez.txt that's the format that I developed the app for. I don't know if WhatsApp changed this (or different platforms output different files).
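To find the lines whose header does not follow that layout, a small check along these lines might help. It is only a sketch: it assumes the expected layout is the one quoted later in this thread (`17/01/2019 16:05:02: Giulia: Message`), and both `EXPECTED_HEADER` and `unrecognized_lines` are hypothetical names, not TLImporter code.

```python
import re

# Assumed header layout: "DD/MM/YYYY HH:MM:SS: Name: Message"
EXPECTED_HEADER = re.compile(r"^\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}: [^:]+: ")


def unrecognized_lines(lines, limit=5):
    """Return up to `limit` (line_number, text) pairs for lines
    that do not start with the assumed header."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if not EXPECTED_HEADER.match(line):
            hits.append((number, line.rstrip("\n")))
            if len(hits) >= limit:
                break
    return hits
```

Continuation lines of multi-line messages and system messages would be reported too, so the result is only a hint. If the very first line of the export is already flagged, the header format itself is the likely culprit.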

expl0r3rgu1 commented 5 years ago

This is the problem! 17/01/19, 16:05 - Giulia: Now this is the format

ferferga commented 5 years ago

In the meantime, could you please remove the comma and replace the hyphen (you can use a regex in Notepad++)? I will try to add some checks to solve this whenever I have some time, but I would like to confirm that this is the exact problem and that there are no more issues.

Thank you for the time you took investigating this :)

expl0r3rgu1 commented 5 years ago

I'll do it! I am the one who has to thank you, because I really need to import my WhatsApp chat into Telegram c:

expl0r3rgu1 commented 5 years ago

Do I have to type it in this way? "17/01/19 - 16:05 - Giulia"

expl0r3rgu1 commented 5 years ago

I don't know how you implemented the part that collects data from the chat, but I noticed that my chat differs from your example chat not only in the comma but also in the time and the year. Maybe these can be problematic too.

expl0r3rgu1 commented 5 years ago

I can change the year from 19 to 2019 and add seconds to the time (just an arbitrary value to satisfy your algorithm).

expl0r3rgu1 commented 5 years ago

I'm modifying the chat to match your example. This is what it will look like: 17/01/2019 16:05:02: Giulia: Message
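The rewrite described here can be sketched with a single regular expression. This is an illustration, not the regex actually used in this thread: `NEW_HEADER` and `convert_line` are hypothetical names, two-digit years are assumed to belong to the 2000s, and the missing seconds are filled with `00`.

```python
import re

# New-style header: "17/01/19, 16:05 - Giulia: "
NEW_HEADER = re.compile(
    r"^(\d{1,2})/(\d{1,2})/(\d{2}), (\d{1,2}):(\d{2}) - ([^:]+): "
)


def convert_line(line):
    """Rewrite a new-style header into the old layout;
    lines without such a header are returned unchanged."""

    def rewrite(match):
        day, month, year, hour, minute, sender = match.groups()
        # Assumption: "19" means 2019; seconds are not exported, so use 00.
        return (
            f"{int(day):02d}/{int(month):02d}/20{year} "
            f"{int(hour):02d}:{minute}:00: {sender}: "
        )

    return NEW_HEADER.sub(rewrite, line, count=1)
```

Applied to every line of the export, this leaves continuation lines and system messages without a sender untouched, so those may still need manual attention.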

expl0r3rgu1 commented 5 years ago

It's working! Thank you so much for your help!

expl0r3rgu1 commented 5 years ago

To close this issue, here is the solution: the format of the WhatsApp chat export has changed, so until the program is updated to handle the new format, you have to adapt the chat yourself so that it matches this example: https://github.com/TelegramTools/TLImporter/blob/python/samples/WhatsApp_Chat_Diego_Vel%C3%A1zquez.txt

ferferga commented 5 years ago

I will re-open it, if you don't mind, so that I remember to fix this and to give this problem more visibility :).

ferferga commented 5 years ago

@expl0r3rgu1 So it seems that the timestamp format varies between languages as well. Could you describe which regex you used to change the header, for future reference until this issue gets fixed? cc #4

Skalyx14 commented 5 years ago

Yes, it would be great to get something to convert unsupported WhatsApp TXT files into supported ones. :)

ferferga commented 4 years ago

All the problems with this have been fixed in 3.0.3 (I hope)

In the end, the problem the program was showing was not caused entirely by the format of the exported chat. It was a fault in the overall logic which, for reasons I don't know, was triggered for some users and not for others. However, although the format wasn't the root cause, TLImporter now recognises both the old and the new format.
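For illustration only, recognising both header layouts could look roughly like the sketch below. It is not the 3.0.3 implementation: `FORMATS` and `parse_header` are hypothetical names, and the two layouts are the ones quoted in this thread.

```python
import re
from datetime import datetime

# (header pattern, matching strptime format) for each known layout
FORMATS = [
    # Old: "17/01/2019 16:05:02: Giulia: Message"
    (re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}): ([^:]+): (.*)$"),
     "%d/%m/%Y %H:%M:%S"),
    # New: "17/01/19, 16:05 - Giulia: Message"
    (re.compile(r"^(\d{2}/\d{2}/\d{2}, \d{2}:\d{2}) - ([^:]+): (.*)$"),
     "%d/%m/%y, %H:%M"),
]


def parse_header(line):
    """Return (timestamp, sender, text) for a recognised header, else None."""
    for pattern, stamp_format in FORMATS:
        match = pattern.match(line)
        if match:
            stamp, sender, text = match.groups()
            return datetime.strptime(stamp, stamp_format), sender, text
    return None
```

Returning `None` for unrecognised lines, instead of unpacking a split result directly, avoids the kind of `ValueError: not enough values to unpack` shown earlier in this thread.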

Check out the releases tab for more.

Thank you very much for reporting this, and for all the patience you all had until I found the time and motivation to fix it :)