TelegramTools / TLImporter

📲 Telegram Chat Importer: Import chats from WhatsApp or other services into Telegram
GNU Affero General Public License v3.0

FATAL ERROR WHILE SETTING UP #11

Closed Evilamblonyx closed 4 years ago

Evilamblonyx commented 4 years ago

Hi, I'm getting an error while importing the txt file. Telegram gives a success message after the process, but no messages are imported. I attached the command window output and the log file (both with some details, like phone numbers, partially replaced with ****). Am I doing it wrong, or is this a bug? commandwindow.txt TLImporter-log.log

ferferga commented 4 years ago

@Evilamblonyx Which format is used in the source .txt file? Check the samples here.

It should work with any format though, so this is likely a bug.

Evilamblonyx commented 4 years ago

You can find a short example from the input file in the attachment. It doesn't look exactly like the sample. It was exported from the German version of WhatsApp. example_input.txt

ferferga commented 4 years ago

@Evilamblonyx Could you try adapting the format to this? [04.03.20, 10:43:02] NAME:

What a mess the WhatsApp chat export is. I guess I'll need to make TLImporter even less strict with the data formats.

Sorry for the late reply, btw. I'm full of exams right now and I wanted to check all the possible ways to get around this, but it seems to me that you either adapt it to the sample format or to the one I proposed here.

Will try to get back to this and fix all the long-standing issues as soon as I can.

Evilamblonyx commented 4 years ago

@ferferga Thank you for your answer. Don't worry about your reply time. This is just a hobby, and we all have other jobs. So thank you for all the useful work you put into this tool. I think it's not possible to change the output format of WhatsApp, and correcting the exported file manually is far too much work. But I might be able to correct the file with a script. I'm not a very skilled programmer. I can do a bit of C# and C, but there is no quick and easy way to correct the text file that I know of. Is there any tool or scripting language you would suggest for this job? I'm willing to learn a new tool if I can do this job with it.

ferferga commented 4 years ago

No need to write your own, simply use Notepad++! Will save you a lot of time :)

Head over to Edit > Find and Replace and switch to regex mode. This is the only regex that someone posted. I got no feedback about how it worked and I'm honestly not sure it will work with yours, as the format I got from WhatsApp is different from yours.

As for changing the export format: yes, there is no way to do it in WhatsApp. The only way is to modify the file afterwards using regexes.

Additionally, if that doesn't work, I suggest you modify TLImporter's code where the parsing is done. I know the current implementation is very dirty (the whole code is really dirty, in fact), but I wrote it when I was a noob in many aspects :)

Split the line at the ":" (only once, we don't want to split messages that contain ":" inside them) and check whether the name of the person is in the first item of the array. That string will be the header, while the other one will be the text. This should get the job done real quick (although I don't know how it will behave with system messages).

If that works for you and you don't see any quirks, please make a PR for it, I would be very grateful. I wanted to have a good regex to sanitize things, but maybe I should have forgotten that and done it the simple way :)
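The split described above could be sketched roughly as follows in Python. This is only an illustration, not TLImporter's actual code: `split_header` and `sender_names` are hypothetical names, and the sketch splits at the first ": " (colon plus space) rather than a bare ":", on the assumption that the colons inside the timestamp are never followed by a space.

```python
def split_header(line, sender_names):
    """Split a chat line once into (header, text).

    Returns None when no known sender name appears in the header,
    which is what happens for system messages and continuation lines.
    """
    # partition() splits only at the first occurrence, so any ": "
    # inside the message text itself is left untouched.
    header, sep, text = line.partition(": ")
    if sep and any(name in header for name in sender_names):
        return header, text
    return None
```

For a line such as `[04.03.20, 10:43:02] Alice: see you at 10:30` and the name set `{"Alice"}`, the header would be `[04.03.20, 10:43:02] Alice` and the text `see you at 10:30`.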

Evilamblonyx commented 4 years ago

Oh, I've been using Notepad++ for years now, but I never used this feature. I'll give it a try and give you feedback afterwards. I don't feel skilled enough to change stuff in the TLImporter code, so I'll try the easy way first. Thank you for your support.

Evilamblonyx commented 4 years ago

Ok, I managed to change the date and time format with Notepad++ and the following regex: ^([0-3][0-9].[0-1][0-9].[0-2][0-9], [0-2][0-9]:[0-5][0-9])( - )(.*:) -> [\1:01] \3 It might not be the most elegant regex for this job, but it worked.

But there is another problem: the messages are not imported properly. It seems to me that TLImporter expects messages to have no line breaks. Is this correct? If I remove all line breaks from my example, then it works properly. Is there a way to import line breaks? My example with and without line breaks is attached, and also the database file that is generated when I try to import the file with line breaks.
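For anyone who prefers a script over Notepad++, the same substitution could be sketched in Python like this. The function name `convert_headers` is hypothetical. The pattern is the one quoted above, except that the dots in the date are escaped so they match literal dots only (a small tightening, not part of the original regex). As in the original, the seconds are filled in with a fixed `:01`.

```python
import re

# Date/time header as in the quoted regex: "dd.mm.yy, hh:mm - Name:"
PATTERN = re.compile(
    r"^([0-3][0-9]\.[0-1][0-9]\.[0-2][0-9], [0-2][0-9]:[0-5][0-9])( - )(.*:)",
    re.MULTILINE,  # let ^ match at the start of every line
)

def convert_headers(text):
    """Rewrite 'dd.mm.yy, hh:mm - Name:' headers as '[dd.mm.yy, hh:mm:01] Name:'."""
    return PATTERN.sub(r"[\1:01] \3", text)
```

Lines that do not start with a date, such as the continuation lines of a multiline message, are left unchanged.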

linebreaks.txt nolinebreaks.txt
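One way a parser can keep line breaks is to treat every line that does not start with a timestamp header as a continuation of the previous message. The sketch below illustrates that idea only; it is not how TLImporter is implemented. `HEADER` and `group_messages` are hypothetical names, and the header pattern assumes the `[dd.mm.yy, hh:mm:ss] ` format proposed earlier in this thread.

```python
import re

# Assumed header format: "[04.03.20, 10:43:02] " at the start of a line.
HEADER = re.compile(r"^\[\d{2}\.\d{2}\.\d{2}, \d{2}:\d{2}:\d{2}\] ")

def group_messages(lines):
    """Merge continuation lines into the message that precedes them."""
    messages = []
    for line in lines:
        line = line.rstrip("\n")
        if HEADER.match(line) or not messages:
            # A header line (or the very first line) starts a new message.
            messages.append(line)
        else:
            # Anything else belongs to the previous message.
            messages[-1] += "\n" + line
    return messages
```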

ferferga commented 4 years ago

@Evilamblonyx Now that I'm updating the whole family of Telegram Tools (TLMerger, TLSecret and TLImporter are the ones I've just finished digging into), I think the problem with multiline messages was introduced in #8, because I recall importing multiple chats with multiline messages just fine (and, although the code is completely dirty everywhere, the current approach doesn't make much sense). I'm going to rewrite it entirely, so it should be fixed in the next version. Would you mind testing it then? It should not take much time; I just want to update all the apps at once, because all of them rely on the new secret mode.

By the way, I'm also going to finally make TLRevert, so you will be able to delete with it all the messages that were imported by the previous TLImporter version and start from scratch with the new one.

By the way, I also removed your database from your comment because it contained your phone number.

Evilamblonyx commented 4 years ago

@ferferga Thank you for still working on this project. I think it's an important tool for getting people to migrate from WhatsApp to Telegram. I'm willing to do tests with multiline chats. Do you want me to try out an older version to make sure the problem was introduced in #8, or should I wait for you to send me the new version?

Thank you for removing the database, I totally forgot about the phone number.

ferferga commented 4 years ago

Just wait, as most of the stuff from before #8 has been rewritten as well, so it doesn't matter. Maybe I can push it tonight, but I'm 100% sure it will be out tomorrow. The only thing left to finish is the documentation, and it shouldn't take as long as all the testing that the development of the tools involves.

Evilamblonyx commented 4 years ago

Don't rush, I'll just wait for your release and test it then.

ferferga commented 4 years ago

@Evilamblonyx All the problems with formatting (and with any kind of formatting, really) should finally be addressed in TLImporter 3.0.6.

Please, try it and give me your feedback. I also released TLRevert, so you can remove all the traces of the previous (incomplete) import using that tool.

Thank you very much for your patience! ☺

PS: The new file parsing logic should also be way faster at parsing really huge files; at least I was impressed by the speed compared to what I had in mind. However, since the last time I used TLImporter, I upgraded from a laptop to a much beefier PC, so maybe it's caused by that. Can you give me some feedback on this?

Evilamblonyx commented 4 years ago

@ferferga Thank you for all your work. I'll test the new version as soon as I can. I'm having some trouble exporting my chats from WhatsApp at the moment, because they deactivated the export function in the German version. But as soon as I've solved this problem, I'll start testing.

ferferga commented 4 years ago

@Evilamblonyx No issues here, I can export chats as usual by going to the chat > three dots > More > Export

Evilamblonyx commented 4 years ago

@ferferga That's not an issue with WhatsApp. They deactivated this function on purpose. Blackberry sued Facebook in a German court, because Blackberry says they have a patent on the function to export chats. It sounds completely stupid to me that you can get a patent for such a trivial function. But the court said Blackberry is right, and so Facebook deactivated this function for German users.
Most news on this topic is obviously in German, but there is some English coverage too: https://world-today-news.com/whatsapp-chatexport-deactivated-in-germany/

I'll need to figure out a solution for this.

ferferga commented 4 years ago

@Evilamblonyx lol :O

What if you use a VPN? Or temporarily change your number to another country and then change it back to your personal one?

Also, have you checked if you can export all the chats? Settings > Chats > History > Export all the chats.

Also, it might be worth going back a few versions, although stuff might break.

Evilamblonyx commented 4 years ago

@ferferga I'll try using a VPN, but I think they identify the country you are in by your phone number. I would like to use another phone number, but I have no idea how to get a SIM card from another country. Some people say going back a few versions worked for them. I tried it, but without success so far. Exporting all chats is deactivated as well.

Thank you for working on this problem with me.

Evilamblonyx commented 4 years ago

@ferferga Ok, I found a solution. There's a Chrome extension that lets you export WhatsApp chats from the WhatsApp web interface. I'll start testing now.

Evilamblonyx commented 4 years ago

@ferferga Hm, it doesn't work. Only one message is imported to Telegram, and it contains all the messages. Input: linebreaks.txt

Result: image

I blackened some personal information in the files.

If it's useful for you, we can have a personal chat on this topic, so I can test new releases immediately.

ferferga commented 4 years ago

@Evilamblonyx Ping me on Telegram

eutampieri commented 4 years ago

This is similar to the iOS exports. Also, please note that every attachment contains the 1-based number of the message in its filename (i.e. if I send a photo in the 6th message, the photo filename will be prefixed with 00000006-), and @mgiacopu found out that every message containing an attachment is prefixed by <U+200E>, like this message: <U+200E>[19/10/19, 14:34:11] Eugenio Tampieri: <U+200E><allegato: 00000006-AUDIO-2019-10-19-14-34-11.opus>.
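As a rough illustration of the attachment format described above, the following Python sketch strips the U+200E (left-to-right mark) characters and pulls the message number and filename out of the `<allegato: ...>` tag. `find_attachment` is a hypothetical helper, and the pattern assumes the tag always looks like `<label: NNNNNNNN-filename>`, where the label is localized ("allegato" in the Italian export shown here).

```python
import re

LRM = "\u200e"  # left-to-right mark, shown as <U+200E> in the sample
ATTACHMENT = re.compile(r"<\w+: (\d{8})-([^>]+)>")

def find_attachment(line):
    """Return (message_number, filename) for an attachment line, else None."""
    cleaned = line.replace(LRM, "")
    match = ATTACHMENT.search(cleaned)
    if match is None:
        return None
    number = int(match.group(1))  # 1-based message number from the prefix
    filename = match.group(1) + "-" + match.group(2)
    return number, filename
```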

eutampieri commented 4 years ago

This format divergence could be handled by asking the user which format the messages are in.

ferferga commented 4 years ago

Hello guys @eutampieri @Evilamblonyx!

Following the conversation and the samples provided by @Evilamblonyx, I added a few tweaks that should make the text parsing a little more solid for old formats. I think it should work with almost every file that has a standard format for text files.

Everything is in the 3.0.7 release. Would you guys mind trying that version? That way, if everything goes right, I can close the issue.

eutampieri commented 4 years ago

I can't test it right now. If you want, I can send you a demo export.

ferferga commented 4 years ago

@eutampieri Send it here as a quote.

Did you check if the format is similar to the one that @Evilamblonyx has or the one here?

eutampieri commented 4 years ago

It closely resembles @Evilamblonyx's. I attached it to issue #10.

ferferga commented 4 years ago

@eutampieri Yes, I can confirm yours is working fine as well now. Closing.