TelegramTools / TLImporter

📲 Telegram Chat Importer: Import chats from WhatsApp or other services into Telegram
GNU Affero General Public License v3.0

Allow importing media to chat #10

Open fabiosangregorio opened 4 years ago

fabiosangregorio commented 4 years ago

It would be awesome if I could send all the media from a chat too. I'd be happy to discuss the feasibility of the implementation with you and put together a PR for this after the refactoring issue (#5) is merged.

ferferga commented 4 years ago

Hey, good afternoon! First of all, thank you very much for your interest in improving TLImporter! 😁

Thanks for reminding me about that PR, which should in fact be closed. The original author didn't commit any further changes or show any sign of activity. In the meantime, I added some fixes and merged #8, so I would need to backport those changes to the decomposition of the program.

Aside from that, although I know that my codebase is pretty dirty and should have a better style, I prefer keeping it this way if the people who made those changes aren't going to follow up, as I already know where each line is located, what it does and why. Object-oriented programming with classes might be easier for some people to understand, but in small projects like this I am more accustomed to working without objects, and I write and debug faster that way. I also don't really see the need for classes here.

Before making any further changes, I would first like to update the Telethon version used by TLImporter. I'm currently using the old sync version because of some mandatory changes in the way Telegram's auth key must be handled with asynchronous processing in Telethon. The new method for reimporting the key makes it not really suitable for use in the Secret Mode. Back in the day, I talked with Lonami about ways to simplify this and bring back the old behaviour and, afaik, he made some changes, but I didn't want to spend the time retesting everything again, so I left it as it was. The old version currently has multiple issues when importing into 'Saved Messages', so it seems it's time to move to the newer version, even if I don't want to.

That's what I have in my roadmap, but I'm not sure when I will be able to do it, as I will be starting exams and practices really soon. But, after that, I think that we can discuss the implementation of media importing. If you do want to help me as well in migrating to the new Telethon version, your help would be greatly appreciated!

Btw, do you have any idea for a way to bring media importing to life that is convenient and easy for users? WhatsApp, for example, doesn't give any info about the media contained in an exported chat; you must decrypt and open the database first if you want to locate the path to each photo/video. That's why I'm interested in knowing what's on your mind!

Thank you very much again for your interest!

fabiosangregorio commented 4 years ago

Hi, sorry for the delay. I'm starting exams as well, so I don't have much time as of now. :)

I analyzed an export from WhatsApp containing some files. Here's the export:

04/09/18, 13:21 - Bea: text text text
04/09/18, 13:22 - Bea: ‎IMG-20180904-WA0002.jpg (file allegato)
text text text
04/09/18, 13:27 - Fabio: text text text
14/12/18, 10:54 - Fabio: ‎PTT-20181214-WA0003.opus (file allegato)

A file is marked by "(file allegato)", which in Italian means "attached file". The file name sits right next to it, just before the marker.

To import files into the Telegram chat, we could ask the user to save the WhatsApp folder to their computer and provide us with its path. We could then match the file name in the txt export against the files in that folder, using the extension in the txt to pick the right subfolder, and send the file to the Telegram chat. Could this work?
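The folder lookup proposed here could be sketched roughly as follows. This is only an illustration, not TLImporter code: `index_media_folder` and `resolve_attachment` are hypothetical helper names, and instead of mapping extensions to subfolders it simply indexes every file under the folder the user provides. It also assumes the attachment name in the export may carry an invisible left-to-right mark (U+200E), as the sample above appears to.

```python
import os


def index_media_folder(root):
    """Map each file name found under root (recursively) to its full path."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # If the same name exists in several subfolders, keep the first one seen.
            index.setdefault(name, os.path.join(dirpath, name))
    return index


def resolve_attachment(filename, index):
    """Return the path for an attachment name taken from the export, or None."""
    # Assumption: strip a possible invisible U+200E mark and surrounding spaces.
    return index.get(filename.strip("\u200e "))
```

Indexing once and looking names up in a dictionary avoids walking the folder again for every message in a long chat.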

ferferga commented 4 years ago

@fabiosangregorio I didn't reply to you back in the day when you posted this, but I checked it and you're right: media paths are specified in WhatsApp exports. I had only exported chats whose media had all been removed from my storage, so it appeared in the txt as <media omitted>. So your idea seems feasible.

I finished adapting TLImporter to the new Telethon versions (and Telegram layers), the new secret mode is now live, and all the work I wanted to do on the file parsing logic is done in TLImporter 3.0.6.

I won't release any new updates (unless new major bugs appear), as the app is pretty mature now and covers even more than what I needed when I decided to make it. So you have plenty of time to make a PR if you want to add this feature; you probably won't need to deal with merge conflicts :).

I don't need to import media (and I probably won't in the future), so I don't really have the motivation to make it real on my own, but any PR willing to improve the tools is welcome, so if you want to make it, I encourage you to do so, and I thank you very much for investing your time :). It's nice to see that people like your work and want to improve it, so thanks!

Some difficulties that come to mind that you might need to solve to get this done:

I think that using path.isfile with a wide variety of regexes and a correct string splitting might make this easier and also avoid false positives.
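As a rough sketch of that suggestion, the snippet below combines one regular expression with `os.path.isfile`. The pattern is an assumption modelled on the Italian sample earlier in this thread (`DD/MM/YY, HH:mm - Sender: file.ext (label)`), and `find_attachment` and `media_dir` are hypothetical names; other export languages or platforms would likely need their own patterns.

```python
import os
import re

# Assumed line shape: "<date>, <time> - <sender>: <file.ext> (<localized label>)".
# The label ("file allegato" in Italian) is deliberately not hard-coded.
ATTACHMENT_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{2}), (?P<time>\d{2}:\d{2}) - "
    r"(?P<sender>[^:]+): \u200e?(?P<filename>[^/\\]+\.\w+) \((?P<label>[^)]+)\)$"
)


def find_attachment(line, media_dir):
    """Return the path of the attached file named on this line, or None."""
    match = ATTACHMENT_RE.match(line.strip())
    if match is None:
        return None
    path = os.path.join(media_dir, match.group("filename"))
    # The regex alone would accept text such as "see example.com (cool)";
    # checking that the file really exists filters out those false positives.
    return path if os.path.isfile(path) else None
```

The existence check is what keeps the loose, language-independent pattern safe: a line is treated as an attachment only when the named file is actually present in the folder.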

eutampieri commented 4 years ago

Hey! I was working on a WhatsApp export parser too! @fabiosangregorio are you Italian? If so, my fields vary a little from yours. Anyway, if you want some help let me know

fabiosangregorio commented 4 years ago

Sorry, as always too many projects and too little time. I won't be able to put together a PR for a long time. University is time consuming :) Yes, the export language is indeed Italian.

eutampieri commented 4 years ago

@fabiosangregorio is your export from WhatsApp? Mine (from Android and iOS) are formatted differently… They’re formatted like this:

[timestamp] sender: message

with message being either the text or <attachment(allegato): filename> for attachments.
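A parser for the bracketed format described above might look something like this. It is a sketch under assumptions: the exact timestamp layout isn't shown in the thread, so it is kept as an opaque string, the attachment label is assumed to be either "attachment" or "allegato", and `parse_bracket_line` is a hypothetical name.

```python
import re

# Assumed shape: "[<timestamp>] <sender>: <message>"; the timestamp is not parsed.
BRACKET_RE = re.compile(
    r"^\u200e?\[(?P<timestamp>[^\]]+)\] (?P<sender>[^:]+): (?P<message>.*)$"
)
# Assumed attachment shape: "<attachment: name>" or "<allegato: name>".
ATTACH_RE = re.compile(r"^\u200e?<(?:attachment|allegato): (?P<filename>[^>]+)>$")


def parse_bracket_line(line):
    """Parse one line of the bracketed format, or return None if it doesn't match."""
    match = BRACKET_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    message = match.group("message")
    attachment = ATTACH_RE.match(message)
    return {
        "timestamp": match.group("timestamp"),
        "sender": match.group("sender"),
        "text": None if attachment else message,
        "attachment": attachment.group("filename") if attachment else None,
    }
```

For example, a made-up line such as `[04/09/18, 13:22:10] Bea: <allegato: photo.jpg>` would yield sender `Bea` and attachment `photo.jpg`, while a line that doesn't start with a bracketed timestamp returns `None` and could be handed to another parser.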

fabiosangregorio commented 4 years ago

Exporting a chat via Chat > Menu > Other > Export chat > Share via... > Email results in two files being sent via email:

Chat txt file

The chat file is formatted as

DD/MM/YY, HH:mm - SenderName SenderSurname: [attachment.ext (attached file)] \n Chat text or file caption

with [] being optional.
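Because the caption sits on the line after the attachment, a parser has to treat lines without a date prefix as part of the previous message. A minimal sketch of that grouping, assuming the `DD/MM/YY, HH:mm - Sender: ...` header shape above (`group_messages` is a hypothetical name):

```python
import re

# Assumed header shape: "DD/MM/YY, HH:mm - Sender Name: first line of the message".
HEADER_RE = re.compile(r"^(\d{2}/\d{2}/\d{2}), (\d{2}:\d{2}) - ([^:]+): (.*)$")


def group_messages(lines):
    """Group export lines into messages; lines without a header continue the last one."""
    messages = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = HEADER_RE.match(line)
        if match:
            date, time, sender, body = match.groups()
            messages.append({"date": date, "time": time, "sender": sender, "lines": [body]})
        elif messages:
            # Caption or multi-line text: attach it to the previous message.
            # Note: system notices without "Sender:" also end up here in this sketch.
            messages[-1]["lines"].append(line)
    return messages
```

With this grouping, an image message ends up with two entries in `lines`: the attachment line and its caption.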

Chat WhatsApp con Lucia RedactedSurname.txt

17/03/18, 18:54 - I messaggi inviati a questa chat e le chiamate sono ora protetti con la crittografia end-to-end. Tocca per maggiori info.
17/03/18, 18:54 - Lucia RedactedSurname: RedactedText
08/05/18, 18:27 - Fabio: RedactedText
08/05/18, 18:43 - Lucia RedactedSurname: RedactedText 👍🏼🤪
08/05/18, 22:09 - Fabio: RedactedText
08/05/18, 22:09 - Fabio: RedactedText
08/05/18, 22:10 - Lucia RedactedSurname: RedactedText
08/05/18, 22:38 - Fabio: RedactedText
22/11/18, 08:27 - Lucia RedactedSurname: ‎IMG-20181122-WA0000.jpg (file allegato)
RedactedText 👋🏻👋🏻 // This is the caption to the previous image
22/11/18, 08:35 - Fabio: RedactedText
24/12/18, 08:58 - Lucia RedactedSurname: RedactedText
24/12/18, 09:55 - Fabio: RedactedText 😋
27/10/19, 09:53 - Lucia RedactedSurname: RedactedText
27/10/19, 09:55 - Fabio: RedactedText
01/11/19, 19:37 - Lucia RedactedSurname: RedactedText
01/11/19, 19:45 - Fabio: RedactedText

Image file

IMG-20181122-WA0000.jpg

eutampieri commented 4 years ago

Here's mine: _chat 2.txt

eutampieri commented 4 years ago

@ferferga can I implement an alternative parser which gets activated if the program finds that messages are in the format above? Otherwise the software imports all the messages as one looong message. Maybe I should open another issue instead to keep track of this...

ferferga commented 4 years ago

@eutampieri I think this is the same problem as the one reported in #11. I'm looking into it.

eutampieri commented 4 years ago

Yes, it’s the same issue but it doesn’t crash

eutampieri commented 4 years ago

I'm starting to have @fabiosangregorio's problem…. I don't have an ETA yet

Sohyperbolic commented 3 years ago

Yes, it would be good to get the voice files and photos imported too.

diedais commented 3 years ago

Hi, there is another problem with timestamps. While I can fix it by changing the language and exporting again, I want to know if you are aware of this.

My export language is Spanish.

diedais commented 3 years ago

Ok, 2 hours ago they just released an update with official support to import chats (including multimedia) XD

aflorenzan commented 3 years ago

Ok, 2 hours ago they just released an update with official support to import chats (including multimedia) XD

That's correct, but due to WhatsApp restrictions/limitations you can't export your whole chat history, only the latest 40,000 messages (without media).

When exporting with media, you can send up to 10,000 latest messages. Without media, you can send 40,000 messages. These constraints are due to maximum email sizes. Source: https://faq.whatsapp.com/android/chats/how-to-save-your-chat-history/?lang=en

ferferga commented 3 years ago

@Dislekzi4 It's not a language problem, it's because they replaced : with - again. The app was in fact coded around the Spanish format.

Check here to see the appropriate format that TLImporter will always support, so you can adapt it yourself: https://github.com/TelegramTools/TLImporter/blob/python/samples/WhatsApp_Chat_Diego_Vel%C3%A1zquez.txt

As a side note, this question doesn't belong in this issue, as it's completely unrelated to the original topic; it should've been in a new, separate issue.

muety commented 3 years ago

Hi all, will this tool eventually support importing media files, using Telegram's official API?