Suberbia / html_to_txt

Converts an HTML Telegram chat export to a TXT WhatsApp chat export

Code new features #1

Open lilium895 opened 9 months ago

lilium895 commented 9 months ago

Hi, I've made some changes and improvements (I think?) to your code. Would you like to test it out? It's in one of my repositories. I tried to add conversion for media: photos, videos and files. I would really appreciate it if you could evaluate my work. Thanks!

Suberbia commented 9 months ago

Hi,

I hope this message finds you well. I wanted to express my sincere appreciation for the improvements you made to the code I worked on during high school. It's truly heartening to see your interest and commitment to enhancing the functionality.

After carefully reviewing the changes you implemented, I noticed a few key advantages in your modified version:

  1. Conciseness and Readability: Your code is more concise and streamlined, making it easier to read and understand. The simplifications in conditional checks and message formatting contribute to a more elegant and compact solution.

  2. Explicit Handling of Missing Sender Information: The inclusion of explicit checks for missing sender information adds a layer of robustness to the code. Skipping messages without sender details ensures a more predictable behavior, especially in scenarios where the HTML structure might vary.

  3. Simplified Text Extraction: The direct extraction of text content without an explicit check for the element's presence streamlines the code further. This assumes that every message has a text element, contributing to the overall simplicity of the script.

  4. Media Attachment Format: After delving into the part of the code that handles media, I noticed that, as per the format of WhatsApp export files, the inline attachment of a media file should look like <attached: media.jpg>. To align with this format, I recommend replacing the media formatting code with the following:

    
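    # Format media message in WhatsApp format
    if time_str:
        whatsapp_message = f'[{date_str}, {time_str}] {sender}: '
        if media is not None:
            # WhatsApp writes inline attachments as <attached: filename>
            whatsapp_message += f'<attached: {media}>'
        # Terminate every entry so consecutive messages never run together
        whatsapp_chat += whatsapp_message + '\n'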
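For reference, the same formatting can be written as a self-contained function so it can be tried in isolation. This is only a sketch: `format_whatsapp_message` is a hypothetical name, the `[date, time] sender: ` header is copied from the snippet above, and writing a caption as a second line with the same header is an assumption that has not been checked against a real WhatsApp export.

```python
def format_whatsapp_message(date_str, time_str, sender, text="", media=None):
    """Return the WhatsApp-style line(s) for one message (hypothetical helper).

    A media file becomes "<attached: name>"; a text or caption is written as
    its own line with the same header (assumption). Returns "" when the
    message has neither text nor media, so callers can simply skip it.
    """
    header = f"[{date_str}, {time_str}] {sender}: "
    lines = []
    if media is not None:
        # Inline attachment in the form used by WhatsApp exports
        lines.append(header + f"<attached: {media}>")
    if text:
        lines.append(header + text)
    # Every line ends with a newline so entries never run together
    return "".join(line + "\n" for line in lines)
```

For example, `format_whatsapp_message("12/03/2023", "14:05:09", "Alice", media="photo_1.jpg")` returns `"[12/03/2023, 14:05:09] Alice: <attached: photo_1.jpg>\n"`.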

Your enhancements have certainly brought a fresh perspective to the code, and I'm grateful for your valuable contributions. Once again, thank you for your efforts in improving the code.

Best regards, SUBERBIA

lilium895 commented 9 months ago

Thank you very much for your feedback! I'm sorry, I'm not a native English speaker, so I may have misunderstood something. Apart from the media attachment issue, do you think we should change or add anything else? I've only tested it on a short chat, and I think it is very likely that the code will run into exceptions I haven't seen yet. If you have encountered any exceptions that I didn't take into consideration, would you be so kind as to list them?
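To make the question about unseen exceptions concrete: a parser that assumes every message has a sender, a date and a text element can fail on message kinds that, as far as I know, occur in Telegram Desktop HTML exports. These include service messages (date separators, "X joined the group"), follow-up messages from the same sender that omit the `from_name` div, and media-only messages without a `text` div. The sketch below uses only the standard library's `html.parser` and records a missing field as `None` instead of raising. The class name `TelegramExportParser` is hypothetical, and the CSS class names it matches (`message`, `service`, `from_name`, `date details`, `text`) are assumptions about the export layout, not taken from either version of the script.

```python
from html.parser import HTMLParser


class TelegramExportParser(HTMLParser):
    """Collect one dict per message from a Telegram Desktop HTML export.

    Assumed layout: each message is a <div class="message ...">, containing
    optional "date details", "from_name" and "text" divs. Missing fields
    stay None so later code can decide what to do instead of crashing.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities like &amp;
        self.messages = []
        self._depth = 0          # number of currently open <div> elements
        self._message = None     # dict for the message being read
        self._message_depth = 0
        self._field = None       # "sender" or "text" while inside that div
        self._field_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br" and self._field == "text":
            self._parts.append("\n")  # keep line breaks inside a message
        if tag != "div":
            return
        self._depth += 1
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "message" in classes:
            self._message = {"id": attrs.get("id"), "service": "service" in classes,
                             "sender": None, "date": None, "text": None}
            self._message_depth = self._depth
        elif self._message is not None and self._field is None:
            if "date" in classes and "details" in classes:
                self._message["date"] = attrs.get("title")  # full timestamp
            elif "from_name" in classes or "text" in classes:
                self._field = "sender" if "from_name" in classes else "text"
                self._field_depth = self._depth
                self._parts = []

    def handle_data(self, data):
        if self._field is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag != "div":
            return
        if self._field is not None and self._depth == self._field_depth:
            value = "".join(self._parts).strip()
            if self._message[self._field] is None:  # keep the first occurrence
                self._message[self._field] = value or None
            self._field = None
        if self._message is not None and self._depth == self._message_depth:
            self.messages.append(self._message)
            self._message = None
        self._depth -= 1
```

Usage: create a parser, call `parser.feed(html_text)` and `parser.close()`, then read `parser.messages`; each entry has the keys `id`, `service`, `sender`, `date` and `text`. Forwarded messages and other nested blocks are not treated specially.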

In particular, regarding point 2: do you think I should add the skip feature back? I removed the continue statement on line 19 of your code because many messages don't have sender information. With this change, a message without a sender takes the sender from the message above it, while keeping its own time. Do you think that's OK, or should I change it? Have you seen any particular situation in which the code would raise an error because of those lines?
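On the skip question: as far as I know, Telegram exports group consecutive messages from the same person and print the sender's name only on the first one, so inheriting the previous sender while keeping each message's own time matches the behaviour described above. The risky cases are a message without a sender that appears before any sender has been seen, and service messages, which have no sender at all. A sketch of that rule, assuming each message has already been parsed into a dict with the hypothetical keys `service`, `sender`, `date` and `text`:

```python
def resolve_senders(messages):
    """Fill in missing senders by carrying over the previous sender.

    Hypothetical helper. Service messages are dropped, and a message with
    no sender is skipped only when there is no earlier sender to inherit.
    The input list and its dicts are not modified.
    """
    resolved = []
    last_sender = None
    for message in messages:
        if message.get("service"):
            continue  # date separators, "X joined the group", etc.
        sender = message.get("sender") or last_sender
        if sender is None:
            continue  # nothing to inherit yet; avoids writing a "None:" line
        last_sender = sender
        # Only the sender is inherited; the message keeps its own date/time
        resolved.append({**message, "sender": sender})
    return resolved
```

This keeps the behaviour of the modified version for follow-up messages, while still skipping the cases where no sender can be determined.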

I have tried to import it into Telegram: I zipped the txt file together with the media files, copied the zip to the phone's storage, and finally shared it to Telegram in the designated chat. But it doesn't work; Telegram just sends the zip file to the contact. Have you tried the same procedure?
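Nothing in this thread establishes what Telegram's importer requires from a hand-made archive, so the following only rules out one possible cause: the archive layout. It is a sketch, assuming (unverified) that each `<attached: ...>` name in the text should match a file stored at the root of the zip under exactly that name. `build_export_zip` is a hypothetical helper, and it does not address how the file is shared to Telegram, which may matter as much as its contents.

```python
import zipfile
from pathlib import Path


def build_export_zip(chat_txt, media_files, zip_path):
    """Pack the chat text file and its media into a flat zip archive.

    Hypothetical helper: every file is stored at the archive root under its
    bare file name, so "<attached: photo_1.jpg>" matches an entry named
    exactly "photo_1.jpg" (assumed to be what an importer looks for).
    """
    names = [Path(chat_txt).name] + [Path(m).name for m in media_files]
    duplicates = {n for n in names if names.count(n) > 1}
    if duplicates:
        # Two files with the same bare name would collide at the archive root
        raise ValueError(f"duplicate file names: {sorted(duplicates)}")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(chat_txt, arcname=names[0])
        for media, name in zip(media_files, names[1:]):
            # arcname drops any directory prefix such as "photos/"
            archive.write(media, arcname=name)
    return zip_path
```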