KnugiHK / WhatsApp-Chat-Exporter

A customizable Android and iOS/iPadOS WhatsApp database parser that will give you the history of your WhatsApp conversations in HTML and JSON. Android Backup Crypt12, Crypt14, Crypt15, and new schema supported.
https://wts.knugi.dev/
MIT License
521 stars, 76 forks

Move/Convert media to "usual" structure #31

Closed Fvegini closed 1 year ago

Fvegini commented 1 year ago

Is there a tool or script that can move or convert all the media into a normal single-folder structure?

I would like to back up my WhatsApp media, mainly for the images and videos of my family, but the current layout, with files scattered across "random" folders, makes this very hard. Since all the filenames are hashed and don't seem to carry any metadata about the date the media was sent, I can't see an easy way to achieve this.

If nothing like this exists yet, I think I will try to write a script that reads the HTML or JSON, looks for the media entries, and then copies each file to a custom folder, renaming it with the "date_sent" or something similar!

KnugiHK commented 1 year ago

I assume you are talking about an iOS backup. Correct me if not. Apple intentionally renames the files to a hash of their path, possibly to keep users in its ecosystem. AFAIK there is nothing to achieve that.
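For context, here is a minimal sketch of how that naming is generally understood to work: each file in the backup is stored under the SHA-1 hex digest of `domain-relativePath`, inside a subfolder named after the first two hex characters of that digest. The domain string and the media path below are assumptions for illustration, not values taken from this thread, and `backup_file_id` / `backup_location` are hypothetical helper names.

```python
import hashlib
from pathlib import PurePosixPath


def backup_file_id(domain: str, relative_path: str) -> str:
    """SHA-1 hex digest of "<domain>-<relative_path>" (assumed iOS backup naming scheme)."""
    return hashlib.sha1(f"{domain}-{relative_path}".encode("utf-8")).hexdigest()


def backup_location(domain: str, relative_path: str) -> PurePosixPath:
    """Location inside the backup folder: <first two hex chars>/<full digest>."""
    file_id = backup_file_id(domain, relative_path)
    return PurePosixPath(file_id[:2]) / file_id


# Assumed values, for illustration only.
WHATSAPP_DOMAIN = "AppDomainGroup-group.net.whatsapp.WhatsApp.shared"
example = backup_location(WHATSAPP_DOMAIN, "Message/Media/example/photo.jpg")
print(example)
```

Because the digest is one-way, the original name cannot be recovered from the file name alone, which is why any reorganizing has to start from a mapping such as the exported JSON.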

Fvegini commented 1 year ago

Exactly, it is the iOS backup with the crazy hash folders!

I created a very simple script that worked like a charm for me!

The code is really, really ugly... but it works and is simple to understand and run!

If someone wants to do something similar, here it is:


import json
import shutil
from datetime import datetime, timedelta
from pathlib import Path

import pytz
from slugify import slugify  # provided by the python-slugify package

custom_timezone = pytz.timezone('America/Sao_Paulo')
# Extra manual shift carried over from the first version of this script;
# set it to timedelta(0) if your timestamps already come out right.
manual_offset = timedelta(hours=2)
# When True, prefix each contact folder with the date of its last media file.
put_date_in_front_of_contact_name = False

with open("result.json", "r", encoding="utf-8") as f:
    data = json.load(f)

for contact in data.values():
    name = contact.get("name")
    # "sem_nome" means "no name"; also used when the name slugifies to nothing.
    slugified_name = (slugify(name) if name else "") or "sem_nome"
    folder_path = Path("new") / slugified_name
    for item in (contact.get("messages") or {}).values():
        if not item.get("media") or not item.get("data"):
            continue
        try:
            original_path = Path("result") / item["data"]
            if original_path.suffix == ".vcf":  # skip contact cards
                continue
            media_time = datetime.fromtimestamp(item["timestamp"], tz=custom_timezone) - manual_offset
            final_path = folder_path / (media_time.strftime("%Y_%m_%d-%H_%M_%S") + original_path.suffix)
            if final_path.exists():
                continue
            folder_path.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(original_path, final_path)
        except Exception as ex:
            print(f"error while copying media for {slugified_name}: {ex}")
    # Rename once the contact is finished, so the last contact is handled too.
    if put_date_in_front_of_contact_name and folder_path.exists():
        # Names start with the date, so the largest stem is the newest file.
        last_media = max((p.stem for p in folder_path.iterdir()), default=None)
        if last_media:
            folder_path.rename(folder_path.with_name(f"{last_media}_{folder_path.name}"))

Fvegini commented 1 year ago

It basically copies all the files of all the contacts into a new structure, using the contact name and the time the media was sent, like this:

new/Contact1/2023_02_01-11_10_10.jpg
new/Contact1/2023_02_02-16_15_20.jpg
new/Contact1/2023_02_03-17_10_10.jpg
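
One caveat with names that only go down to the second: two media files sent to the same contact in the same second map to the same target name. A minimal sketch of one way to keep both, by appending a counter when the name is taken; `unique_target` is a hypothetical helper, not part of WhatsApp-Chat-Exporter, and it defaults to UTC only to keep the example deterministic.

```python
from datetime import datetime, timezone
from pathlib import Path


def unique_target(folder: Path, timestamp: float, suffix: str, tz=timezone.utc) -> Path:
    """Return folder/<YYYY_MM_DD-HH_MM_SS><suffix>, adding _1, _2, ... if that name exists."""
    stem = datetime.fromtimestamp(timestamp, tz=tz).strftime("%Y_%m_%d-%H_%M_%S")
    candidate = folder / f"{stem}{suffix}"
    counter = 1
    while candidate.exists():
        candidate = folder / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate
```

The trade-off is that re-running a copy with this helper would duplicate files instead of skipping them, so it suits a one-off export better than repeated runs.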

I also added a flag that renames each contact folder with the date of its last media file, so you can sort the folders and maybe just delete the very old ones. (It uses the date of the last media file, not of the last message.)
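
As a standalone illustration of that renaming step, here is a small sketch. It picks the newest file by taking the largest name rather than relying on directory listing order, which is not guaranteed to be sorted. `prefix_with_last_media_date` is a hypothetical helper name, and it assumes every file in the folder follows the `YYYY_MM_DD-HH_MM_SS` naming shown above.

```python
from pathlib import Path


def prefix_with_last_media_date(contact_folder: Path) -> Path:
    """Rename <contact> to <newest media stem>_<contact> and return the resulting path."""
    stems = [p.stem for p in contact_folder.iterdir() if p.is_file()]
    if not stems:
        return contact_folder  # nothing to date the folder with
    # Names start with YYYY_MM_DD-HH_MM_SS, so the lexicographic max is the newest.
    newest = max(stems)
    renamed = contact_folder.with_name(f"{newest}_{contact_folder.name}")
    contact_folder.rename(renamed)
    return renamed
```

With the three example files above, `new/Contact1` would become `new/2023_02_03-17_10_10_Contact1`.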

KnugiHK commented 1 year ago

Glad to see the initiative for producing JSON actually helps someone!