Closed Frankprog03 closed 1 year ago
Try running with --tar-only, and extract manually.
@YuvrajRaghuvanshiS The error occurs regardless of the flag's usage. As @Frankprog03 posted, if you use the flag you end up with this error on Windows:
[Friday 05/05/2023, 11:43:27] >>> I am in view_extract.extract_ab(is_java_installed=True, is_tar_only=False)
[Friday 05/05/2023, 11:43:27] Found "whatsapp.ab" in "tmp" folder. Continuing... Size: 146647497 bytes.
[Friday 05/05/2023, 11:43:27] Enter a name for this user (default "user").:
[Friday 05/05/2023, 11:43:27] Enter same password which you entered on device when prompted earlier.: ********
[Friday 05/05/2023, 11:43:27] Successfully unpacked "tmp/whatsapp.ab" to "tmp/whatsapp.tar". Size: 327728644 bytes.
[Friday 05/05/2023, 11:43:27] >>> I am in view_extract.taking_out_main_files(username=user)
[Friday 05/05/2023, 11:43:27] Folder "extracted/" already exists.
[Friday 05/05/2023, 11:43:27] Taking out main files in "tmp/" folder temporarily.
[Friday 05/05/2023, 11:43:27] unexpected end of data
[Friday 05/05/2023, 11:43:27] >>> I am in view_extract.clean_tmp()
[Friday 05/05/2023, 11:43:27] Cleaning up "tmp/" folder...
[Friday 05/05/2023, 11:43:27] [WinError 32] The process cannot access the file because it is being used by another process: 'tmp/whatsapp.tar'
[Friday 05/05/2023, 11:43:27] >>> I am in view_extract.kill_me(reason=)
However, if you do not use the flag, and then manually extract using tar -xvf user.tar, you get the following:
(...)
apps/com.whatsapp/db/media.db
apps/com.whatsapp/db/stickers.db-shm
apps/com.whatsapp/db/stickers.db-wal
apps/com.whatsapp/db/payments.db-shm
apps/com.whatsapp/db/payments.db-wal
apps/com.whatsapp/db/location.db-shm
apps/com.whatsapp/db/location.db-wal
apps/com.whatsapp/db/location.db
apps/com.whatsapp/db/msgstore.db-shm
apps/com.whatsapp/db/msgstore.db-wal
apps/com.whatsapp/db/msgstore.db
tar: Unexpected EOF in archive
tar: rmtlseek not stopped at a record boundary
tar: Error is not recoverable: exiting now
I tried running this on Windows 11 and WSL2. The tar fails to extract in both environments.
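To see how far a tar like this can be read before it breaks, a minimal sketch with Python's standard tarfile module may help. The function name scan_tar is made up for illustration and is not part of this project. It walks the archive member by member and returns the names it managed to read plus the error that stopped it, if any. Note that an archive cut off exactly at a member boundary may still be reported as clean.

```python
import tarfile


def scan_tar(path):
    """Return (member_names, error); error is None if the archive reads to the end."""
    names = []
    try:
        # "r:" opens an uncompressed tar only, which is what the .ab conversion produces
        with tarfile.open(path, "r:") as tar:
            for member in tar:
                names.append(member.name)
    except (tarfile.ReadError, EOFError, OSError) as exc:
        # tarfile raises ReadError("unexpected end of data") when a member's
        # data is shorter than the size recorded in its header
        return names, exc
    return names, None
```

The last name in the returned list is usually the member whose data was cut short, which would match the tar output above where msgstore.db is listed right before the EOF error.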
System information:
Java: OpenJDK 1.8.0_362 (Java 8)
{
"Architecture": "x86_64",
"Hostname": "AN515-45",
"Platform": "Linux",
"Platform Release": "5.15.90.1-microsoft-standard-WSL2",
"Platform Version": "#1 SMP Fri Jan 27 02:56:13 UTC 2023",
"Processor": "x86_64",
"RAM": "7 GB",
"Python": [
"main",
"Mar 10 2023 10:55:28"
]
}
@aditeyabaral the problem seems to occur when extracting msgstore.db, which ends up corrupted (unfortunately, this is the most important file here...). Also, the key doesn't exist in the archive. The extracted .ab file is approximately 5GB in my case, but the resulting tar is only 1.5GB. I'm not sure if this is normal as I have no experience with Android backups. My guess is that for some reason the conversion from ab to tar is going wrong, so maybe the problem is with abe.jar... (?)
I also tried with WSL2 and got the same result.
@Frankprog03 The msgstore.db file is already decrypted, so there is no need for a key actually. The issue is, as you said, the file getting corrupted while converting from .ab to .tar. We will have to wait for @YuvrajRaghuvanshiS to give us a better idea of how to debug this issue.
@aditeyabaral yeah, I know. The key file could have been useful if it had been extracted before the EOF, because then I could easily copy and decrypt msgstore.db.crypt14, which can be read normally.
@Frankprog03 can the script be modified to copy the key file first to the system? Then we can decrypt any of the files later as well. Unfortunately I do not have much idea about backups so I am not sure how accurately this would work.
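One way to approach this without changing the backup itself would be to pull individual files out of the tar and simply stop when the archive ends early. The sketch below is only an illustration: salvage_members and its parameters are hypothetical names, and it can only recover files that are stored before the point where the tar is cut off, so it does not help if the key comes after msgstore.db in the archive. It reads each matching file fully into memory, which is fine for a small key file but not ideal for large databases.

```python
import os
import tarfile


def salvage_members(tar_path, suffixes, out_dir):
    """Save members whose names end with one of `suffixes`; return the names saved."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    try:
        with tarfile.open(tar_path, "r:") as tar:
            for member in tar:
                if not (member.isfile() and member.name.endswith(tuple(suffixes))):
                    continue
                # read() raises tarfile.ReadError if this member's data is cut short,
                # so a partial file is never written to disk
                data = tar.extractfile(member).read()
                target = os.path.join(out_dir, os.path.basename(member.name))
                with open(target, "wb") as dst:
                    dst.write(data)
                saved.append(member.name)
    except (tarfile.ReadError, EOFError, OSError):
        pass  # archive ended early; keep whatever was saved before that point
    return saved
```

For example, salvage_members("tmp/whatsapp.tar", ["db/location.db"], "salvaged") would try to copy only that one database, assuming the tar path from the log above.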
I am not entirely sure what is causing it or what exactly this is; EOF is too vague.
Yes, it may be the case that this is because of abe.jar, but I am not entirely sure. This project is barely maintained and can be considered dead; I don't find enough time to continue working on it (this started out of lockdown boredom).
I really hope that abe.jar is causing it, because while planning to update this repo I was trying to remove any external dependencies, as a result of which I put together a script (mostly copied :p) which would replace abe.jar #93.
You can try this with the 'ab' you have extracted.
Create .ab without password so that this can be used (just tested, it works):
import tarfile
import zlib
import io

with open('D:\\Yuvraj\\Work\\GitHub\\WA-KDBE\\extracted\\crashed\\whatsapp.ab', 'rb') as f:
    f.seek(24)  # skip 24 bytes (headers)
    data = f.read()  # read the rest

tarstream = zlib.decompress(data)

with open('D:\\Yuvraj\\Work\\GitHub\\WA-KDBE\\extracted\\crashed\\whatsapp.tar', 'wb') as f:
    f.write(tarstream)
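The 24 in f.seek(24) is the combined length of the four header lines of an unencrypted backup: the magic line, the version, the compression flag and the encryption name, each ending in a newline. The sketch below reads those lines instead of hard-coding the offset. read_ab_header is a made-up helper name, and the version value 5 in the comment is just an example value for illustration. The returned offset only marks the start of the zlib data when encryption is none, since encrypted backups carry more header lines.

```python
def read_ab_header(path):
    """Return (fields, offset) where offset is the position right after the 4 header lines."""
    with open(path, "rb") as f:
        magic = f.readline().rstrip(b"\n")
        if magic != b"ANDROID BACKUP":
            raise ValueError("not an Android backup file")
        fields = {
            "version": f.readline().rstrip(b"\n"),
            "compression": f.readline().rstrip(b"\n"),
            "encryption": f.readline().rstrip(b"\n"),
        }
        # e.g. b"ANDROID BACKUP\n" (15) + b"5\n" (2) + b"1\n" (2) + b"none\n" (5) = 24
        return fields, f.tell()
```

If the offset comes back as something other than 24, or encryption is not none, the short script above would start decompressing at the wrong place.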
This is another one which is more extensive.
requirements:
black==23.1.0
certifi==2022.12.7
charset-normalizer==3.1.0
click==8.1.3
colorama==0.4.6
idna==3.4
mypy-extensions==1.0.0
packaging==23.0
pathspec==0.11.1
platformdirs==3.1.1
psutil==5.9.4
pycryptodome==3.17
requests==2.28.2
tqdm==4.65.0
urllib3==1.26.15
import codecs
import ctypes
import zlib
from binascii import hexlify, unhexlify
from struct import pack

from Crypto.Cipher import AES
from Crypto.Protocol.KDF import PBKDF2


class AndroidBackupExtractor:
    CHUNK_SIZE: int = 128 * 1024

    def __init__(self, ab_file_path: str, password: str = "") -> None:
        out_file_path = f"{ab_file_path.split('.ab')[0]}.tar"
        self.ab_file = open(ab_file_path, "rb")
        self.out_file = open(out_file_path, "wb")
        self.password = password.encode("utf-8")

    def read_header(self, ab_file) -> None:
        self.header = dict()
        self.header["version"] = ab_file.readline()[:-1]
        self.header["compression"] = ab_file.readline()[:-1]
        self.header["encryption"] = ab_file.readline()[:-1]
        if self.header["encryption"] == b"none":
            pass
        elif self.header["encryption"] == b"AES-256":
            # get PBKDF2 parameters to decrypt master key blob
            self.header["user_password_salt"] = unhexlify(ab_file.readline()[:-1])
            self.header["master_key_checksum_salt"] = unhexlify(ab_file.readline()[:-1])
            self.header["round"] = int(ab_file.readline()[:-1])
            self.header["user_key_iv"] = unhexlify(ab_file.readline()[:-1])
            self.header["master_key_blob"] = unhexlify(ab_file.readline()[:-1])
            print("user password salt:", hexlify(self.header["user_password_salt"]))
            print(
                "master key checksum salt:",
                hexlify(self.header["master_key_checksum_salt"]),
            )
            print("number of PBKDF2 rounds:", self.header["round"])
            print("user key IV:", hexlify(self.header["user_key_iv"]))
            print("master key blob:", hexlify(self.header["master_key_blob"]))
        else:
            raise RuntimeError(
                f"Unsupported encryption scheme: {self.header['encryption']}"
            )

    def decrypt(self, encrypted_iter, aes_obj):
        for encrypted in encrypted_iter:
            yield aes_obj.decrypt(encrypted)

    def chunk_reader(self, ab_file, chunk_size=CHUNK_SIZE):
        data = ab_file.read(chunk_size)
        while data:
            yield data
            data = ab_file.read(chunk_size)

    def master_key_java_conversion(self, master_key_bytes_array):
        """
        because of byte to Java char before using password data as PBKDF2 key, special handling is required
        from : https://android.googlesource.com/platform/frameworks/base/+/master/services/backup/java/com/android/server/backup/BackupManagerService.java
            private byte[] makeKeyChecksum(byte[] pwBytes, byte[] salt, int rounds) {
                char[] mkAsChar = new char[pwBytes.length];
                for (int i = 0; i < pwBytes.length; i++) {
                    mkAsChar[i] = (char) pwBytes[i]; <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< HERE
                }
                Key checksum = buildCharArrayKey(mkAsChar, salt, rounds);
                return checksum.getEncoded();
            }
        Java byte to char conversion (as "Widening and Narrowing Primitive Conversion") is defined here:
        https://docs.oracle.com/javase/specs/jls/se8/html/jls-5.html#jls-5.1.4
        First, the byte is converted to an int via widening primitive conversion (chapter 5.1.2),
        and then the resulting int is converted to a char by narrowing primitive conversion (chapter 5.1.3)
        """
        # Widening Primitive Conversion : https://docs.oracle.com/javase/specs/jls/se8/html/jls-5.html#jls-5.1.2
        to_signed: list[int] = [
            ctypes.c_byte(x).value for x in master_key_bytes_array
        ]  # sign extension
        # Narrowing Primitive Conversion : https://docs.oracle.com/javase/specs/jls/se8/html/jls-5.html#jls-5.1.3
        to_unsigned_16_bits: list[int] = [
            ctypes.c_ushort(x).value & 0xFFFF for x in to_signed
        ]
        """
        The Java programming language represents text in sequences of 16-bit code UNITS, using the UTF-16 encoding.
        https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.1
        """
        to_bytes: bytes = pack(
            f">{len(to_unsigned_16_bits)}H", *to_unsigned_16_bits
        )  # unsigned short to bytes
        to_utf_16_be: str = codecs.decode(to_bytes, "UTF-16BE")  # from bytes to Utf16
        """
        https://developer.android.com/reference/javax/crypto/spec/PBEKeySpec.html
        \"Different PBE mechanisms may consume different bits of each password character.
        For example, the PBE mechanism defined in PKCS #5 looks at only the low order 8 bits of each character,
        whereas PKCS #12 looks at all 16 bits of each character. \"
        """
        to_utf_8: bytes = codecs.encode(
            to_utf_16_be, "UTF-8"
        )  # char must be encoded as UTF-8 first
        return to_utf_8

    def get_AES_decrypter(self, password):
        assert (
            self.header["encryption"] == b"AES-256"
        ), f"Not using AES decryption: {self.header['encryption']}"
        # generate AES key from password and salt
        key: bytes = PBKDF2(
            password, self.header["user_password_salt"], 32, self.header["round"]
        )  # default algo is sha1
        decrypted_master_key_blob: bytes = AES.new(
            key, AES.MODE_CBC, self.header["user_key_iv"]
        ).decrypt(self.header["master_key_blob"])
        # parse decrypted blob
        iv_len: int = decrypted_master_key_blob[0]
        iv: bytes = decrypted_master_key_blob[1 : 1 + iv_len]
        master_key_len: int = ord(
            decrypted_master_key_blob[1 + iv_len : 1 + iv_len + 1]
        )
        master_key: bytes = decrypted_master_key_blob[
            1 + iv_len + 1 : 1 + iv_len + 1 + master_key_len
        ]
        checksum_len: int = ord(
            decrypted_master_key_blob[
                1 + iv_len + 1 + master_key_len : 1 + iv_len + 1 + master_key_len + 1
            ]
        )
        checksum: bytes = decrypted_master_key_blob[
            1
            + iv_len
            + 1
            + master_key_len
            + 1 : 1
            + iv_len
            + 1
            + master_key_len
            + 1
            + checksum_len
        ]
        print("IV length:", iv_len)
        print("IV:", hexlify(iv))
        print("master key length:", master_key_len)
        print("master key:", hexlify(master_key))
        print("check value length:", checksum_len)
        print("check value:", hexlify(checksum))
        # verify password
        to_bytes_2: bytes = self.master_key_java_conversion(
            bytearray(master_key)
        )  # consider data as bytes, not str
        print("PBKDF2 secret value for password verification is: ", end="")
        print(hexlify(to_bytes_2))
        calculated_checksum: bytes = PBKDF2(
            to_bytes_2,
            self.header["master_key_checksum_salt"],
            checksum_len,
            self.header["round"],
        )
        if calculated_checksum != checksum:
            print(
                "computed checksum:",
                hexlify(calculated_checksum),
                "is different than embedded checksum:",
                hexlify(checksum),
            )
        else:
            print("password verification is OK")
        # decryption using master key and iv
        return AES.new(master_key, AES.MODE_CBC, iv)

    def decompress(self, compressed_data_iter, block_size=CHUNK_SIZE):
        decompress_obj = zlib.decompressobj()
        for compressed_data in compressed_data_iter:
            yield decompress_obj.decompress(compressed_data)
        yield decompress_obj.flush()
        if not decompress_obj.eof:
            raise RuntimeError("Incomplete or truncated zlib stream")

    def ab_to_tar(self) -> bool:
        if self.ab_file.readline()[:-1] != b"ANDROID BACKUP":
            raise ValueError('Magic is not "ANDROID BACKUP"')
        # parse header
        self.read_header(self.ab_file)
        if self.header["encryption"] == b"AES-256":
            if not self.password:
                self.password = input("Backup is encrypted, enter password: ").encode(
                    "utf-8"
                )
            compressed_iter = self.decrypt(
                self.chunk_reader(self.ab_file), self.get_AES_decrypter(self.password)
            )
        elif self.header["encryption"] == b"none":
            print("No encryption")
            compressed_iter = self.chunk_reader(self.ab_file)
        else:
            raise ValueError("Unknown encryption")
        # decompression (zlib stream)
        print("Writing backup as .tar... ", end="", flush=True)
        for decompressed_data in self.decompress(compressed_iter):
            self.out_file.write(decompressed_data)
        print(
            f"Done. Filename is '{self.out_file.name}', {self.out_file.tell()} bytes written."
        )
        # close both handles so the .tar is fully flushed and no file is left locked
        self.out_file.close()
        self.ab_file.close()
        return True


abe = AndroidBackupExtractor("tmp/whatsapp.ab", "password")
res: bool = abe.ab_to_tar()
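The least obvious part of the script above is master_key_java_conversion, so here is a small standalone sketch of the same idea: each master key byte is treated as a signed Java byte, cast to a 16-bit char (which sign-extends values of 0x80 and above), and the resulting characters are then encoded as UTF-8. The function name java_bytes_to_utf8 is made up for this illustration, and it is a simplified restatement, not a replacement for the method in the class.

```python
def java_bytes_to_utf8(raw: bytes) -> bytes:
    """Mimic Java's (char) cast on each byte, then encode the chars as UTF-8."""
    chars = []
    for b in raw:
        signed = b - 256 if b >= 128 else b  # a Java byte is signed: 0x80 becomes -128
        chars.append(chr(signed & 0xFFFF))   # narrowing to a 16-bit char: -128 becomes 0xFF80
    return "".join(chars).encode("utf-8")


# 0x41 stays "A"; 0x80 becomes U+FF80, which is three bytes in UTF-8
print(java_bytes_to_utf8(b"A\x80"))  # b'A\xef\xbe\x80'
```

This is why the checksum is computed over more bytes than the master key itself whenever the key contains bytes of 0x80 or above.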
Change
def clean_tmp():
    custom_print('>>> I am in view_extract.clean_tmp()', is_print=False)
    if(os.path.isdir(tmp)):
        custom_print(f'Cleaning up \"{tmp}\" folder...', 'yellow')
        shutil.rmtree(tmp)
to
def clean_tmp():
    pass
in view_extract so that it avoids cleaning the ab file.
@YuvrajRaghuvanshiS wow! I will try it as soon as I can. Thanks.
@YuvrajRaghuvanshiS Ok, you solved my problem but it is weirder than expected. I changed the clean_tmp() function to
def clean_tmp():
    pass
and I executed the script to convert the ab with the script you provided. But the script ran with no problems at all. It seems clean_tmp() somehow conflicted with the conversion but I don't know how. I don't care anymore, though :)
Now I successfully got all the expected files in the extracted folder. Thanks.
It seems it tried to clean the tmp folder before it had actually finished the unpacking (ab -> tar), and in doing so it snatched the ab file from abe.jar, hence the size difference. It works, and I also don't know why. Anyway, I am happy it all worked out for you.
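If the guess above is right and the cleanup runs while something still holds a file in tmp, a gentler alternative to turning clean_tmp() into a no-op could be to retry the removal a few times. This is only a sketch under that assumption: clean_tmp_with_retry and its parameters are hypothetical and not part of view_extract. It only helps when the lock is held by another process that eventually lets go. If the script itself still has the tar open, that handle has to be closed first.

```python
import shutil
import time


def clean_tmp_with_retry(tmp_dir, attempts=5, delay=1.0):
    """Remove tmp_dir, retrying while a file inside it is still in use."""
    for _ in range(attempts):
        try:
            shutil.rmtree(tmp_dir)
            return True
        except FileNotFoundError:
            return True  # nothing left to clean
        except PermissionError:
            # on Windows, "[WinError 32] ... being used by another process" lands here
            time.sleep(delay)
    return False
```

A return value of False means the folder was still locked after all attempts, so the caller can report that instead of crashing.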
@Frankprog03 Can you please make a fork of this repo with the modified code, so that we non-programmers can also benefit from it?
Thank you.
@UsamaAshfaq there it is https://github.com/Frankprog03/WhatsApp-Key-Database-Extractor/tree/master I think this is not and shouldn't be a permanent fix, just a "patch".
Everything works fine until it comes to the final extraction of the tar archive. It is probably corrupted and I have no idea if it is caused by abe.jar or something else. I tried backing up at least 10 times, with and without the backup password, but I get the same result every time. Extracting the tar with third-party software such as 7zip yields the same.
This is the log of the last stage of the script:
My device: Honor View 10 (BKL-L09) OS: Windows 10