Encrypt a large file AES-128-CBC ValueError: Padding is incorrect.

louishot commented 4 years ago

Hello, I'm try to use Encrypt a large file but try to decrypt this file got a error!

I tried to encrypt & decrypt a 11 byte txt file, it working perfectly, but tried a 50mb file get the error

    text = unpad(cryptor.decrypt(data),16)
  File "C:\Python\Python38-32\lib\site-packages\Crypto\Util\Padding.py", line 90, in unpad
    raise ValueError("Padding is incorrect.")
ValueError: Padding is incorrect.


import os
import hashlib
import binascii
from Crypto.Cipher import AES

from Crypto.Util.Padding import pad, unpad

def aes_decode(data, key):
    cryptor = AES.new(key, AES.MODE_CBC, key)
    text = unpad(cryptor.decrypt(data),16)
    return text
    #return text.rstrip(b'\0')  # .decode("utf-8")

def aes_encode(data, key):
    cryptor = AES.new(key, AES.MODE_CBC, key)
    text = cryptor.encrypt(pad(data,16))
    return text

def encrypt_by_aes(file_path,new_file_path,password):
    aes_pass = password
    aes_key = hashlib.md5(aes_pass.encode('utf-8')).hexdigest()
    print(aes_key)
    aes_key = binascii.unhexlify(aes_key)

    if os.path.exists(new_file_path):
        os.remove(new_file_path)

    file = open(file_path, 'rb')
    file_size = os.path.getsize(file_path)
    CHUNK_SIZE = 16*1024*1024

    while file.tell() < file_size:
        data = file.read(CHUNK_SIZE)
        if data:
            with open(new_file_path, 'ab') as f:
                f.write(aes_encode(data, aes_key))
        else:
            break
    print("encode %s OK!" % file_path)

def decrypt_by_aes(file_path,new_file_path,password):
    aes_pass = password
    aes_key = hashlib.md5(aes_pass.encode('utf-8')).hexdigest()
    print(aes_key)
    aes_key = binascii.unhexlify(aes_key)

    if os.path.exists(new_file_path):
        os.remove(new_file_path)

    file = open(file_path, 'rb')
    file_size = os.path.getsize(file_path)
    CHUNK_SIZE = 16*1024*1024

    while file.tell() < file_size:
        data = file.read(CHUNK_SIZE)
        if data:
            with open(new_file_path, 'ab') as f:
                f.write(aes_decode(data, aes_key))
        else:
            break
    print("encode %s OK!" % file_path)

encrypt_by_aes('LICENSE.txt','LICENSE_.txt','123')
decrypt_by_aes('LICENSE_.txt','LICENSE__.txt','123')

louishot commented 4 years ago

I have add this, to use pad only on Non-block size multiple, so the No problem with unpad function, I think should adjust pad function check the data really need pad then append pad otherwise should return the original data, not pad anything.

                if len(data) % AES.block_size != 0:
                    f.write(aes_encode(pad(data, AES.block_size), aes_key))
                else:
                    f.write(aes_encode(data, aes_key))

Test:
print(len(data))
print(len(pad(data, AES.block_size)))

output:
52428800
52428816
52428800
52428816
52428800
52428816
52428800
52428816
52428800
52428816
52428800
52428816
52428800
52428816
52428800
52428816

full code:

def pad(data_to_pad, block_size, style='pkcs7'):
    """Apply standard padding.

    Args:
      data_to_pad (byte string):
        The data that needs to be padded.
      block_size (integer):
        The block boundary to use for padding. The output length is guaranteed
        to be a multiple of :data:`block_size`.
      style (string):
        Padding algorithm. It can be *'pkcs7'* (default), *'iso7816'* or *'x923'*.

    Return:
      byte string : the original data with the appropriate padding added at the end.
    """

    if len(data_to_pad) % block_size == 0:
        return data_to_pad

    padding_len = block_size-len(data_to_pad)%block_size
    if style == 'pkcs7':
        padding = bchr(padding_len)*padding_len
    elif style == 'x923':
        padding = bchr(0)*(padding_len-1) + bchr(padding_len)
    elif style == 'iso7816':
        padding = bchr(128) + bchr(0)*(padding_len-1)
    else:
        raise ValueError("Unknown padding style")
    return data_to_pad + padding

Legrandin commented 4 years ago

At the very least, you are not considering that AES.new(key, AES.MODE_CBC, key) is creating a random IV, which you are ignoring. When you encrypt, you must also write it out, and load it back when you decrypt.

louishot commented 4 years ago

At the very least, you are not considering that AES.new(key, AES.MODE_CBC, key) is creating a random IV, which you are ignoring. When you encrypt, you must also write it out, and load it back when you decrypt.

Yes I know that, currently I am just testing development and manually specified an iv for encryption/decryption but this should not affect the unpad results

Legrandin commented 4 years ago

Yes, an incorrect IV will affect unpad results, in that the decrypted output will be random and the unpadding routines expects the padding data to have a certain structure.

Legorooj commented 4 years ago

Legrandin is right, @louishot, you need to use the same iv.

Legorooj commented 4 years ago

@louishot is this issue safe to close now? I assume so.

texadactyl commented 4 years ago

@Legrandin Another case of inconsistent IV between encryption and decryption. Time to close?

sayoojsamuel commented 4 years ago

@louishot The issue is with your padding function.

if len(data_to_pad) % block_size == 0: return data_to_pad

^^ This is a wrong implementation of pkcs7 standard. The padding function should pad the plaintext irrespective of its size already being a multiple of the block size.

Lets consider three scenarios:	pt	Before Padding	< Size	After Padding
pt1	"YellowSubmarines"	16	"YellowSubmarines"	16
pt2	"YellowSubmarin\x02\x02"	16	"YellowSubmarin\x02\x02"	16
pt3	"YellowSub"	9	"YellowSub\x07\x07\x07\x07\x07\x07\x07"	16

In case pt3, your Unpad function will work as expected. In case pt2, the Unpad function will remove the last two \x02 bytes of the pt, mistaking it for the padding byte. In case pt1, depending on your implementation could delete 115 bytes from the end.
(This is such an example: def unpad(m): return m[:-ord(m[-1])])

Since your Unpad function cannot distinguish between a padded and unpadded message, it is necessary to pad the message.

If the message is already a multiple of `block_size`, you will have to add an extra pading block as below:	pt	Before Padding	< Size	After Padding
pt1	"YellowSubmarines"	16	"YellowSubmarines\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10"	32
pt2	"YellowSubmarin\x02\x02"	16	"YellowSubmarin\x02\x02\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10"	32
pt3	"YellowSub"	9	"YellowSub\x07\x07\x07\x07\x07\x07\x07"	16

The easiest fix to the issue is to remove the following code.

if len(data_to_pad) % block_size == 0: return data_to_pad

Hope this helps!

sayoojsamuel commented 4 years ago

Oh, just noticed that the code is fixed before the pycryptodome==3.9.6 version.
Coudn't find the exact commit that fixed this issue.

Should be safe to close now.

texadactyl commented 4 years ago

@sayoojsamuel So, close it.

sayoojsamuel commented 4 years ago

@sayoojsamuel So, close it.

Sorry, I'm not a collaborator, neither did I open this issue.

@louishot or @Legrandin can wrap this up.

Legrandin / pycryptodome

Encrypt a large file AES-128-CBC ValueError: Padding is incorrect. #342