passwords and glob ffmpeg in win

dobrosketchkun commented 4 years ago

Implemented a simple password encoding and zip compression. Also, ffmpeg "glob" pattern doesn't work in Windows, so I tried to make a workaround. Plus, I incorporated https://github.com/AlfredoSequeida/fvid/pull/14/ by https://github.com/Theelgirl into this.

Theelx commented 4 years ago

That's super interesting! Did you make any benchmarks for speed?

dobrosketchkun commented 4 years ago

Unfortunately I didn't, but when I tried the original fvid and then the version with your modifications, the latter was a few seconds faster on a file of around 1.5 MB (a PDF).

Theelx commented 4 years ago

Also, when I tested it, it said "unrecognized argument: test" when I passed -p "test". I see that passing -p alone makes it ask you for the password. I suggest that you make it let you pass the password in the command line by removing the action parameter, and if you do that, then it doesn't ask for your input.

Edit: A short help description may be nice too even if not needed, for consistency since all the other arguments have a help description.

dobrosketchkun commented 4 years ago

Do you mean just type your password as plain text? It's not very secure.

A short help description may be nice

I forgot about it, thnx

Theelx commented 4 years ago

Yeah I meant type it as plain text. If it's encrypted by the program, what's the matter? Sorry if this is dumb, I'm not very versed in password security as I've never made programs that have user accounts and/or passwords before.

Edit: There's a stack overflow question about how to use getpass with argparse in the command line, I personally like this answer: https://stackoverflow.com/a/44416389
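A minimal sketch of how an optional-value password flag can be wired up with argparse and getpass, so that both `-p test` and a bare `-p` are accepted. The names `PROMPT`, `build_parser`, and `resolve_password` are hypothetical and are not taken from fvid or from the linked answer.

```python
import argparse
import getpass

# Hypothetical sentinel: marks "-p was given without a value".
PROMPT = object()

def build_parser():
    parser = argparse.ArgumentParser(description="password flag sketch")
    # nargs="?" makes the value optional:
    #   flag absent  -> default (None)
    #   "-p"         -> const (PROMPT)
    #   "-p secret"  -> "secret"
    parser.add_argument(
        "-p", "--password", nargs="?", const=PROMPT, default=None,
        help="set password; omit the value to be prompted",
    )
    return parser

def resolve_password(argv, prompt=getpass.getpass):
    args = build_parser().parse_args(argv)
    if args.password is PROMPT:
        # Typed interactively, so it is not part of the command line.
        return prompt("Enter password: ")
    return args.password
```

The `prompt` parameter only exists so the function can be exercised without a terminal; in normal use the default `getpass.getpass` is called.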

dobrosketchkun commented 4 years ago

It's not dumb, just not paranoid enough for my taste. You see, you are right, it's on your PC, but there is a cache of typed commands, and someone can simply see what you are typing on screen, and that kind of stuff.

Theelx commented 4 years ago

Yeah that's fair. My bash_history file is hidden in Ubuntu 20.04, but I can still see it and technically grab the plaintext password from there. However, the vast majority of users will probably be on Windows, and I don't believe there's an equivalent of bash_history for Windows.

Theelx commented 4 years ago

Here are the lines I changed to make it use getpass on the plaintext flags. I took out the functions that I didn't change. If you'd like, I can add this to my optimization commit.

class Password:
    DEFAULT = 'False'

    def __init__(self, value):
        if value == self.DEFAULT:
            value = getpass.getpass('Enter Password (press enter to skip): ')
        self.value = value

    def __str__(self):
        return self.value

    def __bool__(self):
        return True

def get_password(pwd=False):
    password_provided = pwd
    if not pwd:
        password_provided = getpass.getpass("Enter password:")
    password = str(password_provided).encode()  
    salt = os.urandom(32)
    kdf = PBKDF2HMAC(
        algorithm=hashes.SHA512(),
        length=32,
        salt=salt,
        iterations=100000,
        backend=default_backend()
        )
    key = base64.urlsafe_b64encode(kdf.derive(password)) 
    return key

def main():
    parser = argparse.ArgumentParser(description="save files as videos", formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument(
        "-e", "--encode", help="encode file as video", action="store_true"
    )
    parser.add_argument(
        "-d", "--decode", help="decode file from video", action="store_true"
    )

    parser.add_argument("-i", "--input", help="input file", required=True)
    parser.add_argument("-o", "--output", help="output path")
    parser.add_argument("-f", "--framerate", help="set framerate for encoding (as a fraction)", default="1/5", type=str)
    parser.add_argument("-p", "--password", help="set password", nargs="?", type=Password, default=Password.DEFAULT)
    args = parser.parse_args()

    setup()

    if args.decode:
        if args.password != "":
            key = get_password(args.password)
        bits = get_bits_from_video(args.input)

        file_path = None

        if args.output:
            file_path = args.output

        if args.password:
            save_bits_to_file_crypto(file_path, bits, key)
        else:
            save_bits_to_file(file_path, bits)

    elif args.encode:
        # isdigit has the benefit of being True and raising an error if the user passes a negative string
        # all() lets us check if both the negative sign and forward slash are in the string, to prevent negative fractions
        if (not args.framerate.isdigit() and "/" not in args.framerate) or all(x in args.framerate for x in ("-", "/")):
            raise NotImplementedError("The framerate must be a positive fraction or an integer for now, like 3, '1/3', or '1/5'!")
        # get bits from file
        if args.password != "":
            key = get_password(args.password)
            bits = get_bits_from_file_crypto(args.input, key)
        else:
            bits = get_bits_from_file(args.input)

        # create image sequence
        image_sequence = make_image_sequence(bits)

        # save images
        for index in range(len(image_sequence)):
            image_sequence[index].save(
                f"{FRAMES_DIR}encoded_frames_{index}.png"
            )

        video_file_path = None

        if args.output:
            video_file_path = args.output

        make_video(video_file_path, image_sequence, args.framerate)

    cleanup()

dobrosketchkun commented 4 years ago

It looks like a nice compromise!

If you'd like, I can add this to my optimization commit.

It'd be nice, as long as it isn't confusing for AlfredoSequeida and it is tested with the full password code modifications.

Theelx commented 4 years ago

I tested it with ascii and utf-8 passwords, with an empty password flag, and with no password flag. They all work, as long as the encoding and decoding passwords are the same. However, this version requires a password every time, so I added "Press Enter to skip", which basically makes the password the enter key.

Theelx commented 4 years ago

I'm getting this when I try to decrypt videos generated with a password, do you know what could cause it? Are you intending to gzip files by default? gzip.BadGzipFile: Not a gzipped file (b'\xff\xd8')

~~Also, when I enter a password for decryption and didn't enter one for encryption, it gives me a file back? Shouldn't it raise an error?~~ Never mind, it's because I made another modification that I forgot about to prevent the gzip error by sending the file to the normal save_bits_from_file instead of the crypto version when possible.

dobrosketchkun commented 4 years ago

Here is another modification I want to add in order to get rid of the Magic module, which is not very user friendly on Windows:

import pickle

def get_bits_from_file_crypto(filepath, key):
    bitarray = BitArray(filename=filepath)
    bitarray.append(DELIMITER)
    message = pickle.dumps({'filename': filepath, 'data' : str(bitarray.bin)}) # <--------------------
    # message = str(bitarray.bin).encode()
    f = Fernet(key)
    encrypted = f.encrypt(message)
    #zip
    out = io.BytesIO()
    with gzip.GzipFile(fileobj=out, mode='w') as fo:
        fo.write(encrypted)
    encrypted_zip = out.getvalue()
    #zip

    bitarray2 = BitArray(encrypted_zip)
    print('Bits are in place')
    return bitarray2.bin

# <....>

def save_bits_to_file_crypto(file_path, bits, key):
    bitstring_temp = Bits(bin=bits)
    encrypted = bitstring_temp.tobytes()

    #zip
    in_ = io.BytesIO()
    in_.write(encrypted)
    in_.seek(0)
    with gzip.GzipFile(fileobj=in_, mode='rb') as fo:
        encrypted = fo.read()
    #zip

    f = Fernet(key)
    decrypted_bits = f.decrypt(encrypted)#.decode()
    _dict = pickle.loads(decrypted_bits) # <------------------------------------------------------------
    filename = _dict['filename']
    decrypted_bits_with_tail = _dict['data']

    bitstring_with_tail = Bits(bin=decrypted_bits_with_tail)
    bitstring_with_tail = bitstring_with_tail.bin
    # print('decoded_bitstring', bitstring_with_tail)
    delimiter_str = DELIMITER.replace("0b", "")
    delimiter_length = len(delimiter_str)

    if bitstring_with_tail[-delimiter_length:] == delimiter_str:
        bitstring_with_tail = bitstring_with_tail[: len(bitstring_with_tail) - delimiter_length]

    bitstring = Bits(bin=bitstring_with_tail)

    # mime = Magic(mime=True)
    # mime_type = mime.from_buffer(bitstring.tobytes())

    if file_path == None:
        filepath = filename
    else:
        filepath = file_path

    with open(
        filepath, "wb"
    ) as f:
        bitstring.tofile(f)

#<...>

Theelx commented 4 years ago

Why are you using gzip? Compression will lose us bits, right? I had to add a modification to save_bits_to_file_crypto to make it raise an error on invalid passwords, but now it raises an error when the correct password is entered also, so I want to avoid that:

class WrongPassword(Exception):
    pass

def save_bits_to_file_crypto(file_path, bits, key):
    bitstring_temp = Bits(bin=bits)
    encrypted = bitstring_temp.tobytes()

    #zip
    in_ = io.BytesIO()
    in_.write(encrypted)
    in_.seek(0)
    if file_path is None:
        bitstring = Bits(bin=bits)
        mime = Magic(mime=True)
        mime_type = mime.from_buffer(bitstring.tobytes())
        file_path = f"file{mimetypes.guess_extension(type=mime_type)}"
    with open(file_path, 'rb') as fo:
        encrypted = fo.read()
    #zip

    f = Fernet(key)
    try:
        decrypted_bits = f.decrypt(encrypted).decode()
    except cryptography.fernet.InvalidToken:  # the module name is lowercase "fernet"
        raise WrongPassword("That's not the password used to encrypt the file!")
    bitstring_with_tail = Bits(bin=decrypted_bits)
    bitstring_with_tail = bitstring_with_tail.bin
    # print('decoded_bitstring', bitstring_with_tail)
    delimiter_str = DELIMITER.replace("0b", "")
    delimiter_length = len(delimiter_str)

    if bitstring_with_tail[-delimiter_length:] == delimiter_str:
        bitstring_with_tail = bitstring_with_tail[: len(bitstring_with_tail) - delimiter_length]

    bitstring = Bits(bin=bitstring_with_tail)

    mime = Magic(mime=True)
    mime_type = mime.from_buffer(bitstring.tobytes())

    if file_path == None:
        filepath = f"file{mimetypes.guess_extension(type=mime_type)}"
    else:
        filepath = file_path

    with open(
        filepath, "wb"
    ) as f:
        bitstring.tofile(f)

Theelx commented 4 years ago

Ah, something is up with generating the key. An example key generated with encoding the password "test" is b'2v99r7msWq2ZsLM27WS_LxVmzd5rfzmOKiMcbKgA_z4=' and the key that it tries to get from the image is b'MxhcM4dL6HLmJSLRnMWyFJWqShIq6gsqT6u3wNNyuj0='

Edit: It's the urandom salt in get_password. Making the salt static makes the keys the same. However, InvalidToken is still raised.
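A minimal sketch of why a fresh `os.urandom` salt on every call produces mismatched keys, and of one common alternative to a static salt: generate the salt once while encoding and hand the same salt to the decoder. This uses the standard library's `hashlib.pbkdf2_hmac` rather than the `PBKDF2HMAC` class from the code above, and `key_for_encoding`, `key_for_decoding`, and `SALT_SIZE` are hypothetical names.

```python
import hashlib
import os

ITERATIONS = 100_000  # same iteration count as get_password above
SALT_SIZE = 16        # assumed salt size for this sketch

def derive_key(password, salt):
    # PBKDF2 is deterministic: same password + same salt -> same key.
    return hashlib.pbkdf2_hmac("sha512", password.encode(), salt, ITERATIONS, dklen=32)

def key_for_encoding(password):
    """Pick a fresh salt and return it together with the derived key."""
    salt = os.urandom(SALT_SIZE)
    return salt, derive_key(password, salt)

def key_for_decoding(password, salt):
    """Re-derive the key from the salt that was used while encoding."""
    return derive_key(password, salt)
```

The salt is not secret, so it could travel with the encoded data (for example as a fixed-size prefix); where exactly it would be stored in fvid is left open here.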

dobrosketchkun commented 4 years ago

Why are you using gzip?

I use gzip because the encoding algorithm turns a 1.5 MB file into a 60 MB video without gzip and into a 45 MB video with it, and the bigger the video, the longer the decoding.

Theelx commented 4 years ago

Yes, but is gzip lossless? Will we lose any bits by using gzip?

dobrosketchkun commented 4 years ago

Well, it should be:

https://www.gzip.org/ https://zlib.net/

Theelx commented 4 years ago

Ok, well either way, when I try to decode it using your code in the most recent comment, I get this error:


  File "./fvid.py", line 353, in main
    save_bits_to_file_crypto(file_path, bits, key)
  File "./fvid.py", line 211, in save_bits_to_file_crypto
    encrypted = fo.read()
  File "/root/.pyenv/versions/3.8.3/lib/python3.8/gzip.py", line 292, in read
    return self._buffer.read(size)
  File "/root/.pyenv/versions/3.8.3/lib/python3.8/gzip.py", line 479, in read
    if not self._read_gzip_header():
  File "/root/.pyenv/versions/3.8.3/lib/python3.8/gzip.py", line 427, in _read_gzip_header
    raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'\xff\xd8')
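The two bytes in that message are a hint: `b'\xff\xd8'` is how a JPEG file starts, while a gzip stream starts with `b'\x1f\x8b'`, so the decoder was apparently handed data that had never been gzipped. A minimal sketch of a guard for that case; `gunzip_if_needed` is a hypothetical helper, not part of fvid.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def gunzip_if_needed(data):
    """Decompress data only if it starts with the gzip magic number."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    # Not gzip: return the bytes untouched instead of raising BadGzipFile.
    return data
```

This is only a heuristic: an input file that is itself a .gz archive also starts with the magic bytes, so an explicit flag in the payload would be more robust than sniffing.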

dobrosketchkun commented 4 years ago

Well, since your implementation of the password dialogue is better than mine, but it doesn't seem compatible with compression/decompression, I guess we just get rid of gzip for now. The question is where it's better to be, in my push or yours?

Theelx commented 4 years ago

Both of our creations are bugged with gzip, and I have no clue how to debug the gzip as I only understand part of what you did. What I'd do if I were you would be to figure out the error, fix it, and patch the solution to this branch (I can't, because I don't fully understand the crypto stuff you did).

So, to answer your question, since it's not working for either of us, keep it in this push until you can fix it.

dobrosketchkun commented 4 years ago

Commented gzip out for now and the magic module too. Also, I did some minor tweaks with non-crypto variants of functions.

Theelx commented 4 years ago

Getting an InvalidToken error with any password now.

  File "./fvid.py", line 360, in main
    save_bits_to_file_crypto(file_path, bits, key)
  File "./fvid.py", line 221, in save_bits_to_file_crypto
    decrypted_bits = f.decrypt(encrypted)#.decode()
  File "/root/.pyenv/versions/3.8.3/lib/python3.8/site-packages/cryptography/fernet.py", line 74, in decrypt
    timestamp, data = Fernet._get_unverified_token_data(token)
  File "/root/.pyenv/versions/3.8.3/lib/python3.8/site-packages/cryptography/fernet.py", line 92, in _get_unverified_token_data
    raise InvalidToken
cryptography.fernet.InvalidToken

Also, instead of separate functions for crypto and non-crypto versions, how about adding a boolean crypto argument to the non-crypto version and removing the crypto? It could result in a lot less code if done right.

Theelx commented 4 years ago

The modifications you made to the non-crypto functions resulted in a lot of ffmpeg diagnostics cluttering the screen and a file size 10x bigger. Do you know what changes could be causing this?

AlfredoSequeida commented 4 years ago

@dobrosketchkun This is awesome! I just read through the conversation. Let me digest this and I will get back to you!

dobrosketchkun commented 4 years ago

First of all, I don't know the pull request section of GitHub very well, so I may have pushed a button I didn't need to push, lol.

Anyway, I rewrote all the code for better clarity.

I checked gzip losslessness by using this code:

import gzip
import io
import os
import hashlib
from tqdm import tqdm

same = 0
diff = 0
size = 100000
times = 1000000

for _ in tqdm(range(times)):
    random_message = os.urandom(size)
    hash_orig = hashlib.sha256()
    hash_orig.update(random_message)
    hash_orig = hash_orig.hexdigest()

    out = io.BytesIO()
    with gzip.GzipFile(fileobj=out, mode='w') as fo:
        fo.write(random_message)
    random_zip = out.getvalue()

    in_ = io.BytesIO()
    in_.write(random_zip)
    in_.seek(0)
    with gzip.GzipFile(fileobj=in_, mode='rb') as fo:
        random_unzip = fo.read()

    hash_unzip = hashlib.sha256()
    hash_unzip.update(random_unzip)
    hash_unzip = hash_unzip.hexdigest()

    if hash_orig == hash_unzip:
        same += 1
    elif hash_orig != hash_unzip:
        diff += 1
    else:
        print('WTF', hash_orig, hash_unzip)

print({'times' : times,
        'size' : size,
        'same' : same,
        'diff' : diff
})

result:

>py gzip_test.py
100%|█████████████████████████████████████████████████████████████████████| 1000000/1000000 [1:57:47<00:00, 141.49it/s]
{'times': 1000000, 'size': 100000, 'same': 1000000, 'diff': 0}

So, I'm pretty sure it's safe to say that python gzip is lossless.

Here are the changes in functions I made to the original variant of the code:

make_video() - the "glob" pattern doesn't work on Windows, so I made a workaround
get_password() from Theelgirl - the salt needs to be the same within one encode/decode pair
get_bits_from_image() - with my modifications it no longer needs to find the DELIMITER
save_bits_to_file() - a new way to find the DELIMITER after decrypting
save_bits_to_file() - filenames without the funky mime magic

So, basically, the filename is stored in the pickle structure, along with the encrypted data and the cryptographic tag. If you don't use the "-p" flag, you are still using a password anyway: the default one.
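A minimal sketch of that container, assuming the same order as described in this thread (pickle first, then gzip). The helper names `pack` and `unpack` are hypothetical, and the ciphertext and tag here are placeholder bytes rather than real cipher output.

```python
import gzip
import pickle

def pack(filename, ciphertext, tag):
    """Bundle the filename, encrypted data and tag, then gzip the bundle."""
    payload = pickle.dumps({"filename": filename, "data": ciphertext, "tag": tag})
    return gzip.compress(payload)

def unpack(blob):
    """Reverse of pack(): gunzip, then unpickle the dictionary."""
    # Caution: pickle.loads can run arbitrary code on crafted input,
    # so it should only ever see data from a trusted source.
    return pickle.loads(gzip.decompress(blob))
```

Because the unpickling happens before the tag is checked, a video fetched from a public place is untrusted input at that point; a format with fixed-size fields (or JSON plus raw bytes) would avoid that, at the cost of a little more code.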

I checked this code on Windows 10 (python 3.7.4) and Arch (python 3.8.6)

Theelx commented 4 years ago

I still get this when decoding mp4s containing an encoded Lenna without using the -p flag on Ubuntu 20.04, Python 3.8.3:

  File "./fvid.py", line 331, in main
    save_bits_to_file(file_path, bits, key)
  File "./fvid.py", line 179, in save_bits_to_file
    bitstring = fo.read()
  File "/root/.pyenv/versions/3.8.3/lib/python3.8/gzip.py", line 292, in read
    return self._buffer.read(size)
  File "/root/.pyenv/versions/3.8.3/lib/python3.8/gzip.py", line 479, in read
    if not self._read_gzip_header():
  File "/root/.pyenv/versions/3.8.3/lib/python3.8/gzip.py", line 427, in _read_gzip_header
    raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'LG')

In addition, it takes upwards of 55 seconds to read the Lenna mp4 with ffmpeg, which is a serious performance regression, even worse than it was in the original version without any optimizations. It takes 30 seconds to read an encoded mp4 of one of my jpg files, where it took under a second without gzip/cryptography, so there's a huge overhead in ffmpeg processing the gzip files.

~~Edit: It seems to have been erroring and taking a long time because the fvid_frames directory was still using frames from Lenna's test, it works well for me now.~~

Edit 2: I don't know how I got it to work previously. I just did it again on Lenna, and it put 300 files taking up 600MB of disk in the fvid_frames folder before running the decoder and giving me the traceback pasted earlier. Here are some diagnostics: [screenshots (553) and (552)]

AlfredoSequeida commented 4 years ago

@dobrosketchkun I like the idea of compressing the data, assuming we can get the original data back, of course; it makes a lot of sense. As soon as I have some time I will test your changes. Thank you!

Theelx commented 4 years ago

I can vouch for them working on Ubuntu and Windows if that helps

dobrosketchkun commented 4 years ago

@AlfredoSequeida btw, in a theoretical situation, if you want to keep only one thing out of all of this, it needs to be password encryption, because one cannot just put their stuff in public places without protection.

dobrosketchkun commented 4 years ago

@Theelgirl @AlfredoSequeida I think I found a way to drastically reduce the file encoding time. Let's take this file as the example - https://archive.org/download/LowEndCo1985/LowEndCo1985_64kb.mp4 - because Lenna.png is too small to see the difference. My current code, with some optimizations on top of the original, encodes it in 1h 5min on my Win10 machine, but with the new approach it takes only 3min 13s! (Decoding takes around 17 minutes in both variants.)

How it's possible

The main bottleneck in the original approach is this:

bit_sequence = split_list_by_n(list(map(int, bitstring)), width * height)

You need to take every '1' or '0' character of the string and turn it into an int. It takes ages, but PIL understands only bytes. I was thinking about it and suddenly remembered that there is a text-based image format: the P3 version of PPM. Here is an example of a small file from the spec:

P3
# feep.ppm
4 4
15
 0  0  0    0  0  0    0  0  0   15  0 15
 0  0  0    0 15  7    0  0  0    0  0  0
 0  0  0    0  0  0    0 15  7    0  0  0
15  0 15    0  0  0    0  0  0    0  0  0

To make a P3 PPM file, you need the magic phrase "P3", then the width and height on a new line, and then another line with the maximum color value (arbitrary, but greater than 0 and less than 65536). After that, you put the lines of pixels in R G B R G B R G B ... format; nothing very sophisticated.

That is exactly what the new make_image_sequence() does.
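A minimal sketch of that idea for a single frame: the bit string is written out as text, with each bit repeated three times as the R, G and B samples. `bits_to_ppm` is a hypothetical helper, not the make_image_sequence() from this PR, and it assumes a maximum color value of 1 so that "1 1 1" is white and "0 0 0" is black.

```python
def bits_to_ppm(bits, width, height):
    """Render a string of '0'/'1' characters as one P3 (plain text) PPM frame."""
    if len(bits) > width * height:
        raise ValueError("too many bits for one frame")
    # Pad a short final frame with black pixels.
    bits = bits.ljust(width * height, "0")
    # Header: magic phrase, dimensions, maximum color value.
    lines = ["P3", f"{width} {height}", "1"]
    for start in range(0, len(bits), width):
        row = bits[start:start + width]
        # One pixel per bit: the same value for R, G and B.
        lines.append(" ".join(f"{b} {b} {b}" for b in row))
    return "\n".join(lines) + "\n"
```

For example, `bits_to_ppm("101", 2, 2)` yields a 2x2 image whose two rows are both "white black", the second row ending in a padding pixel. No per-bit int conversion is needed because the characters are copied into the file as they are.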

The only con is that it's quite pricey in temporary file volume: this file's frames take around 1.5 GB (300 for the PNG variant).

PS I also figured out (thanks to @zavok) that with gzip you don't really need a delimiter; gzip finds the end of its data by itself.
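A small sketch of that behaviour, assuming the unused part of the last frame decodes to zero bytes: Python's gzip reader skips zero padding after the end of the compressed stream, so the original bytes come back without any delimiter. `gunzip_padded` is a hypothetical helper.

```python
import gzip
import io

def gunzip_padded(padded):
    """Decompress a gzip stream that may be followed by zero padding."""
    with gzip.GzipFile(fileobj=io.BytesIO(padded), mode="rb") as fo:
        return fo.read()

message = b"payload without any delimiter"
# Simulate the zero bits that fill up the rest of the last frame.
padded = gzip.compress(message) + b"\x00" * 64
restored = gunzip_padded(padded)
```

This only covers zero padding; trailing bytes that are not zero would be read as the start of another gzip member and rejected, so it depends on the padding pixels decoding cleanly to black.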

The code:

from bitstring import Bits, BitArray
from PIL import Image
import glob

from operator import sub
import numpy as np
from tqdm import tqdm

import binascii

import argparse
import sys
import os

import getpass 

import io
import gzip
import pickle

from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
from Crypto.Cipher import AES

from time import time

#DELIMITER = bin(int.from_bytes("HELLO MY NAME IS ALFREDO".encode(), "big"))
FRAMES_DIR = "./fvid_frames/"
SALT = '63929291bca3c602de64352a4d4bfe69'.encode()  # It need be the same in one instance of coding/decoding
DEFAULT_KEY = ' '*32
DEFAULT_KEY = DEFAULT_KEY.encode()
NOTDEBUG = True

class WrongPassword(Exception):
    pass

class MissingArgument(Exception):
    pass

def get_password(password_provided):
    if password_provided=='default':
        return DEFAULT_KEY
    else:
        if password_provided == None:
            password_provided = getpass.getpass("Enter password:")

        password = str(password_provided).encode()  
        kdf = PBKDF2HMAC(
            algorithm=hashes.SHA512(),
            length=32,
            salt=SALT,
            iterations=100000,
            backend=default_backend()
            )
        key = kdf.derive(password)
        return key

def get_bits_from_file(filepath, key):
    print('Reading file...')
    bitarray = BitArray(filename=filepath)
    # adding a delimiter to know when the file ends to avoid corrupted files
    # when retrieving
    # bitarray.append(DELIMITER)

    cipher = AES.new(key, AES.MODE_EAX, nonce=SALT)
    ciphertext, tag = cipher.encrypt_and_digest(bitarray.tobytes())

    filename = os.path.basename(filepath)
    # store only the base name, so decoding does not depend on the encoder's directories
    pickled = pickle.dumps({'tag':tag,
                            'data':ciphertext,
                            'filename':filename})
    print('Zipping...')
    #zip
    out = io.BytesIO()
    with gzip.GzipFile(fileobj=out, mode='w') as fo:
        fo.write(pickled)
    zip = out.getvalue()
    #zip

    del bitarray
    del pickled

    bitarray = BitArray(zip)
    return bitarray.bin

def less(val1, val2):
    return val1 < val2

def get_bits_from_image(image):
    width, height = image.size

    done = False

    px = image.load()
    bits = ""

    pbar = tqdm(range(height), desc="Getting bits from frame")

    white = (255, 255, 255)
    black = (0, 0, 0)

    for y in pbar:
        for x in range(width):

            pixel = px[x, y]

            pixel_bin_rep = "0"

            # for exact matches
            if pixel == white:
                pixel_bin_rep = "1"
            elif pixel == black:
                pixel_bin_rep = "0"
            else:
                white_diff = tuple(map(abs, map(sub, white, pixel)))
                # min_diff = white_diff
                black_diff = tuple(map(abs, map(sub, black, pixel)))

                # if the white difference is smaller, that means the pixel is closer
                # to white, otherwise, the pixel must be black
                if all(map(less, white_diff, black_diff)):
                    pixel_bin_rep = "1"
                else:
                    pixel_bin_rep = "0"

            # adding bits
            bits += pixel_bin_rep

    return (bits, done)

def get_bits_from_video(video_filepath):
    # get image sequence from video
    print('Reading video...')
    image_sequence = []

    os.system('ffmpeg -i ' + video_filepath + ' ./fvid_frames/decoded_frames_%d.png');

    # for filename in glob.glob(f"{FRAMES_DIR}decoded_frames*.png"):
    for filename in sorted(glob.glob(f"{FRAMES_DIR}decoded_frames*.png"), key=os.path.getmtime) :
        image_sequence.append(Image.open(filename))

    bits = ""
    sequence_length = len(image_sequence)
    print('Bits are in place')
    for index in tqdm(range(sequence_length)):
        b, done = get_bits_from_image(image_sequence[index])

        bits += b

        if done:
            break

    return bits

def save_bits_to_file(file_path, bits, key):
    # get file extension

    bitstring = Bits(bin=bits)

    #zip
    print('Unzipping...')
    in_ = io.BytesIO()
    in_.write(bitstring.bytes)
    in_.seek(0)
    with gzip.GzipFile(fileobj=in_, mode='rb') as fo:
        bitstring = fo.read()
    #zip

    unpickled = pickle.loads(bitstring)
    tag = unpickled['tag']
    ciphertext = unpickled['data']
    filename = unpickled['filename']

    cipher = AES.new(key, AES.MODE_EAX, nonce=SALT)
    bitstring = cipher.decrypt(ciphertext)
    print('Checking integrity...')
    try:
        cipher.verify(tag)
        # print("The message is authentic")
    except ValueError:
        raise WrongPassword("Key incorrect or message corrupted")

    bitstring = BitArray(bitstring)

    # _tD = Bits(bin=DELIMITER) # New way to find a DELIMITER
    # _tD = _tD.tobytes()
    # _temp = list(bitstring.split(delimiter=_tD))
    # bitstring = _temp[0]

    # If no file path was passed in, use the filename stored in the payload;
    #    otherwise use the passed-in file path
    if file_path == None:
        filepath = filename
    else:
        filepath = file_path # No need for mime Magic

    with open(
        filepath, "wb"
    ) as f:
        bitstring.tofile(f)

def split_list_by_n(lst, n):
    for i in range(0, len(lst), n):
        yield lst[i : i + n]

def pix(bin):
    if bin == '1':
        return '255'
    else:
        return '0'

def make_image_sequence(bitstring, resolution=(1920, 1080)):
    width, height = resolution
    maxval = 1
    # split bits into sets of width*height to make (1) image
    set_size = width * height

    # bit_sequence = []
    print('Making image sequence')
    print('Cutting...')
    bitlist = list(tqdm(split_list_by_n(bitstring, set_size)))

    del bitstring

    bitlist[-1] = bitlist[-1] + '0'*(set_size - len(bitlist[-1]))

    bitlist = bitlist[::-1]
    ppm_header = f'P3 \n{width} {height} \n{maxval}\n'

    print('Saving frames...')
    for index in tqdm(range(len(bitlist))):
        bitl = bitlist.pop()
        # print('bitl', bitl)
        bitl = list(split_list_by_n(bitl, width))
        bitl = [' '.join([' '.join([_]*3) for _ in list(row)]) for row in bitl]
        image = ppm_header + '\n'.join(bitl)
        path = f"{FRAMES_DIR}encoded_frames_{index+1}.ppm"
        with open(path, 'w') as f:
            f.write(image)

def make_video(output_filepath, framerate="1/5"):

    if output_filepath == None:
        outputfile = "file.mp4"
    else:
        outputfile = output_filepath

    os.system('ffmpeg -r ' + framerate + ' -i ./fvid_frames/encoded_frames_%d.ppm -c:v libx264rgb ' + outputfile)

def cleanup():
    # remove frames
    import shutil

    shutil.rmtree(FRAMES_DIR)

def setup():
    import os

    if not os.path.exists(FRAMES_DIR):
        os.makedirs(FRAMES_DIR)

def main():
    parser = argparse.ArgumentParser(description="save files as videos")
    parser.add_argument(
        "-e", "--encode", help="encode file as video", action="store_true"
    )
    parser.add_argument(
        "-d", "--decode", help="decode file from video", action="store_true"
    )

    parser.add_argument("-i", "--input", help="input file", required=True)
    parser.add_argument("-o", "--output", help="output path")
    parser.add_argument("-f", "--framerate", help="set framerate for encoding (as a fraction)", default="1/5", type=str)
    parser.add_argument("-p", "--password", help="set password", nargs="?", type=str, default='default')

    args = parser.parse_args()

    setup()
    # print(args)
    # print('PASSWORD', args.password, [len(args.password) if len(args.password) is not None else None for _ in range(0)])

    if not args.decode and not args.encode:
        raise MissingArgument('You should use either --encode or --decode!') #check for arguments

    key = get_password(args.password)

    if args.decode:
        bits = get_bits_from_video(args.input)

        file_path = None

        if args.output:
            file_path = args.output

        save_bits_to_file(file_path, bits, key)

    elif args.encode:
        # isdigit has the benefit of being True and raising an error if the user passes a negative string
        # all() lets us check if both the negative sign and forward slash are in the string, to prevent negative fractions
        if (not args.framerate.isdigit() and "/" not in args.framerate) or all(x in args.framerate for x in ("-", "/")):
            raise NotImplementedError("The framerate must be a positive fraction or an integer for now, like 3, '1/3', or '1/5'!")
        # get bits from file
        bits = get_bits_from_file(args.input, key)

        # create image sequence
        make_image_sequence(bits)

        # save images
        # for index in range(len(image_sequence)):
            # image_sequence[index].save(
                # f"{FRAMES_DIR}encoded_frames_{index}.png"
            # )

        video_file_path = None

        if args.output:
            video_file_path = args.output

        make_video(video_file_path, args.framerate)

    # cleanup()

time1 = time()
main()
print('Time: ', time() - time1)
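One spot in get_bits_from_video() above that may be fragile is ordering the frames by modification time, since timestamps can tie or be rewritten. Because ffmpeg puts the frame number in the name (decoded_frames_%d.png), sorting on that number is an alternative. A sketch with hypothetical helper names `frame_index` and `sort_frames`:

```python
import re

def frame_index(path):
    """Extract the frame number from a name like decoded_frames_12.png."""
    match = re.search(r"_(\d+)\.png$", path)
    if match is None:
        raise ValueError(f"not a frame file: {path}")
    return int(match.group(1))

def sort_frames(paths):
    # Numeric sort: frame 2 comes before frame 10, unlike a plain string sort.
    return sorted(paths, key=frame_index)
```

It would slot in as `sort_frames(glob.glob(f"{FRAMES_DIR}decoded_frames*.png"))`, assuming the file naming stays as it is in the code above.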

AlfredoSequeida commented 4 years ago

@dobrosketchkun right now I am going through the PR requests, has that decoding part been fixed yet? I would love to test it. Also I agree - for public platforms, password encryption is a must.

dobrosketchkun commented 4 years ago

@AlfredoSequeida try the last variant, it's, as I said, super-fast with bigger files.

Theelx commented 4 years ago

@dobrosketchkun I tested with a 1.2mb jpg on Ubuntu 20.04, as the file you suggested to test wouldn't load on my computer. It takes 8 seconds to encode with the crypto program, and 6 seconds with your version. However, your version takes up 60MB in fvid_frames, while the crypto one takes 1.3MB. While in this test yours is slightly faster, it uses 50x more disk. Because I figured hey, this file is only 1.2MB so maybe your program works best on larger files, I tested a 26MB jpg (can't upload because images above 10MB aren't allowed, http://eoimages.gsfc.nasa.gov/images/imagerecords/73000/73751/world.topo.bathy.200407.3x21600x10800.jpg). Your program took 147 seconds to run, and took up 1.3GB of disk, while the crypto version took 258 seconds to run, and 27MB of disk. But here's where it gets interesting. By removing TQDM, the progress bar, the crypto version actually takes only 76.8 seconds to run, compared to 102 seconds for your version, and both use the same amount of memory.

In conclusion, according to my tests, the big speed bottleneck in large image processing is not the algorithm used, it's the progress bar. By removing the progress bar, your crypto version is actually significantly faster than, and uses less memory than, the PPM P3 on large images.


Theelx commented 4 years ago

Side note, to keep some sort of progress bar showing: By removing all the tqdm calls excepting this line in make_image_sequence:

    for _ in tqdm(range(len(bitlist))):

I was able to speed up the crypto version's encoding by 3x, and the PPM P3 version was only sped up by about 50%. This made the crypto version nearly twice as fast, and using under 2% of the disk space, as the PPM P3 version on my Ubuntu 20.04 system on a single core of my 4-core 3.8GHz Ryzen processor.
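One way to keep some feedback in a hot loop without paying for a progress update on every item is to report only every N items. The following is a standard-library sketch of that idea, not tqdm's API; `process_with_progress` and its parameters are hypothetical.

```python
def process_with_progress(items, work, report, every=1000):
    """Apply work() to each item, calling report(done, total) only every `every` items."""
    total = len(items)
    results = []
    for done, item in enumerate(items, start=1):
        results.append(work(item))
        # Report at coarse intervals and once more at the very end.
        if done % every == 0 or done == total:
            report(done, total)
    return results
```

With `report=lambda done, total: print(f"{done}/{total}")` this prints a handful of lines per run instead of updating once per pixel or per bit; how much time that saves would still need to be measured on the real workload.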

dobrosketchkun commented 4 years ago

Whoa~ Indeed it's the case, go figure. So I reverted to the previous variant, with new modifications about delimiter and tqdm.

@Theelgirl to finalize it, please, combine this encoding approach with yours, I assume, Cythonic decoding approach, and we will get a very fast thing.

Theelx commented 4 years ago

@dobrosketchkun Sounds good, I'll submit a new PR with both our approaches. Edit: #21

dobrosketchkun commented 4 years ago

@Theelgirl So tl;dr my approach includes:

password encryption - you don't have to use it (well, in that case you're basically using a default one); pass it with "-p your_password", or use "-p" alone to enter it with getpass()
zipping (side effect - you don't need delimiter)
pickling (filename, tag for checks and zipped data)
absence of magic, mime (filename extracted from a pickled dictionary or args)
new logic of make_image_sequence() - to help with bigger files and memory issues
absence of ffmpeg because it's unnecessary and because pattern_type="glob" is not supported on Win.

Maybe I forgot something, but that's most of it.

Theelx commented 4 years ago

Got it, I removed python-magic and ffmpeg from required imports in setup.py to adjust for that.

dobrosketchkun commented 4 years ago

We combined our code with @Theelgirl in one pull https://github.com/AlfredoSequeida/fvid/pull/21

AlfredoSequeida / fvid

passwords and glob ffmpeg in win #15