clalancette / pycdlib

Python library to read and write ISOs
GNU Lesser General Public License v2.1
150 stars, 38 forks

UDF format ISO Large file support #65

Open MarkBaggett opened 3 years ago

MarkBaggett commented 3 years ago

Hello, and first, thank you, Chris, for this wonderful module. I am having trouble with UDF ISOs breaking large files up into smaller ones. I imagine I am not enabling some feature needed to support large files, but I am not sure what I am doing wrong.

I am running version 1.11.0:

C:\Users\User\Desktop\test>pip show pycdlib
Name: pycdlib
Version: 1.11.0
Summary: Pure python ISO manipulation library
Home-page: http://github.com/clalancette/pycdlib
Author: Chris Lalancette
Author-email: clalancette@gmail.com
License: LGPLv2
Location: c:\users\user\venv\mpmod\lib\site-packages
Requires:
Required-by: media-processor

Here is a directory containing one large file:

 C:\Users\User\Desktop\test>dir bigzip
 Volume in drive C has no label.
 Volume Serial Number is 6000-8F6B

 Directory of C:\Users\User\Desktop\test\bigzip

04/27/2021  07:33 AM    <DIR>          .
04/27/2021  07:33 AM    <DIR>          ..
07/17/2020  08:51 AM     8,367,733,776 Windows-10vm.zip
               1 File(s)  8,367,733,776 bytes
               2 Dir(s)  45,585,305,600 bytes free

Here is my code:

import argparse
import pathlib

import pycdlib


def dir2iso(source, destination, filter=None):
    "Create a UDF ISO from a given source directory."
    if filter is None:
        filter = lambda path: True  # default: include every file and directory
    source = pathlib.Path(source)
    new_iso = pycdlib.PyCdlib()
    new_iso.new(udf="2.60")
    # rglob() walks top-down, so each directory is yielded before its contents
    # and add_directory() runs before add_file() for anything inside it.
    for eachitem in source.rglob("*"):
        if not filter(eachitem):
            continue
        udf_path = "/" + eachitem.relative_to(source).as_posix()
        if eachitem.is_dir():
            new_iso.add_directory(udf_path=udf_path)
        elif eachitem.is_file():
            new_iso.add_file(str(eachitem), udf_path=udf_path)
    new_iso.write(destination)
    new_iso.close()  # release the resources held by the PyCdlib object
    return "Created", []


def dir2iso_cli():
    parser = argparse.ArgumentParser()
    parser.add_argument("source", help="The path to the directory to turn into an ISO")
    parser.add_argument("destination", help="The destination ISO file to create (including path).")
    args = parser.parse_args()
    dir2iso(args.source, args.destination)


if __name__ == "__main__":
    dir2iso_cli()
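
The optional filter argument of dir2iso takes a callable that receives each pathlib.Path found by rglob() and returns True to include it. As a hypothetical example (make_skip_hidden is not part of the original script or of pycdlib), a filter that skips hidden files and directories might look like this. It checks only the path components below the source directory, and it also rejects everything inside a hidden directory, so dir2iso never tries to add a file whose parent directory was skipped:

```python
import pathlib
from typing import Callable


def make_skip_hidden(source) -> Callable[[pathlib.Path], bool]:
    """Hypothetical helper: build a dir2iso filter that excludes hidden entries."""
    root = pathlib.Path(source)

    def include(path: pathlib.Path) -> bool:
        # Only look at components below the source directory, so a source that
        # itself lives under a hidden folder (e.g. ~/.cache/src) still works.
        relative = path.relative_to(root)
        # Reject the entry if it, or any directory above it, starts with a dot.
        return not any(part.startswith(".") for part in relative.parts)

    return include
```

Build the predicate from the same source path you pass to dir2iso, for example dir2iso("bigzip", "bigzip.iso", filter=make_skip_hidden("bigzip")).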

I run that program, passing it the directory containing the single large zip file, and here is the resulting ISO:

E:\>dir 
 Volume in drive E is CDROM
 Volume Serial Number is 5957-8578

 Directory of E:\

04/04/2021  01:02 AM     4,294,965,248 Windows-10vm.zip
04/04/2021  01:02 AM     4,072,768,528 Windows-10vm.zip
               2 File(s)  8,367,733,776 bytes
               0 Dir(s)               0 bytes free

I would appreciate your help.

clalancette commented 3 years ago (April 28, 2021)

Ah, interesting.

The issue here is a limitation of ISOs. Regular ISO9660 can only record files of up to 4GB. However, it allows a file to be "split" into multiple smaller pieces, so you can effectively get larger file sizes.

UDF does not have the 4GB file limitation. However, pycdlib treats all ISOs as ISO9660 compatible, with optional UDF support. So it still splits large files into smaller chunks so that they remain valid from the ISO9660 perspective.

I'm not sure how to resolve this, to be honest. We could add a "UDF-only" mode, but it's actually quite a lot of work and I've been stuck trying to do that for years now (see #19, for instance). Otherwise, in order to maintain compatibility with older ISO9660, we kind of have to keep doing this splitting.

I'm open to other ideas, but I can't think of how to fix this right now.
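
The chunk sizes in the E:\ listing above line up with this explanation. The first piece is 4,294,965,248 bytes, which is 2^32 - 2048: the largest multiple of the 2048-byte ISO9660 logical block that still fits in a 32-bit length field. The sketch below assumes every piece except the last has exactly that size (inferred from the listing, not read from pycdlib's source) and reproduces the split; split_into_extents is a hypothetical name used only for illustration:

```python
from typing import List

# Assumption: pieces are capped at the largest multiple of the 2048-byte
# logical block that fits in a 32-bit length field.
LOGICAL_BLOCK_SIZE = 2048
MAX_EXTENT_SIZE = 2**32 - LOGICAL_BLOCK_SIZE  # 4,294,965,248 bytes


def split_into_extents(file_size: int) -> List[int]:
    """Split a file length into pieces no larger than MAX_EXTENT_SIZE.

    Every piece except the last is exactly MAX_EXTENT_SIZE, matching the
    sizes observed in this thread.
    """
    extents = []
    remaining = file_size
    while remaining > MAX_EXTENT_SIZE:
        extents.append(MAX_EXTENT_SIZE)
        remaining -= MAX_EXTENT_SIZE
    extents.append(remaining)  # whatever is left becomes the final piece
    return extents


print(split_into_extents(8_367_733_776))  # [4294965248, 4072768528]
```

For the 12,123,749,376-byte VirtualMachine.ova mentioned later in the thread, the same arithmetic gives three pieces (4,294,965,248 + 4,294,965,248 + 3,533,818,880); the last two add up to 7,828,784,128 bytes, which is the truncated length reported for the Joliet ISO.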

MarkBaggett commented 3 years ago

Interesting. Forgive my ignorance of the various ISO standards and limitations. When I changed my ISO format to Joliet, it stopped splitting the files: I now have a single 8GB file. Is that expected behavior?

Mark

MarkBaggett commented 3 years ago

Joliet format didn't solve my issue after all. With Joliet, the large file is no longer split into multiple files inside the ISO. However, the files inside the ISO appear to be limited to 8GB. I tried modifying the following lines of the code above:

    new_iso = pycdlib.PyCdlib()
    #new_iso.new(udf="2.60")
    new_iso.new(joliet=3)

Now files are truncated (see below).

How do I use this library to create an ISO that contains 20GB files? Is it possible?

File lengths in the ISO are truncated, so the file hashes don't match (for obvious reasons).

PS C:\Users\User\Desktop\source> ls *.ova

    Directory: C:\Users\User\Desktop\source
Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----         5/24/2021   2:35 PM    12123749376 VirtualMachine.ova

PS C:\Users\User\Desktop\source> Get-FileHash -Algorithm md5 *.ova
Algorithm       Hash                                                                   Path
---------       ----                                                                   ----
MD5             511F9BCF8863BF4FD319212A62D95836                                       .\VirtualMachine.ova

PS E:\> dir *.ova
    Directory: E:\
Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
--r---          6/8/2021  10:27 AM     7828784128 VirtualMachine.ova

PS E:\> Get-FileHash -Algorithm md5 *.ova
Algorithm       Hash                                                                   Path
---------       ----                                                                   ----
MD5             C3B0F7272D50C7115A7E31C206A5BC11                                       E:\VirtualMachine.ova

MarkBaggett commented 3 years ago

If anyone else needs to create ISOs on Windows containing files larger than 8GB, here is the nasty, dirty, traitorous solution I came up with. If there is a way to do this with this or any other native Python module, I'd appreciate the heads-up.

https://github.com/MarkBaggett/pxpowershell

Specifically, see the dir2iso function in https://github.com/MarkBaggett/pxpowershell/blob/main/pxpowershell/example_dir2iso.py