clalancette / pycdlib

Python library to read and write ISOs
GNU Lesser General Public License v2.1

Can't read Blu-Ray UDF ISO #19

Open NL9 opened 5 years ago

NL9 commented 5 years ago

It looks like pycdlib doesn't support the UDF format used on Blu-ray ISOs.

I'm using:

import sys
import pycdlib

iso = pycdlib.PyCdlib()
iso.open(sys.argv[1])

for child in iso.list_children(iso_path='/'):
    print(child.file_identifier())

iso.close()

And got this error:

python3 test_iso.py Reign.of.the.Supermen.2019.BluRay.1080p.AVC.DTS-HD.MA5.1-MTeam.iso
Traceback (most recent call last):
  File "test_iso.py", line 5, in <module>
    iso.open(sys.argv[1])
  File "---/python/python3/lib/python3.6/site-packages/pycdlib/pycdlib.py", line 4041, in open
    self._open_fp(fp)
  File "---/python/python3/lib/python3.6/site-packages/pycdlib/pycdlib.py", line 2397, in _open_fp
    self._parse_volume_descriptors()
  File "--/python/python3/lib/python3.6/site-packages/pycdlib/pycdlib.py", line 914, in _parse_volume_descriptors
    raise pycdlibexception.PyCdlibInvalidISO('Valid ISO9660 filesystems must have at least one PVD')
pycdlib.pycdlibexception.PyCdlibInvalidISO: Valid ISO9660 filesystems must have at least one PVD

I'm using pycdlib-1.7.0

clalancette commented 5 years ago

Right, so the problem is that pycdlib only understands the UDF "Bridge" format, which basically combines an old-style ISO9660 structure with a new-style UDF structure.

What you are trying is a UDF-only ISO, which I don't actually have any of at the moment. In some sense I've intentionally not implemented UDF-only support, since it kind of seems like it belongs in a "pyudf" package or something like that. On the other hand, I have large swaths of UDF implemented here to handle the bridge format, so it is sort of a natural extension here. I'd have to think about this some more.
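To make the bridge vs. UDF-only distinction concrete, here is a minimal sketch (not pycdlib code) that scans the volume recognition area of an image for the standard identifiers. It assumes 2048-byte sectors with descriptors starting at sector 16; `classify_image` and the scan limit are hypothetical names and values chosen for illustration. A bridge image should show both a `CD001` descriptor of type 1 (the PVD) and an `NSR02`/`NSR03` descriptor, while a UDF-only image should show only the latter.

```python
SECTOR_SIZE = 2048             # assumed logical sector size
FIRST_DESCRIPTOR_SECTOR = 16   # descriptors follow the 32 KiB system area
MAX_DESCRIPTORS = 64           # arbitrary scan limit for this sketch

UDF_IDS = {b"NSR02", b"NSR03"}
KNOWN_IDS = {b"CD001", b"BEA01", b"TEA01", b"BOOT2", b"CDW02"} | UDF_IDS


def classify_image(fp):
    """Return (has_pvd, has_udf) for a seekable binary file object."""
    has_pvd = False
    has_udf = False
    for index in range(MAX_DESCRIPTORS):
        fp.seek((FIRST_DESCRIPTOR_SECTOR + index) * SECTOR_SIZE)
        header = fp.read(6)
        if len(header) < 6:
            break  # ran off the end of the image
        desc_type, ident = header[0], header[1:6]
        if ident not in KNOWN_IDS:
            break  # no more recognizable descriptors
        if ident == b"CD001" and desc_type == 1:
            has_pvd = True   # ISO9660 Primary Volume Descriptor
        elif ident in UDF_IDS:
            has_udf = True   # UDF (ECMA-167) recognition descriptor
    return has_pvd, has_udf
```

Usage would be something like `with open(path, "rb") as fp: print(classify_image(fp))`. A result of `(False, True)` would correspond to the UDF-only case that triggers the "at least one PVD" error above.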

@NL9 Do you know of any convenient Linux programs to generate UDF-only ISOs? mkisofs/genisoimage only generate bridge format ones.

NL9 commented 5 years ago

I think it's a good idea to support all ISO filesystems in one package.

Try mkudffs https://manpages.ubuntu.com/manpages/bionic/man8/mkudffs.8.html

clalancette commented 5 years ago

> Try mkudffs https://manpages.ubuntu.com/manpages/bionic/man8/mkudffs.8.html

Fantastic, thanks for that. I'll take a look at it.

This may take me some time to get done; my free time at the moment is kind of limited. I'll work on it in the background, but if you have any time to contribute, patches would definitely be welcome. The most important first step is to add a way to do automated testing for UDF filesystems; with that in place, we can confidently make changes to the library until things work.

clinton-hall commented 5 years ago

@clalancette This is something I would be keen to be able to use in nzbToMedia. https://github.com/clinton-hall/nzbToMedia/issues/1588

I have an example Blu-ray ISO image (UDF 2.50) and am happy to provide data, run tests, etc.

Basically, I have tried isoparser, I have tried pycdlib (and received the same output as the OP here), and I have tried 7zip which likewise doesn't support UDF 2.50/2.60.

The only method I have currently is to mount the .iso, but this requires root/sudo, which is not a preferred practice for an automation script.

clalancette commented 5 years ago

Yeah, I've come around to agreeing that this library should include that support. I've been slowly working on it, but it is a rather large expansion of the UDF support, so it will take me some time to complete. Thanks for the comments, I'll let you know when I have something to show.

clinton-hall commented 5 years ago

Thanks @clalancette, I appreciate your work on the project. As I said, I am happy to test and provide examples etc. once you get to that phase.

Landcross commented 5 years ago

Any update on the UDF support?

clalancette commented 5 years ago

> Any update on the UDF support?

I'm still working on it on this branch: https://github.com/clalancette/pycdlib/tree/udf-only . I've gotten it so that it opens more UDF disks without exceptions, but not all of the ones I have access to. It also doesn't successfully write out those UDF disks, so there is still work to be done before I can merge it. Sorry it's been slow, I haven't had a lot of time for it and it is fairly complex. Any help on that branch in the form of pull requests is welcome.

Landcross commented 5 years ago

No worries, there's no hurry (though I would be lying if I said I wasn't looking forward to UDF support in this package, haha). I wish I could help, but my knowledge about filesystems is next to nothing.

I have a Blu-ray ISO that crashes when loaded into pycdlib with the error "Partition map goes beyond end of data, ISO corrupt" at line 2179 in udf.py. When I log offset, map_len and len(partition_maps[offset:]), they are 6, 64 and 66 respectively. This is the second and last partition map in the for loop; the first partition map goes through the whole loop successfully.

I'm 100% certain this iso is not corrupt though.
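As a sanity check on those numbers, here is a small arithmetic sketch. It does not reproduce what line 2179 of udf.py actually compares (that is not shown in this thread); the helper names are hypothetical. With offset 6 and 66 bytes remaining, the whole buffer is 72 bytes, so a 64-byte map starting at offset 6 ends at byte 70 and fits. The sizes also match ECMA-167, where a Type 1 partition map is 6 bytes and a Type 2 map is 64 bytes. A check that mixes the absolute end (offset + map_len) with the remaining length would reject this map anyway.

```python
def map_fits(total_len, offset, map_len):
    """True if a map of map_len bytes starting at offset lies within total_len bytes."""
    return offset + map_len <= total_len


def map_fits_remaining(remaining_len, map_len):
    """The same check, expressed against the bytes remaining from offset onward."""
    return map_len <= remaining_len


# Values reported in the comment above.
offset, map_len, remaining = 6, 64, 66
total_len = offset + remaining  # 72 bytes of partition map data in total

print(map_fits(total_len, offset, map_len))     # True: 70 <= 72
print(map_fits_remaining(remaining, map_len))   # True: 64 <= 66
print(offset + map_len > remaining)             # True: a mixed comparison flags this map
```

If the library's check resembles the last line, that would explain why a non-corrupt image is reported as corrupt, but that is a guess until someone confirms it against the source.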

F4n4t commented 4 years ago

Thanks @clalancette, I'm glad I found your project. I get the same result on Blu-ray ISOs as @Landcross did. I don't know if this could help, but there is a guy who modified the libdvdread library for BD5 ISO and Blu-ray support. http://www.lundman.net/wiki/index.php/Libdvdread.plus Edit: This C library from VLC is even newer: https://code.videolan.org/videolan/libudfread/-/tree/master

clalancette commented 3 years ago

It's been a very long time since this issue was opened, so just as a quick update:

I did do a lot of work to increase the UDF support in pycdlib. It should now support a lot more of the UDF structures, and has unit tests for most of that work.

Unfortunately, it turns out that pycdlib has a deep-seated reliance on the existence of a Primary Volume Descriptor (PVD) on the ISO. For those who don't know, the PVD is sort of the anchor point that lets one know that the series of bytes you are looking at is indeed formatted as an ISO9660 filesystem. A UDF-bridge disk has a PVD, but a pure UDF disk (like what is on Blu-ray disks) does not.

I poked at this for some time, and there is no easy way to remove that reliance on a PVD from pycdlib. It is probably possible with quite a bit of work, but that's sort of where I've been stuck for the 2 years since this bug was opened. At this point, I can't promise any sort of timeframe for this feature. If you are interested in looking at what I've done and/or helping out, good starting places are the two branches of work where I've tried to make this work: