clalancette / pycdlib

Python library to read and write ISOs
GNU Lesser General Public License v2.1

Read UDF filesystems with Extended File Entries? #94

Open jbosboom opened 2 years ago

test.iso.zip contains an image with an empty UDF 2.01 filesystem. It also contains a runt ISO 9660 filesystem for compatibility with tools (and some hardware) that assume all UDF images will have one, but the tool I'm working on is intended to produce UDF-only images. Linux mounts both filesystems with no complaints in dmesg (with dynamic debug turned on for the udf module), but they may not be fully correct. I understand that pycdlib strictly verifies its input, so I hope to use it as a verifier/interoperability test.

I cloned the pycdlib repository and ran:

PYTHONPATH=/home/jbosboom/github-repos/pycdlib tools/pycdlib-explorer /tmp/test.iso

That produced the following traceback:

Traceback (most recent call last):
  File "/home/jbosboom/github-repos/pycdlib/tools/pycdlib-explorer", line 539, in <module>
    main()
  File "/home/jbosboom/github-repos/pycdlib/tools/pycdlib-explorer", line 524, in main
    iso.open_fp(fp)
  File "/home/jbosboom/github-repos/pycdlib/pycdlib/pycdlib.py", line 4145, in open_fp
    self._open_fp(fp)
  File "/home/jbosboom/github-repos/pycdlib/pycdlib/pycdlib.py", line 2395, in _open_fp
    self._walk_udf_directories(extent_to_inode)
  File "/home/jbosboom/github-repos/pycdlib/pycdlib/pycdlib.py", line 2148, in _walk_udf_directories
    self.udf_root = self._parse_udf_file_entry(part_start + self.udf_file_set.root_dir_icb.log_block_num,
  File "/home/jbosboom/github-repos/pycdlib/pycdlib/pycdlib.py", line 2129, in _parse_udf_file_entry
    raise pycdlibexception.PyCdlibInvalidISO('UDF File Entry Tag identifier not 261')
pycdlib.pycdlibexception.PyCdlibInvalidISO: UDF File Entry Tag identifier not 261

The exception message is correct -- the root directory is an Extended File Entry with tag 266. Extended File Entries have been legal in UDF since 2.00; section 3.3.5 on named streams says:

"Extended File Entries are required for files with associated named streams. Files without named streams should use Extended File Entries."
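
To make the 261 versus 266 distinction concrete, here is a minimal sketch of a tag check that accepts both entry types. It is not pycdlib's code: classify_entry, make_tag and tag_checksum are hypothetical helpers written for illustration, and the 16-byte little-endian descriptor tag layout (with the checksum in byte 4) is my reading of ECMA-167, so treat it as an assumption to verify against the spec.

```python
import struct

# Descriptor tag identifiers mentioned in this issue.
TAG_FILE_ENTRY = 261
TAG_EXTENDED_FILE_ENTRY = 266

# Assumed 16-byte descriptor tag layout, little-endian: identifier,
# descriptor version, checksum, reserved, serial number, CRC,
# CRC length, tag location.
TAG_FORMAT = "<HHBBHHHL"
TAG_SIZE = struct.calcsize(TAG_FORMAT)


def tag_checksum(tag: bytes) -> int:
    """Sum of tag bytes 0-3 and 5-15 modulo 256 (byte 4 holds the checksum)."""
    return (sum(tag[:4]) + sum(tag[5:16])) % 256


def classify_entry(block: bytes) -> str:
    """Hypothetical helper: name the kind of file entry a block starts with."""
    if len(block) < TAG_SIZE:
        raise ValueError("block too short for a descriptor tag")
    tag = block[:TAG_SIZE]
    ident, _version, checksum, *_rest = struct.unpack(TAG_FORMAT, tag)
    if checksum != tag_checksum(tag):
        raise ValueError("descriptor tag checksum mismatch")
    if ident == TAG_FILE_ENTRY:
        return "file entry"
    if ident == TAG_EXTENDED_FILE_ENTRY:
        return "extended file entry"
    raise ValueError(f"unexpected tag identifier {ident}")


def make_tag(ident: int, location: int = 0) -> bytes:
    """Build a synthetic tag for demonstration (CRC fields left at zero)."""
    # Descriptor version 3 is an assumption for UDF 2.x images.
    raw = bytearray(struct.pack(TAG_FORMAT, ident, 3, 0, 0, 0, 0, 0, location))
    raw[4] = tag_checksum(bytes(raw))
    return bytes(raw)


if __name__ == "__main__":
    print(classify_entry(make_tag(TAG_EXTENDED_FILE_ENTRY)))
    print(classify_entry(make_tag(TAG_FILE_ENTRY)))
```

A reader that dispatches on the identifier this way would only need a separate parser for the fixed-length part of each entry type; the tag handling itself is shared.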

Many tools don't use them because (normal) FEs are discussed prominently in the specs and EFEs are not (DCN-5160 is about this), but I want to use only EFEs because they provide a superset of the capabilities of FEs. Immediately, that means I can store creation times without emitting a file times extended attribute, and eventually I would like to support named streams. (My overall goal is to convert various slightly-broken NTFS filesystems into UDF filesystems. I even have plans to do this conversion "in place" using FICLONERANGE/reflinks to assemble the file data.)

Grepping around, it seems pycdlib has code and tests for EFEs. So, this is a feature request to hook them up. (If they're likely to be buggy despite being tested because they haven't been used in anger, that's less useful for testing interoperability -- but as I'm also implementing, maybe we'll shake out each other's bugs.)

(Also, apologies for being so verbose.)