clalancette / pycdlib

Python library to read and write ISOs
GNU Lesser General Public License v2.1
143 stars 38 forks source link

Modifying file in place corrupts directory records #122

Open Goshin opened 8 months ago

Goshin commented 8 months ago

The bug was recorded back in https://github.com/clalancette/pycdlib/pull/71#issuecomment-879556612. After some debugging, I figured out the cause.

When modifying the file in place, the library will try to update the file length in its directory record:

https://github.com/clalancette/pycdlib/blob/bc5187ac44405f99c37aeab6580a8d8ddfeb83a0/pycdlib/pycdlib.py#L4501-L4528

The absolute offset of the directory record is calculated based on the parent's extent location and the offset within the parent's directory records:

https://github.com/clalancette/pycdlib/blob/bc5187ac44405f99c37aeab6580a8d8ddfeb83a0/pycdlib/pycdlib.py#L4520-L4522

Here, the record.offset_to_here value is calculated by summing up previous siblings' lengths up to itself: https://github.com/clalancette/pycdlib/blob/bc5187ac44405f99c37aeab6580a8d8ddfeb83a0/pycdlib/dr.py#L703-L735

However, Pycdlib maintains the children list in sorted order when walking directory records and adding children:

https://github.com/clalancette/pycdlib/blob/bc5187ac44405f99c37aeab6580a8d8ddfeb83a0/pycdlib/dr.py#L760-L775

In this case, the index of a directory record will likely differ from its actual index in the ISO file. Therefore, the directory record offset will be wrong when writing the length back to the ISO file, and some random sibling records will be corrupted.


The solution I can come up with is to rewrite the parent's directory records together. As a temporary fix, one can simply invoke self._write_directory_records() to override all the directory records.

Goshin commented 8 months ago

A fix by rewriting all the directory records of the parent.

diff --git a/pycdlib/pycdlib.py b/pycdlib/pycdlib.py
index d9aa951..f868de2 100644
--- a/pycdlib/pycdlib.py
+++ b/pycdlib/pycdlib.py
@@ -4517,15 +4517,21 @@ class PyCdlib(object):
                     self.joliet_vd.add_to_space_size(length)
                 if record.parent is None:
                     raise pycdlibexception.PyCdlibInternalError('Modifying file with empty parent')
-                abs_extent_loc = record.parent.extent_location() + record.extents_to_here - 1
-                offset = record.offset_to_here - record.dr_len
-                abs_offset = abs_extent_loc * self.logical_block_size + offset
+                record.set_data_length(length)
+                abs_extent_loc = record.parent.extent_location()
+                offset = 0
+                for sibling in record.parent.children:
+                    if (offset + sibling.dr_len) > self.logical_block_size:
+                        abs_extent_loc += 1
+                        offset = 0
+                    self._cdfp.seek(abs_extent_loc * self.logical_block_size + offset)
+                    self._cdfp.write(sibling.record())
+                    offset += sibling.dr_len
             elif isinstance(record, udfmod.UDFFileEntry):
                 abs_offset = record.extent_location() * self.logical_block_size
+                record.set_data_length(length)
+                self._cdfp.seek(abs_offset)
+                self._cdfp.write(record.record())
-
-            record.set_data_length(length)
-            self._cdfp.seek(abs_offset)
-            self._cdfp.write(record.record())

     def add_hard_link(self, **kwargs):
         # type: (str) -> None
Goshin commented 8 months ago

Just found another related bug in DirectoryRecord.record(). Some ISO files use fixed-length padding for directory records. The current implementation returns record byte arrays shorter than expected. Here is a fix padding the output to self.dr_len.

diff --git a/pycdlib/dr.py b/pycdlib/dr.py
index cdb78d9..cdc421e 100644
--- a/pycdlib/dr.py
+++ b/pycdlib/dr.py
@@ -1097,7 +1097,7 @@ class DirectoryRecord(object):
                                self.seqnum, utils.swab_16bit(self.seqnum),
                                self.len_fi) + self.file_ident + padstr + xa_rec + rr_rec]

-        outlist.append(b'\x00' * (len(outlist[0]) % 2))
+        outlist.append(b'\x00' * (self.dr_len - len(outlist[0])))

         return b''.join(outlist)
Goshin commented 8 months ago

Another fix is to walk the parent and find the record we want to update. This might be better because we can make minimal changes to the ISO file. I am not sure how to handle duplicate names in this case though.

diff --git a/pycdlib/pycdlib.py b/pycdlib/pycdlib.py
index d9aa951..6261289 100644
--- a/pycdlib/pycdlib.py
+++ b/pycdlib/pycdlib.py
@@ -4517,15 +4517,30 @@ class PyCdlib(object):
                     self.joliet_vd.add_to_space_size(length)
                 if record.parent is None:
                     raise pycdlibexception.PyCdlibInternalError('Modifying file with empty parent')
-                abs_extent_loc = record.parent.extent_location() + record.extents_to_here - 1
-                offset = record.offset_to_here - record.dr_len
-                abs_offset = abs_extent_loc * self.logical_block_size + offset
+                record.set_data_length(length)
+                self._cdfp.seek(record.parent.extent_location() * self.logical_block_size)
+                data = self._cdfp.read(record.parent.get_data_length())
+                offset = 0
+                while offset < record.parent.get_data_length():
+                    lenbyte = bytearray([data[offset]])[0]
+                    if lenbyte == 0:
+                        padsize = self.logical_block_size - (offset % self.logical_block_size)
+                        if data[offset:offset + padsize] != b'\x00' * padsize:
+                            raise pycdlibexception.PyCdlibInvalidISO('Invalid padding on ISO')
+                        offset = offset + padsize
+                        continue
+                    sibling = dr.DirectoryRecord()
+                    sibling.parse(self.pvd, data[offset:offset + lenbyte], record.parent)
+                    if sibling.file_ident == record.file_ident:
+                        self._cdfp.seek(record.parent.extent_location() * self.logical_block_size + offset)
+                        self._cdfp.write(record.record())
+                        break
+                    offset += lenbyte
             elif isinstance(record, udfmod.UDFFileEntry):
                 abs_offset = record.extent_location() * self.logical_block_size
+                record.set_data_length(length)
+                self._cdfp.seek(abs_offset)
+                self._cdfp.write(record.record())
-
-            record.set_data_length(length)
-            self._cdfp.seek(abs_offset)
-            self._cdfp.write(record.record())