boredzo / impluse-hfs

A tool for converting HFS (Mac OS Standard) volumes to HFS+ (Mac OS Extended) format.
BSD 3-Clause "New" or "Revised" License

Can't extract the Guided Tour of Macintosh Plus #45

Open boredzo opened 5 months ago

boredzo commented 5 months ago

Using an uncompressed copy of the “Guided Tour of Macintosh Plus”* image from “Phil and Dave's Excellent CD” volume 1:

0%: Finding HFS volume
Rehydrating descendant 📄 “DeskTop”
Rehydrating descendant 📄 “File”
Rehydrating descendant 📄 “Letter to Mom”
Rehydrating descendant 📄 “Mousing Around”
Rehydrating descendant 📄 “Notes 8:24”
Rehydrating descendant 📁 “System Folder”
Rehydrating descendant 📄 “
Failure in rehydrating descendant 📄 “
Failure in rehydrating descendant 📄 “System Folder”
Failed to rehydrate file named Guided Tour: Error Domain=NSCocoaErrorDomain Code=514 "Couldn't look up parent for destination path (null); check for typoes" UserInfo={NSLocalizedDescription=Couldn't look up parent for destination path (null); check for typoes}
2024-01-27 13:09:09.300 impluse-hfs[38582:33948550] Failed: Couldn't look up parent for destination path (null); check for typoes

The filename shown as empty above is this file as reported by list:

  📄 Desktop 1^A        1,652   0       1,652

^A being less's escaping of the \x01 byte at the end of that filename.

I'm unsure whether this means the catalog is corrupt, impluse is misparsing it, or impluse is doing something else wrong.

The (null)s are also concerning. It could be that NSString is refusing to decode this filename.

boredzo commented 5 months ago

Here's nodeName from the item's catalog key:

[0] u_int8_t    '\x01'
[1] u_int8_t    'S'
[2] u_int8_t    't'
[3] u_int8_t    'a'
[4] u_int8_t    'r'
[5] u_int8_t    't'
[6] u_int8_t    'u'
[7] u_int8_t    'p'
[8] u_int8_t    ' '
[9] u_int8_t    'D'
[10]    u_int8_t    'e'
[11]    u_int8_t    's'
[12]    u_int8_t    'k'
[13]    u_int8_t    't'
[14]    u_int8_t    'o'
[15]    u_int8_t    'p'
[16]    u_int8_t    '\x02'
[17]    u_int8_t    '\0'
[18]    u_int8_t    '\0'
[19]    u_int8_t    '\0'
[20]    u_int8_t    'L'
[21]    u_int8_t    'M'
[22]    u_int8_t    'O'
[23]    u_int8_t    'P'
[24]    u_int8_t    'J'
[25]    u_int8_t    'S'
[26]    u_int8_t    'H'
[27]    u_int8_t    'L'
[28]    u_int8_t    '\0'
[29]    u_int8_t    '\0'
[30]    u_int8_t    '\0'
[31]    u_int8_t    '\0'

The folder's contents are supposed to include "Desktop\x01", so that \x01 at the start is intriguing, but it could be a red herring. That first byte is supposed to be the length, but there's no way this is a one-byte-long length. That's followed by "Startup Desktop", then what appear to be four four-byte values: 0x02000000, 'LMOP', 'JSHL', and 0. 'LMOP' is the file type of the "Desktop 1\x01" file; 'JSHL' is the creator code of a number of files in the System Folder on this disk.

The catalog key's length is 23; that leaves 23 - 6 = 17 bytes for the filename, including the length byte. "Startup Desktop" is 15 characters; "Startup Desktop\x02" is 16 characters. Curiously, no such filename shows up in the catalog recounting given by analyze.
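One way to sanity-check that arithmetic is to write the key layout down. This is a minimal sketch, assuming the layout in Apple's hfs_format.h (keyLength, a reserved byte, a four-byte parent ID, then the Pascal-string name) and assuming keyLength does not count its own byte; the constant and function names are made up for illustration. Under those assumptions the 6 being subtracted already covers the name's length byte, so 17 would be the count of name characters alone.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed parts of an HFS catalog key that follow the keyLength byte.
 * Sizes assumed from hfs_format.h; these names are illustrative only. */
enum {
    kKeyReservedSize   = 1, /* reserved byte */
    kKeyParentIDSize   = 4, /* parent directory ID */
    kKeyNameLengthSize = 1  /* Pascal-string length byte of nodeName */
};

/* Number of name characters implied by a catalog key's keyLength.
 * Assumes keyLength does not include the keyLength byte itself. */
int name_length_from_key_length(uint8_t keyLength)
{
    int const fixedPart = kKeyReservedSize + kKeyParentIDSize + kKeyNameLengthSize;
    assert(keyLength >= fixedPart);
    return keyLength - fixedPart;
}
```

With a keyLength of 23 this gives 17, two more than the 15 characters of "Startup Desktop".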

In fact, looking at that output, some of the other items are truncated similarly:

- 📄 “Letter to Mom
    Parent ID: #17 (0x11)
- 📄 “Notes 8/24
    Parent ID: #17 (0x11)
- 📄 “OVWT”, ID #36 (0x24), type 'APPL' creator 'MMTE', script code default
    Parent ID: #17 (0x11)
- 📄 “Scrap Scene”, ID #37 (0x25), type 'VWSC' creator 'MMVW', script code default
    Parent ID: #17 (0x11)
- 📄 “Scrapbook File”, ID #21 (0x15), type 'ZSYS' creator 'MACS', script code default
    Parent ID: #17 (0x11)
- 📄 “Scrapbook File
    Parent ID: #17 (0x11)

I was dubious about "OVWT" as an actual filename on the disk, but it does show up in the Finder when mounting the disk in System 6.0.8.

Seems like one thing that needs to happen is analyze needs to escape filenames itself.
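As a rough idea of what that escaping could look like, here is a sketch in plain C. It is not impluse's actual code: the function name, the buffer-based interface, and the choice of `\0` and `\xNN` as escape forms are all assumptions. Bytes of 0x80 and above are passed through untouched, which a real implementation would probably want to decode from the volume's encoding instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes an escaped, NUL-terminated copy of the
 * length-counted name into out. Returns 0 on success, -1 if out is too small. */
int escape_name(const unsigned char *name, size_t nameLength, char *out, size_t outSize)
{
    size_t used = 0;
    assert(out != NULL && outSize > 0);
    for (size_t i = 0; i < nameLength; ++i) {
        unsigned char const ch = name[i];
        char piece[5]; /* longest escape is "\xNN" plus terminator */
        int pieceLength;
        if (ch == 0x00) {
            pieceLength = snprintf(piece, sizeof piece, "\\0");
        } else if (ch == '\\') {
            /* Escape the backslash itself so the output stays unambiguous. */
            pieceLength = snprintf(piece, sizeof piece, "\\\\");
        } else if (ch < 0x20 || ch == 0x7F) {
            pieceLength = snprintf(piece, sizeof piece, "\\x%02x", ch);
        } else {
            pieceLength = snprintf(piece, sizeof piece, "%c", ch);
        }
        if (used + (size_t)pieceLength >= outSize) {
            out[0] = '\0';
            return -1;
        }
        memcpy(out + used, piece, (size_t)pieceLength);
        used += (size_t)pieceLength;
    }
    out[used] = '\0';
    return 0;
}
```

The name is taken as a pointer plus a length rather than as a C string, so an embedded NUL byte cannot cut the output short.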

Anyway, going back to the alleged contents of this filename buffer: note that lldb is treating this as a Str31 and showing 32 bytes regardless of the string's actual length, so the dump runs past the end of the key structure. The 0x02, 0x00, 0x00, and 0x00 before the file type and creator look like kHFSFileRecord followed by an empty flags field and unused byte. So this is consistent with an HFSCatalogKey followed by an HFSCatalogFile. I think that suggests we're peering across a node boundary.
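To make that reading of the trailing bytes concrete, the sketch below decodes twelve bytes the way the paragraph above interprets them: a two-byte big-endian record type, a flags byte, an unused byte, then the four-character type and creator codes. The struct and function are hypothetical stand-ins, not the real HFSCatalogFile declaration, and cover only the first few fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of the first fields of an HFS file record. */
struct file_record_head {
    uint16_t recordType; /* 0x0200 would be kHFSFileRecord */
    uint8_t  flags;
    uint8_t  unused;
    char     type[5];    /* four-character code plus terminator */
    char     creator[5];
};

/* Decodes the first 12 bytes of a record. Returns 1 on success, 0 if too short. */
int parse_file_record_head(const uint8_t *bytes, size_t count, struct file_record_head *out)
{
    assert(bytes != NULL && out != NULL);
    if (count < 12) {
        return 0;
    }
    out->recordType = (uint16_t)((bytes[0] << 8) | bytes[1]); /* big-endian on disk */
    out->flags = bytes[2];
    out->unused = bytes[3];
    memcpy(out->type, bytes + 4, 4);
    out->type[4] = '\0';
    memcpy(out->creator, bytes + 8, 4);
    out->creator[4] = '\0';
    return 1;
}
```

Fed bytes [16] through [27] of the dump above, this would report record type 0x0200, type 'LMOP', and creator 'JSHL'.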

So a couple more things we need to do here:

boredzo commented 5 months ago

I added some escaping to the output of analyze (and extract). Here's the filename extract is tripping over:

Failure in rehydrating descendant 📄 “\0\x01Startup Desktop”: Error Domain=NSCocoaErrorDomain Code=514 "Couldn't look up parent for destination path (null); check for typoes" UserInfo={NSLocalizedDescription=Couldn't look up parent for destination path (null); check for typoes}

It looks like the line that's failing is this one:

                NSURL *_Nonnull const fileURL = [realWorldURL URLByAppendingPathComponent:filename isDirectory:false];

realWorldURL and filename are both non-nil, but filename is @"\0\x01Startup Desktop", and NSURL isn't having this.

(To be fair, a lot of POSIX APIs down the stack will choke on that \0 as well.)
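A small illustration of why: anything that takes the name as a NUL-terminated C string sees only the bytes before the first NUL. The helper names below are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length a C-string API would see for a length-counted name:
 * everything up to the first NUL byte, or the whole name if there is none. */
size_t c_string_visible_length(const unsigned char *name, size_t nameLength)
{
    assert(name != NULL);
    const unsigned char *const nul = memchr(name, '\0', nameLength);
    return nul != NULL ? (size_t)(nul - name) : nameLength;
}

/* Nonzero if the name can be passed as a C string without being truncated. */
int name_survives_as_c_string(const unsigned char *name, size_t nameLength)
{
    return c_string_visible_length(name, nameLength) == nameLength;
}
```

For the 17-byte name here, the visible length is 0, which would also explain the empty-looking filename in the first log.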

I still should verify somehow that impluse's interpretation of the catalog is correct.

boredzo commented 5 months ago

This is where the error comes from:

    FSRef parentRef, ref;
    bool const gotParentRef = CFURLGetFSRef((__bridge CFURLRef)realWorldURL.URLByDeletingLastPathComponent, &parentRef);
    if (! gotParentRef) {
        NSError *_Nonnull const noParentError = [NSError errorWithDomain:NSCocoaErrorDomain code:NSFileWriteInvalidFileNameError userInfo:@{ NSLocalizedDescriptionKey: [NSString stringWithFormat:NSLocalizedString(@"Couldn't look up parent for destination path %@; check for typoes", @""), realWorldURL.path] }];
        if (outError != NULL) {
            *outError = noParentError;
        }
        return false;
    }

This isn't about catalog back-traversal at all. This is the rehydration method trying to figure out what directory to put the file in, and there isn't one because the URL it's supposed to rehydrate the file at was nil.

boredzo commented 5 months ago

Mounting the disk in Mac OS 9 in SheepShaver and using AppleScript to list the contents of its System Folder, I find that Mac OS 9 also sees a file that AppleScript describes as "\□Startup Desktop". Comparing to other files in the list, it looks like the \ is how the NUL character gets represented, and the □ is any non-NUL control character.

So impluse is interpreting the catalog correctly. This filename really does contain a NUL character (and a control character). Which means we need to do something reasonable about that…

boredzo commented 5 months ago

If I convert the volume, both ls and modern AppleScript render the NUL character using . But that's a display affordance, not necessarily reflective of the actual name on disk.

Some tweaks to the converter confirm that the name is being copied across with an actual NUL character in it. The only other thing I could check is what the POSIX directory-traversal APIs (or the Cocoa ones layered on top) yield.
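A sketch of that last check using the POSIX directory APIs follows. The function names are hypothetical. One caveat is built in: `d_name` is a NUL-terminated string, so this can only show whatever bytes the system hands back before the first NUL, which is not necessarily the name as stored on disk.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>

/* Nonzero if the C string contains any control byte (below 0x20, or 0x7F). */
int has_control_byte(const char *name)
{
    assert(name != NULL);
    for (const unsigned char *p = (const unsigned char *)name; *p != '\0'; ++p) {
        if (*p < 0x20 || *p == 0x7F) {
            return 1;
        }
    }
    return 0;
}

/* Prints each entry name in dirPath as hex bytes, one entry per line.
 * Returns how many names contained a control byte, or -1 if the
 * directory could not be opened. */
int dump_directory_names(const char *dirPath)
{
    DIR *const dir = opendir(dirPath);
    if (dir == NULL) {
        return -1;
    }
    int flagged = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        for (const unsigned char *p = (const unsigned char *)entry->d_name; *p != '\0'; ++p) {
            printf("%02x ", *p);
        }
        printf("\n");
        if (has_control_byte(entry->d_name)) {
            ++flagged;
        }
    }
    closedir(dir);
    return flagged;
}
```

Pointing `dump_directory_names` at the converted volume's System Folder would show which bytes come back for the problem file.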