Majored / rs-async-zip

An asynchronous ZIP archive reading/writing crate.
MIT License

crypak / p4k file / zip64 #13

Closed damccull closed 1 year ago

damccull commented 2 years ago

So, I realize this isn't in the realm of responsibility for a zip archive library, but I am trying to open a p4k file (from Star Citizen) using your lib and getting an unexpected header error. My research indicates (even from cryengine's own developer docs) that a crypak/p4k file should just be a zip with some zstd or deflate compression, with some files encrypted and others not even compressed. From what I have been able to find, I should be able to just list the files with standard zip capability, and yet I am getting this unexpected header error before I even try to extract.

    Finished dev [unoptimized + debuginfo] target(s) in 4.70s
     Running `target\debug\scprospector.exe`
[src\lib.rs:6] &file = tokio::fs::File {
    std: File {
        handle: 0x00000000000000dc,
        path: "\\\\?\\C:\\Users\\David\\repos\\scprospector\\Data.p4k",
    },
}
file open; trying to open as zip
Error: Encountered an unexpected header (actual: 0x99df8764, expected: 0x2014b50).
error: process didn't exit successfully: `target\debug\scprospector.exe` (exit code: 1)

Looking at some other open source code that successfully opens the file, one in .NET and another in Python, those authors have some kind of encryption key they seem to open it with. Maybe this is happening because async_zip doesn't support encryption yet? I'm trying to find out what kind of encryption it is. Maybe ZipCrypto, but I'm not sure.

Obviously this isn't something I expect you to solve, but I wondered if you have any insight into why this might happen. If not, that is ok too. Thanks!

Majored commented 2 years ago

Would you be comfortable sending over the p4k file you're using so I could have a quick look? Obviously don't bother if it's got anything personal in it.

Having had a good look as well, there's not really a clear answer. It's definitely using the ZIP format so it should be possible, but it's not clear if the encryption is on the whole archive or just individual files. If it is just individual files, you shouldn't be getting that specific unexpected header error, so it could equally be a bug.

damccull commented 2 years ago

Howdy. The Data.p4k in question is from the Star Citizen game client. I'm uploading a copy to OneDrive so I can share a link, since it's about 77GB (huge!) and I wasn't sure GitHub could handle it. Obviously if you have SC you can just look at it directly, but I'll send the link as soon as the upload is done. I plan to remove it after you get a copy since I'm not sure about public distribution beyond CIG's client.

damccull commented 2 years ago

@Majored Here's the link to my problematic Data.p4k, https://1drv.ms/u/s!AidfnSaORZHWnP0hDue-xEeMGYB_wQ?e=d6Y01s

Again, no rush or expectations, but it would be cool for your lib to open this. Or at least for me to learn a bit about it.

Majored commented 2 years ago

All downloaded, thanks.

Will look into it in a few hours.

damccull commented 2 years ago

Some minimal forensics I've tried to do:

Going by the cryengine docs on crypak, which I believe the p4k file is, this should just be a standard zip file with its contents stored as deflate, zstd, or STORED, and sometimes encrypted with the crypak standard key.

However, when I use several freely available zip programs to view or analyze the file, I get one of the following:

with p4k extension, it opens and displays 1 folder and 5 files, all related to a starfarer crash

with .zip extension it fails to open entirely

examining with 7z t as a p4k file it claims the type is zip but that it has extra data after the end of the archive

examining with the same command as zip extension claims it is not an archive

The zipdetails command on linux returns the following with either extension:

0000000000 00FFFFFFFF ...         PREFIX DATA

Unexpecded END at offset FFFFFFFF, value 99DF8764
Done

Looking at the file in HxD shows offset FFFFFFFF at around 5-10% of the total file.

Majored commented 2 years ago

It's using the ZIP64 spec extension, so that would explain the 0xFFFF... values, since the spec sets any field in the end of central directory header that is superseded by its ZIP64 counterpart to all ones (0xFFFF or 0xFFFFFFFF). So the first step would be supporting ZIP64 in the first place for this crate, which doesn't seem too bad honestly.
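As a rough illustration of that check, here is a minimal sketch in Rust that inspects a 22-byte end of central directory record and reports whether any of its fields carry the all-ones marker. The function name `eocd_defers_to_zip64` is hypothetical and is not part of async_zip; the field offsets follow the standard record layout.

```rust
/// Signature of the end of central directory record (EOCD),
/// read as a little-endian u32.
const EOCD_SIGNATURE: u32 = 0x0605_4b50;

/// Hypothetical helper (not part of async_zip).
/// Returns Some(true) if any EOCD field is saturated to all ones,
/// Some(false) if none are, and None if `eocd` is too short or does
/// not start with the EOCD signature.
fn eocd_defers_to_zip64(eocd: &[u8]) -> Option<bool> {
    if eocd.len() < 22 {
        return None;
    }
    let u16_at = |i: usize| u16::from_le_bytes([eocd[i], eocd[i + 1]]);
    let u32_at =
        |i: usize| u32::from_le_bytes([eocd[i], eocd[i + 1], eocd[i + 2], eocd[i + 3]]);

    if u32_at(0) != EOCD_SIGNATURE {
        return None;
    }
    // 16-bit fields: disk number, disk with central directory,
    // entries on this disk, total entries.
    let saturated16 = [4, 6, 8, 10].iter().any(|&i| u16_at(i) == u16::MAX);
    // 32-bit fields: central directory size and central directory offset.
    let saturated32 = [12, 16].iter().any(|&i| u32_at(i) == u32::MAX);
    Some(saturated16 || saturated32)
}
```

A reader that gets `Some(true)` back would then need to look for the ZIP64 records that sit just before this one to find the real values.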

The one thing I am confused about with that p4k file is why the local file headers are using 0x14034B50 as the signature and not 0x04034B50. I assume it's to signify the difference between a normal vs zip64 header, but I can't see any mention of it in the spec - so not sure if that's specific to p4k files or not.
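To make the difference concrete, here is a small sketch that classifies the first four bytes of a local file header. ZIP stores signatures little-endian, so 0x04034B50 appears on disk as `PK\x03\x04` and 0x14034B50 as `PK\x03\x14`. The enum and function names are hypothetical, and treating 0x14034B50 as a p4k-specific variant is only an assumption based on the observation above, not something confirmed by the spec.

```rust
/// Standard local file header signature from the ZIP spec.
const LOCAL_FILE_HEADER: u32 = 0x0403_4b50;
/// Signature observed in the p4k file discussed here (assumption:
/// it may be specific to p4k; it is not documented in the ZIP spec).
const P4K_LOCAL_FILE_HEADER: u32 = 0x1403_4b50;

/// Hypothetical classification of a local file header signature.
#[derive(Debug, PartialEq)]
enum LocalHeaderKind {
    Standard,
    P4kVariant,
    Unknown(u32),
}

/// Interprets four on-disk bytes as a little-endian signature.
fn classify_local_header(bytes: [u8; 4]) -> LocalHeaderKind {
    match u32::from_le_bytes(bytes) {
        LOCAL_FILE_HEADER => LocalHeaderKind::Standard,
        P4K_LOCAL_FILE_HEADER => LocalHeaderKind::P4kVariant,
        other => LocalHeaderKind::Unknown(other),
    }
}
```

Fed the bytes `64 87 DF 99`, this returns `Unknown(0x99df8764)`, which matches the value reported in the original error message.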

damccull commented 2 years ago

I don't suppose you'd mind explaining how you figured out it's zip64? :D I'm new to zip file internals and am just figuring it out.

I'm not sure about the signature, to be honest. And, please, don't let other work on your library suffer for my questions. I'm patient, and what I want to do with this is not a huge priority as some C# tools already exist to open this.

I do want to eventually open it in rust, but low priority. I suppose if I really want to learn about zip, I need to begin by reading the spec. I'll get on that.

Majored commented 2 years ago

Just from looking at the end of the file and noticing that there were two PK zip signatures when a normal ZIP would only have one (the end of central directory header): https://gyazo.com/8ff274844a672fcb368bfb0466ebad67

Searching the signature up (converted from little to big endianness), the first one relates to the zip64 end of central directory locator. https://github.com/Majored/rs-async-zip/blob/main/SPECIFICATION.md#4315
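The same inspection can be sketched in Rust: read the last chunk of the file into a buffer and look for the two signatures. This is only an illustration with a hypothetical function name, not how async_zip locates these records; a naive scan like this can also match bytes inside an archive comment, so a real reader would validate each hit.

```rust
/// End of central directory record signature.
const EOCD: u32 = 0x0605_4b50;
/// ZIP64 end of central directory locator signature.
const ZIP64_EOCD_LOCATOR: u32 = 0x0706_4b50;

/// Hypothetical helper: returns (offset, signature) for every place in
/// `tail` where one of the two signatures appears. Signatures are stored
/// little-endian, so each 4-byte window is decoded that way.
fn find_tail_signatures(tail: &[u8]) -> Vec<(usize, u32)> {
    tail.windows(4)
        .enumerate()
        .filter_map(|(offset, w)| {
            let sig = u32::from_le_bytes([w[0], w[1], w[2], w[3]]);
            matches!(sig, EOCD | ZIP64_EOCD_LOCATOR).then_some((offset, sig))
        })
        .collect()
}
```

On a ZIP64 archive the locator (a fixed 20 bytes) should show up immediately before the end of central directory record, whereas a plain ZIP would yield only the latter.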

And you're all good regarding questions. I'm happy to help plus I definitely want to support ZIP64 down the line so doing a bit of digging now doesn't hurt.

damccull commented 2 years ago

Holy cow. Your markdown version is so much easier to read than the official PKWARE APPNOTE.txt file. I'mma use this as my learning reference. Thanks for all the info and help. If I can figure out the spec in my brain maybe I'll help with a PR or two. I'm kind of an amateur though so I'm not sure it'll be anything impressive :)

skairunner commented 1 year ago

Just want to note that I've started working on implementing Zip64 reading and writing.

Majored commented 1 year ago

I'll go ahead and close this as a majority of the Zip64 implementation has been added. I assume there might be a few rough areas or improvements that can be made so it'll be best to open any separate issues for those.

Thanks again for your work on this @skairunner.

damccull commented 5 months ago

Thanks for working on this! Sorry it's taken me so long to see this and reply. I've been offline for a while with life stuff going on. I'll open up my project again and see if this gets me where I need to be, and then I'll actively appreciate the work you both did.