Closed SOF3 closed 3 years ago
An archive format that begins with source code? Fine enough. An archive format whose first section is terminated by a certain constant? Uh okay.
You are going to love ZIP
An archive format that begins with source code? Fine enough. An archive format whose first section is terminated by a certain constant? Uh okay.
You are going to love ZIP
Well you can add a shebang line to a phar but not a zip, so it is still understandable to have a prefix that starts with source code. It's just the arbitrary choice of terminator that is ridiculous.
You can add whatever you like at the beginning of a ZIP file, as the "header" (aka. End of central directory record) is at the end of the file. That's how self-extracting ZIP archives work.
And the beginning of the End of central directory record is also detected using an arbitrary constant, as it can have an arbitrary length
I was just trying to implement the PHP phar file format in Rust. I was vigilant with memory allocations, targeting to allow parsing phar files without even allocating O(num_files) memory, and performing all kinds of validations carefully.
Then I realized that not only does the phar format have no formal specification, but the official implementation is explicitly hacking.
https://github.com/php/php-src/blob/master/ext/phar/phar.c#L3030-L3045
TLDR: If the manifest length happens to be
256n + 10
or256n + 13
, it automatically adds one padding byte so that the little-endian-encoded manifest length doesn't start with a CR/LF byte, so that file stubs ending with__HALT_COMPILER(); ?>
without the trailing\r\n
bytes will still parse correctly.What a terrible failure. An archive format that begins with source code? Fine enough. An archive format whose first section is terminated by a certain constant? Uh okay. An archive format whose first section is text, second section is binary, and the first section is terminated by multiple possible ways of writing a PHP end tag (
?>
), with multiple line ending conventions? r u srs? Robustness for stuff only to be created and parsed by computers not humans? This is the worst invention of the kind since SQL.Apparently the phar file format even expects the parser to read the whole manifest into memory.
Rewriting phar implementation in rust is not enough to make it secure, because the phar file format is not secure to begin with. We have to rewrite the phar specification in rust.