ansuz / RIIR

why not Rewrite It In Rust
https://transitiontech.ca/random/RIIR
641 stars 6 forks source link

Rewrite phar spec in Rust #60

Closed SOF3 closed 3 years ago

SOF3 commented 3 years ago

I was just trying to implement the PHP phar file format in Rust. I was vigilant with memory allocations, targeting to allow parsing phar files without even allocating O(num_files) memory, and performing all kinds of validations carefully.

Then I realized that not only does the phar format have no formal specification, but the official implementation is explicitly hacking.

https://github.com/php/php-src/blob/master/ext/phar/phar.c#L3030-L3045

TLDR: If the manifest length happens to be 256n + 10 or 256n + 13, it automatically adds one padding byte so that the little-endian-encoded manifest length doesn't start with a CR/LF byte, so that file stubs ending with __HALT_COMPILER(); ?> without the trailing \r\n bytes will still parse correctly.

What a terrible failure. An archive format that begins with source code? Fine enough. An archive format whose first section is terminated by a certain constant? Uh okay. An archive format whose first section is text, second section is binary, and the first section is terminated by multiple possible ways of writing a PHP end tag (?>), with multiple line ending conventions? r u srs? Robustness for stuff only to be created and parsed by computers not humans? This is the worst invention of the kind since SQL.

Apparently the phar file format even expects the parser to read the whole manifest into memory.

Rewriting phar implementation in rust is not enough to make it secure, because the phar file format is not secure to begin with. We have to rewrite the phar specification in rust.

progval commented 3 years ago

An archive format that begins with source code? Fine enough. An archive format whose first section is terminated by a certain constant? Uh okay.

You are going to love ZIP

SOF3 commented 3 years ago

An archive format that begins with source code? Fine enough. An archive format whose first section is terminated by a certain constant? Uh okay.

You are going to love ZIP

Well you can add a shebang line to a phar but not a zip, so it is still understandable to have a prefix that starts with source code. It's just the arbitrary choice of terminator that is ridiculous.

progval commented 3 years ago

You can add whatever you like at the beginning of a ZIP file, as the "header" (aka. End of central directory record) is at the end of the file. That's how self-extracting ZIP archives work.

And the beginning of the End of central directory record is also detected using an arbitrary constant, as it can have an arbitrary length