Blobfolio / righteous-mimes

A comprehensive MIME and file extension tool for PHP. Finally!
Do What The F*ck You Want To Public License
2 stars 0 forks source link

Righteous MIMEs!

BETA WARNING: This software is close but not yet ready for production use!

PHP has a file type detection problem. Extensions like FileInfo and ID3 rely on static data which is often incomplete, stale, or simply wrong, and their deducations frequently vary from method to method and server to server.

Righteous MIMEs! is a lightweight, stand-alone PHP library that augments PHP's native type detection capabilities (i.e. fileinfo.so) with numerous type-specific workarounds, extra magic parsing, and extensive type alias cross-referencing.

By increasing PHP's overall type awareness more than a magnitude, Righteous MIMEs! is able to make deductions about file types that are more accurate, complete, and consistent.

The table below explains the process in more detail:

Native PHP Righteous MIMEs!
:skull: :see_no_evil: :skull: :mag: :surfer:

 

Table of Contents

  1. Requirements
  2. Installation
  3. Building
  4. Reference
  5. Bug/Type Reporting

 

Requirements

Righteous MIMEs! requires PHP 7.3+ compiled with the following extensions (all of which are quite common):

While Righteous MIMEs! can technically be used on its own, it is highly recommended you add getID3 to your project as it allows RM! to fix a few additional type detection issues related to MP4 and OGG media.

If you're building atop a CMS like WordPress, just be careful not to override any bundled versions of getID3 that might already be present (i.e. stick with their copy).

 

Installation

For most use cases, it is recommended to install Righteous MIMEs! using Composer:

# Assuming you want the latest and greatest "master" branch:
composer require "blobfolio/righteous-mimes:dev-master"

If you're doing something weird or want to integrate the library manually, all of the important files live inside the lib/righteous/ directory.

The meat of Righteous MIMEs! is also available as a WordPress plugin called Lord of the Files. If you're just looking to fix Media Library upload issues like #40175, this plugin is your best bet!

 

Reference

This library comes with four main class files you may wish to interact with:

Additional classes and methods exist, but are subject to change so it is not recommended you rely on them directly. But that said, if you find something useful you wish were stable, open a ticket and we'll consider promoting its status. :wink:

 

Definitions

Righteous MIMEs!, like every other major type-detection suite, takes a multi-tiered approach to file analysis that breaks down roughly into two categories: naive and magic.

Naive analysis gathers all of the information it can using only a file's path. For example, a file named "image.jpg" uses the jpg file extension, which is primarily associated with the registered image/jpeg media type.

Of course, just because a file happens to be called "image.jpg" doesn't mean it actually is a valid JPEG file, but it's a good place to start.

Magic analysis, by contrast, looks for clues within a file's content. It is called "magic" because it sounds cool the correct answer can be arrived at even in cases where a file has the wrong extension or no extension at all, as if by magic! Equally impressive, type determinations can usually be made after reading a small percentage of the total file, keeping things nice and efficient.

Magic analysis, like magic in general, is not infallable, but is better than nothing.

 

Working With Files

The heart of Righteous MIMEs! revolves around its tiered file analysis capabilities, all of which live within the Righteous\MIMEs\File class.

The following instance methods are available:

 

File->__construct(string $path) : bool

The main magic of Righteous MIMEs! sits behind the Righteous\MIMEs\File class. All you need to do is instantiate an object with a string path, then use the relevant class methods to extract the information you want.

Parameters

Type Description
string A file path.

Any sort of path-like value will do, but Righteous MIMEs! can only work with what it's given. Information about remote, fragmentary, or unreadable paths will be based entirely on naive deductions (i.e. the file name).

Returns

This method returns true if any information whatsoever was discovered, or false on complete and utter failure.

Example

if (false !== ($file = new \Righteous\MIMEs\File('/path/to/IMAGE.JPG'))) {
    // Do something with it.
    if ('image/jpeg' === $file->type()) {
        …
    }
}

 

File->info() : ?array

This is a catch-all method that delivers a lot of information in one go. In many ways it resembles PHP's native pathinfo() method, but there are a few differences worth noting.

First and foremost, Righteous MIMEs! and PHP qualify "filename" and "extension" differently. RM! believes file extensions follow file names (not just periods), and because it also knows what a valid extension looks like, is able to sanitize and normalize the value.

This will probably make more sense with some examples:

File Key File->info() pathinfo()
".htaccess" filename ".htaccess" ""
".htaccess" extension "" "htaccess"
"IMAGE.JPEG" filename "IMAGE" "IMAGE"
"IMAGE.JPEG" extension "jpeg" "JPEG"

This method also includes two additional keys: type and valid.

Returns

This method returns null on failure, or an array in the following format:

Type Key Description
string dirname The parent directory.
string basename The path's base name.
string filename The file name (minus extension).
string extension The file extension (lowercase).
string type The file type.
bool valid true if the extension matches the type, false otherwise.

Example

$file = new \Righteous\MIMEs\File('wolf.jpg');
if (null !== $info = $file->info()) {
    …
}

 

File->basename(bool $suggested = false) : ?string

Return the path's (normalized) base name, like PHP's native basename() method.

By default, this method returns the base name corresponding to the file's actual path, but if true is passed, the best base name — based on media type — is returned instead.

Parameters

Type Description Default
bool Use the best, suggested value instead of the naive one. false

Returns

This returns the base name as a string or null if the path is invalid.

Example

// Say you have a PNG image incorrectly named "wolf.jpg".
$file = new \Righteous\MIMEs\File('wolf.jpg');

echo $file->basename(); //-> "wolf.jpg"
echo $file->basename(false); //-> "wolf.jpg"
echo $file->basename(true); //-> "wolf.png"

 

File->dirname() : ?string

Return the path's parent directory exactly like PHP's native dirname() method does.

Returns

This returns the parent directory as a string or null if the path is invalid.

Example

$file = new \Righteous\MIMEs\File('wolf.jpg');
echo $file->dirname(); //-> "."

$file = new \Righteous\MIMEs\File('/tmp/working/presentation.docx');
echo $file->dirname(); //-> "/tmp/working"

 

File->extension(bool $suggested = false) : ?string

Return either the path's current extension, or if true is passed, the most appropriate extension given the content type (which may or may not be the same thing).

Parameters

Type Description Default
bool Use the best, suggested value instead of the naive one. false

Returns

This returns the extension as a string or null if the path is invalid.

Note: the formatting of the return values may not be what you expect. See File->info() for additional information.

Example

// Say you have a PNG image incorrectly named "wolf.jpg".
$file = new \Righteous\MIMEs\File('wolf.jpg');

echo $file->extension(); //-> "jpg"
echo $file->extension(false); //-> "jpg"
echo $file->extension(true); //-> "png"

 

File->filename() : ?string

Return the path's file name (minus extension).

Returns

This returns the file name (minus extension) as a string or null if the path is invalid.

Note: the formatting of the return values may not be what you expect. See File->info() for additional information.

Example

$file = new \Righteous\MIMEs\File('wolf.jpg');
echo $file->filename(); //-> "wolf"

 

File->suggested() : ?array

This method suggests base names based on the (naive) file name and (magic) content type. The array keys are the base names and the values are bitwise integers representing the sources that agree with the result.

See Extensions::source() for more information about source values.

Returns

If the type and extension are already in agreement, the current value (and its source) are returned, otherwise suitable alternatives arranged by descending levels of certainty, if any, are returned. On failure, null is returned instead.

Example

// Say you have a JPEG image incorrectly named "wolf.png".
$file = new \Righteous\MIMEs\File('wolf.png');
\print_r($file->suggested());
/*
    "wolf.jpg": 252,
    "wolf.jpeg": 212,
    "wolf.jpe": 3,
    …
*/

 

File->type() : ?string

Return the best media type associated with a file.

Returns

This returns the media type as a string or null if the path is invalid.

Example

$file = new \Righteous\MIMEs\File('wolf.jpg');
echo $file->type(); //-> "image/jpeg"

 

General Helpers

Righteous MIMEs! includes a number of useful methods for more general tasks like formatting and sanitization.

 

Extensions::primary_type(string $ext) : ?string

Return the primary MIME type associated with a given file extension.

Parameters

Type Description
string A file extension.

Returns

Returns a MIME type as a string or null if none comes to mind.

Example

$type = \Righteous\MIMEs\Extensions::primary_type('jpg'); //-> "image/jpeg"

 

Extensions::source(string $ext, string $type) : int

Return a bitwise integer reflecting the source(s) that reference a relationship between a given extension and type.

The following source constants are defined in the Righteous\MIMEs class:

Constant Description License Link
SOURCE_ALIAS This indicates an association should only be used for cross-referencing purposes (because it is an alias).
SOURCE_APACHE Apache. Apache 2.0 Data
SOURCE_BLOBFOLIO Our own data! WTFPL
SOURCE_DRUPAL Drupal GPL Data
SOURCE_FREEDESKTOP FreeDesktop.org. MIT Data
SOURCE_IANA IANA. Misc Data
SOURCE_NGINX Nginx. BSD-2 Data
SOURCE_TIKA Apache "Tika". Apache 2.0 Data
SOURCE_WORDPRESS WordPress GPLv2 Data

Parameters

Type Description
string A file extension.
string A MIME type.

Returns

This method always returns an integer. A value of 0 indicates no primary source references.

Example

$source = \Righteous\MIMEs\Extensions::source('jpg', 'image/jpeg'); //-> 252

// Apache mentions it.
if (\Righteous\MIMEs::SOURCE_APACHE & $source) {
    …
}

// IANA mentions it.
if (\Righteous\MIMEs::SOURCE_IANA & $source) {
    …
}

// Etc.

 

Extensions::verify_extension_type(string $ext, string $type) : bool

Determine whether or not a given file extension and media type belong together.

Parameters

Type Description
string A file extension.
string A MIME type.

Returns

A value of true is returned if the file extension and media type belong together, otherwise false.

Example

\Righteous\MIMEs\Extensions::verify_extension_type(
    'jpg',
    'image/jpeg'
); //-> true

\Righteous\MIMEs\Extensions::verify_extension_type(
    'jpeg',
    'image/jpeg'
); //-> true

\Righteous\MIMEs\Extensions::verify_extension_type(
    'jpe',
    'image/jpeg'
); //-> true

\Righteous\MIMEs\Extensions::verify_extension_type(
    'png',
    'image/jpeg'
); //-> false

 

Sanitize::extension(string $ext, int $flags) : ?string

Sanitize a file extension, ensuring it consists of valid characters and is in a neutral lowercase.

This method can also be used to parse a file's extension from a full path or name, though you should read the notes for File->info() as there are a few quirks to consider.

Parameters

Type Description Default
string A path, file name, or extension.
int One or more bitwise filter flags. 0

Flags

The following filter constants are defined in the Righteous\MIMEs class:

Constant Description
FILTER_NO_UNKNOWN Reject any extension for which we have no references whatsoever.

Returns

This returns a normalized and sanitized file extension as a string or null if none.

Example

echo \Righteous\MIMEs\Sanitize::extension('IMAGE.JPEG'); //-> "jpeg"

echo \Righteous\MIMEs\Sanitize::extension('png'); //-> "png"

echo \Righteous\MIMEs\Sanitize::extension('fakeo'); //-> "fakeo"

echo \Righteous\MIMEs\Sanitize::extension(
    'fakeo',
    \Righteous\MIMEs::FILTER_NO_UNKNOWN
); //-> null

 

Sanitize::type(string $type, int $flags) : ?string

Sanitize a file/media/MIME type, ensuring it is formatted correctly, contains only valid characters, etc.

Parameters

Type Description Default
string A MIME type.
int One or more bitwise filter flags. 0

Flags

The following filter constants are defined in the Righteous\MIMEs class:

Constant Description
FILTER_NO_ALIAS Reject unknown, unofficial, or outdated media types.
FILTER_NO_DEFAULT Reject application/octet-stream.
FILTER_NO_EMPTY Reject inode/x-empty.
FILTER_NO_UNKNOWN Reject any type for which we have no references whatsoever.
FILTER_UPDATE_ALIAS Replace an unofficial type with an official one whenever possible.

Note: when FILTER_UPDATE_ALIAS is combined with FILTER_NO_ALIAS, replacement will be attempted first, and evaluation second.

Returns

This returns a normalized and sanitized media type as a string or null if none.

Example

echo \Righteous\MIMEs\Sanitize::type('image/x-bmp'); //-> "image/x-bmp"

echo \Righteous\MIMEs\Sanitize::type(
    'image/x-bmp',
    \Righteous\MIMEs::FILTER_NO_ALIAS
); //-> null

echo \Righteous\MIMEs\Sanitize::type(
    'image/x-bmp',
    \Righteous\MIMEs::FILTER_UPDATE_ALIAS
); //-> "image/bmp"

 

Bug/Type Reporting

MIME type detection is an endless game of cat and mouse, and your help is needed!

If you ever happen to find instances where an up-to-date Righteous MIMEs! incorrectly identifies a file types (or does something silly like suggest it be renamed), please open a ticket and report the issue.

Thank you very much!