CodeSleeve / stapler

ORM-based file upload package for php.
http://codesleeve.com
Other
538 stars 144 forks source link

Fix MIME detection and too long URL names #167

Closed benallfree closed 8 years ago

benallfree commented 8 years ago

This PR fixes a condition where the file attachment would fail to save in the filesystem because the filesystem name was derived from the URL basename that was too long for the filesystem to support.

It also fixes a small issue where hash anchors in URLs would cause file extensions from pathinfo() to be incorrect

tabennett commented 8 years ago
  1. Change the $filePath to:
sys_get_temp_dir() . "/stapler.$name";
  1. Please change $ext to $extension (so that it's not abbreviated).

After that, this should be good to go.

benallfree commented 8 years ago
  1. The value of $name can be over 4k characters in some cases. My proposal is to use tempnam() to create a filesystem-compatible name that will always work, otherwise there are cases where $name, (derived from the URL basename) is too long to be written to the filesystem and an exception occurs.
  2. Fixed
tabennett commented 8 years ago

Awhile back I had to move away from using the tempnam function because in issue in #56 we discovered that the Reiszer class was leaving empty resized files on disk (these were created using tempnam). When I tried to refactor it to clean up the old files, I ran into issues where I couldn't get a full reference to file (or something like that) to delete it. For the life of me I can't remember now what that specific issue was (I swear it had something to do with how tempnam was naming the files in the temp directory or something), but I ended up having to create a function for generating random file paths that used the sys_get_temp_dir function so that I could get a good file name handle that I could delete later: https://github.com/CodeSleeve/stapler/commit/76d6fa48d263b70437ba92b7c2c053b79dbb5cc6

I said all that to say this; I don't particularly care what the file name is, I just want temp files to bel placed in a consistent location (throughout the entire app) so that they can then be cleaned up later. I wasn't able to make this work with tempnam.

benallfree commented 8 years ago

Could this be it? tempnam does create an empty physical file, and accepts a directory, but maybe there is something more I don't understand.

In any case, I don't care either, I can use the random function or what about

$name = md5($pathinfo['basename']);
$extension = isset($pathinfo['extension']) ? '.'.$pathinfo['extension'] : '';
$filePath = sys_get_temp_dir() . "stapler-{$name}{$extension}";

Then it's predictable if we ever need to garbage collect.

tabennett commented 8 years ago

Yeah that works for me.

benallfree commented 8 years ago

My only issue with it is the slim chance that another process is doing the same URL concurrently. See an SO discussion on tempnam.

benallfree commented 8 years ago

Hmm, I've been reading more about tempnam. It does a pretty important job to prevent concurrency collisions. It uses mkstemp under the hood, which is atomic because the next call will fail to create the file if it already exists. So you're guaranteed a unique filename at the OS level.

I think I'm going to leave this PR the way it is until there is a clearer reason not to use tempnam. I'm already maintaining a separate stapler branch with other PRs (#150, #102) that haven't been accepted yet, so there are other reasons for me to continue using my own fork.

tabennett commented 8 years ago

I don't think I explained what I was trying to say very well earlier. Let me try again. When you create a temporary file like this:

$filePath = tempnam(sys_get_temp_dir(), 'stapler-')."{$extension}";
file_put_contents($filePath, $rawFile);

You're actually creating two files:

/// this creates a file named /tmp/stapler-2wFEb (or something like that)
tempnam(sys_get_temp_dir(), 'stapler-')

// this then creates another file /tmp/stapler-2wFEb.jpeg (or whatever the extension is)
$filePath = tempnam(sys_get_temp_dir(), 'stapler-')."{$extension}";

Even though you're unlinking/removing one of these, it's still going litter the /tmp folder will the first file that's created from the call to tempnam. This would cause a regression bug that I've already fixed and I want to avoid that.

Regarding the use of tempnam to prevent races conditions, tempnam doesn't allow us to specify a file extension (which is really pretty damn awful) and because of that, I can't use it inside the Resizer class to save temporary image files (Imagine will choke if the file doesn't have a proper image file extension). The way you're using it (tacking on the extension at the end of the filename that's returned) is going to bypass any guarantee (provided by mkstemp at the OS level) that the file is unique during a race condition. This is still probably better than pseudo random file name generation (like we're doing in the Resizer class), but I don't think there's any guarantee that we still won't have the (possibility) of collisions.

I think that if you add a call to remove the first file created by tempnam than it should fix the issue with tmp files in the temp directory:

$tempFile = tempnam(sys_get_temp_dir(), 'stapler-');
$filePath = $tempFile . $extension;

// then later on
unlink($tempFile)
benallfree commented 8 years ago

Oh snap. I see what you mean now. Yes to removing the extra temporary file.

On the race condition point, yes not perfect, but if we treat tempnam's file as a lockfile, I think we will be safe enough. Strange oversight on the part of tempnam though.

See the new update.