Closed benallfree closed 8 years ago
$filePath
to:sys_get_temp_dir() . "/stapler.$name";
$ext
to $extension
(so that it's not abbreviated).After that, this should be good to go.
$name
can be over 4k characters in some cases. My proposal is to use tempnam()
to create a filesystem-compatible name that will always work, otherwise there are cases where $name
, (derived from the URL basename) is too long to be written to the filesystem and an exception occurs.Awhile back I had to move away from using the tempnam
function because in issue in #56 we discovered that the Reiszer class was leaving empty resized files on disk (these were created using tempnam). When I tried to refactor it to clean up the old files, I ran into issues where I couldn't get a full reference to file (or something like that) to delete it. For the life of me I can't remember now what that specific issue was (I swear it had something to do with how tempnam
was naming the files in the temp directory or something), but I ended up having to create a function for generating random file paths that used the sys_get_temp_dir
function so that I could get a good file name handle that I could delete later: https://github.com/CodeSleeve/stapler/commit/76d6fa48d263b70437ba92b7c2c053b79dbb5cc6
I said all that to say this; I don't particularly care what the file name is, I just want temp files to bel placed in a consistent location (throughout the entire app) so that they can then be cleaned up later. I wasn't able to make this work with tempnam
.
Could this be it? tempnam
does create an empty physical file, and accepts a directory, but maybe there is something more I don't understand.
In any case, I don't care either, I can use the random function or what about
$name = md5($pathinfo['basename']);
$extension = isset($pathinfo['extension']) ? '.'.$pathinfo['extension'] : '';
$filePath = sys_get_temp_dir() . "stapler-{$name}{$extension}";
Then it's predictable if we ever need to garbage collect.
Yeah that works for me.
My only issue with it is the slim chance that another process is doing the same URL concurrently. See an SO discussion on tempnam.
Hmm, I've been reading more about tempnam
. It does a pretty important job to prevent concurrency collisions. It uses mkstemp
under the hood, which is atomic because the next call will fail to create the file if it already exists. So you're guaranteed a unique filename at the OS level.
I think I'm going to leave this PR the way it is until there is a clearer reason not to use tempnam
. I'm already maintaining a separate stapler
branch with other PRs (#150, #102) that haven't been accepted yet, so there are other reasons for me to continue using my own fork.
I don't think I explained what I was trying to say very well earlier. Let me try again. When you create a temporary file like this:
$filePath = tempnam(sys_get_temp_dir(), 'stapler-')."{$extension}";
file_put_contents($filePath, $rawFile);
You're actually creating two files:
/// this creates a file named /tmp/stapler-2wFEb (or something like that)
tempnam(sys_get_temp_dir(), 'stapler-')
// this then creates another file /tmp/stapler-2wFEb.jpeg (or whatever the extension is)
$filePath = tempnam(sys_get_temp_dir(), 'stapler-')."{$extension}";
Even though you're unlinking/removing one of these, it's still going litter the /tmp folder will the first file that's created from the call to tempnam
. This would cause a regression bug that I've already fixed and I want to avoid that.
Regarding the use of tempnam
to prevent races conditions, tempnam
doesn't allow us to specify a file extension (which is really pretty damn awful) and because of that, I can't use it inside the Resizer
class to save temporary image files (Imagine will choke if the file doesn't have a proper image file extension). The way you're using it (tacking on the extension at the end of the filename that's returned) is going to bypass any guarantee (provided by mkstemp at the OS level) that the file is unique during a race condition. This is still probably better than pseudo random file name generation (like we're doing in the Resizer class), but I don't think there's any guarantee that we still won't have the (possibility) of collisions.
I think that if you add a call to remove the first file created by tempnam
than it should fix the issue with tmp files in the temp directory:
$tempFile = tempnam(sys_get_temp_dir(), 'stapler-');
$filePath = $tempFile . $extension;
// then later on
unlink($tempFile)
Oh snap. I see what you mean now. Yes to removing the extra temporary file.
On the race condition point, yes not perfect, but if we treat tempnam
's file as a lockfile, I think we will be safe enough. Strange oversight on the part of tempnam
though.
See the new update.
This PR fixes a condition where the file attachment would fail to save in the filesystem because the filesystem name was derived from the URL basename that was too long for the filesystem to support.
It also fixes a small issue where hash anchors in URLs would cause file extensions from
pathinfo()
to be incorrect