getkirby / kirby

Kirby's core application folder
https://getkirby.com
Other
1.27k stars 167 forks source link

Doesn't clear `tmp` when uploading files #2476

Closed texnixe closed 1 month ago

texnixe commented 4 years ago

Describe the bug
When you upload a file through PHP, it is usually temporarily stored in the /tmp folder until the script is finished. Since Kirby's upload method renames each file on upload, however, files pile up in the /tmp folder until manually removed. This can easily end up in people no longer being able to upload files due to space issues.

To Reproduce
Steps to reproduce the behavior:

  1. Upload a file via the Panel.
  2. Open the /tmp folder.
  3. See how the uploaded file ends up in the /tmp folder and is not removed when the upload is finished.

Expected behavior
Files should either not be renamed on upload or removed again afterward.

Kirby Version
Tested with 3.3.4

Additional context
https://forum.getkirby.com/t/uploading-images-keep-a-copy-of-the-original-in-phps-tmp-folder/17469/2

afbora commented 4 years ago

In the lines written as the source of this issue, @bastianallgeier wrote the following comment:

move the file to a location including the extension for better mime detection

https://github.com/getkirby/kirby/blob/3.3.4/src/Api/Api.php#L764-L770

These lines provides following process:

/tmp/php8212.tmp > /tmp/5e5c24f38420d.abba.jpg

texnixe commented 4 years ago

These lines provides following process:

/tmp/php8212.tmp > /tmp/5e5c24f38420d.abba.jpg

Yes, we have to check if this is necessary. If not, we should remove this step. If yes, we have to make sure the /tmp folder is cleaned up again.

lukasbestle commented 4 years ago

I‘d also prefer if there was a way to avoid the renaming. If not, we should add an instance var that keeps track of the files and a __destruct() method that deletes the files from the list.

afbora commented 4 years ago

I just wonder that we already check and get the uploaded file mime type from tmp_name naturally in previous lines. Why do we want to be sure again? 🤔

https://github.com/getkirby/kirby/blob/3.3.4/src/Api/Api.php#L757

lukasbestle commented 4 years ago

@bastianallgeier?

bastianallgeier commented 4 years ago

Sorry for the late response. The mime detection sometimes failed on those tmp files. Don't ask me why. That's why it was more secure to rename the file and then check again if it actually has the mime it pretends to have.

lukasbestle commented 4 years ago

@bastianallgeier OK, that makes sense. But what I don't understand is where the second MIME check happens. The callback is called directly after the renaming takes place.

The reason why MIME type detection always works with the correct extension is probably our explicit fallback. So what renaming the file does is to make our Mime class fall back to the guessed MIME type based on the user-provided filename if it doesn't know the real type from the file contents (which will happen for all text files and other file types that don't have a magic number).

If we decide that we will continue to accept files where the MIME type cannot be reliably determined (which is what the current implementation does by relying on the user-provided data in this case), the solution would be to directly get the MIME type by extension if the automatic detection returns false. The renaming would then not be necessary anymore.

bastianallgeier commented 4 years ago

The second check is done in the *Rules.php classes. If I remember correctly Firefox was the main offender in this case. It often does not reliably send the correct MIME type on uploads. It depends on the system setup but we had the issue in v2 and v3. I'm not sure if the MIME detection via extension is a good idea because of that.

lukasbestle commented 4 years ago

We cannot rely on the MIME type the browser sends us as that can be faked by attackers, so detecting it ourselves (which we do at the moment) is already a pretty good solution. But as I wrote above: If PHP cannot reliably determine the MIME type from the file contents, we need to decide if we accept the file anyway or if we block it.

Currently we accept it anyway, in which case we could rewrite the uploader code so that it no longer needs to rename the temporary file.

distantnative commented 3 years ago

I'm trying to track all necessary changes down, but am getting a bit lost:

But do we even need this whole part in the upload method: https://github.com/getkirby/kirby/blob/master/src/Api/Api.php#L771-L782?

If we start later on mime checking again as well? Wouldn't it make sense to all move that to the rules class and/or Kirby\Filesystem\Mime?

lukasbestle commented 3 years ago

Remove https://github.com/getkirby/kirby/blob/master/src/Api/Api.php#L784-L790 and pass $upload['tmp_name'] instead of $source to the callback

Sounds good.

But do we even need this whole part in the upload method: https://github.com/getkirby/kirby/blob/master/src/Api/Api.php#L771-L782?

Yes, because we pass the resulting $filename to the callback as the second argument. I think this ends up as the final filename after the upload.

If we start later on mime checking again as well? Wouldn't it make sense to all move that to the rules class and/or Kirby\Filesystem\Mime?

The Mime::type() method already supports an optional $extension argument that allows to override the fallback extension. So to replicate the current behavior without the physical renaming process we "only" need to pass the extension from $filename to Mime::type() when it is called from Filesystem\File::mime().

Idea: What if we add a new prop Filesystem\File::$filename. By default that would be null (= extract the filename from the $root). But it could be set to something completely different for use in the extension, filename and mime methods.

Example:

$file = new BaseFile([
  'root'     => '/tmp/example.tmp',
  'filename' => 'my-image.jpg'
]);

$file->root();      // /tmp/example.tmp
$file->filename();  // my-image.jpg
$file->name();      // my-image
$file->extension(); // jpg
$file->mime();      // uses $this->extension() when calling `Mime::type()`

If we have this, we can just set this filename prop here and here and everything will still work even without physical renaming.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

distantnative commented 1 year ago

@lukasbestle I am not too comfortable with the filename prop solution as I think ti could lead to weird other cases where we have a class that states the filename is A while it actually isn't.

I looked at the code again, which also has changed it bit since last year. If I still understand correctly, the issue here is that files are left in tmp, not necessarily how they are named (just that renaming prevents PHP's automatic deletion). But why are we then not using F::move() instead F::copy() here: https://github.com/getkirby/kirby/blob/main/src/Cms/FileActions.php#L212? Shouldn't that resolve our issue of the leftover files?

lukasbestle commented 1 year ago

But why are we then not using F::move() instead F::copy() here: https://github.com/getkirby/kirby/blob/main/src/Cms/FileActions.php#L212? Shouldn't that resolve our issue of the leftover files?

I'd say moving would be better than the current implementation, but it won't fix the root cause of the issue (which is that the file gets moved to another file inside the temp directory at all; this isn't how move_uploaded_file() is supposed to be used).

This means that uploaded files will be moved from the temp dir if they are accepted by FileRules, but they will still remain in the temp dir if they are rejected by FileRules. This could actually be abused by Panel users to fill up the temp dir with large files that are not allowed by the blueprint.

I think we can only solve this issue properly by getting rid of the intermediate temporary file. To do that, we need some way of passing down the target filename to the $upload object (see below).

I am not too comfortable with the filename prop solution as I think ti could lead to weird other cases where we have a class that states the filename is A while it actually isn't.

Nico and I discussed this directly. It is too risky to have this "fake" prop in the core Filesystem\File class (could be used incorrectly quite easily). We came up with a few ideas (see below).


Issue summary

Status quo

Our FileRules class performs a MIME check on the uploaded file before it is copied to the final destination:

https://github.com/getkirby/kirby/blob/7dcf359e3da2fa09923a3e0d851f2a74ae668e78/src/Cms/FileRules.php#L113

The same happens for file replacing:

https://github.com/getkirby/kirby/blob/7dcf359e3da2fa09923a3e0d851f2a74ae668e78/src/Cms/FileRules.php#L152

Because PHP creates the temporary uploaded file without the user-provided file extension, this MIME check will fail for every file type that doesn't have "magic bytes" (so basically for every text-based file format). For those files we need the file extension in the MIME check as a fallback.

Currently we solve this by renaming the temporary uploaded file to contain the file extension:

https://github.com/getkirby/kirby/blob/7dcf359e3da2fa09923a3e0d851f2a74ae668e78/src/Api/Api.php#L691-L700

Because we use move_uploaded_file(), PHP will assume that the file was moved to the final destination. So it isn't cleaned up afterwards.

General solution

I think the only proper solution can be to get rid of that intermediate temp file. All the validation should be done on the original uploaded file.

To achieve that, we IMO need to make the following changes:

This will mean that the following happens on upload:

  1. The upload routes (file upload, file replace, user avatar upload/replace) and the field upload mixin would each receive the original file as the $source (new). They would still receive the "clean" filename as $filename (like before).
  2. In the "create" case:
    1. They pass the $source and $filename variables as props to HasFile::createFile().
    2. HasFiles::createFile() passes these props 1:1 to FileActions::create().
  3. In the "replace" case:
    1. They pass the $source to $file->replace().
  4. FileActions::create() and $file->replace() somehow create an $upload object that uses the $filename as an override (new but still uncertain, see below).
  5. FileActions::create() and $file->replace() call $file->commit(), which calls $fileRules->create()/$fileRules->replace() with that $upload object.
  6. The FileRules methods call the $upload->mime(), $upload->match() and $upload->validateContents() methods. The replace rule method also calls the $upload->extension() method.
  7. All of those $upload methods can now refer to the target filename (new).
  8. If everything passes validation, the file is moved to the destination using F::move() in FileActions::create()/$file->replace().
  9. Otherwise, PHP automatically cleans up the temporary file (new).

Possible solutions for Filesystem\File with custom $filename prop

1. New Filesystem\TmpFile class

We would have a new class Filesystem\TmpFile extends File that would add the special $filename prop. This class could then override the filename(), extension(), is() and name() methods.

Also we would have to rewrite the following methods inside Filesystem\File: The mime() and type() methods would use the faked extension() internally. The sanitizeContents() and validateContents() methods would use the faked extension() and mime() internally.

Unsolved issue: There would be no way to use TmpFile for an Image object directly. For the upload use case, we'd at least need the match() method to use Image::$validations if the upload is an image.

2. TmpFile with an override for match() that supports Image

We could override the match() method so that it accesses Image::$validations if the fake $filename is of type image.

Unsolved issue: The TmpFile class would only be useful for upload handling in the core, but not generally useful:

3. TmpFile as a proxy class

To solve the access to all Image methods, we could make TmpFile a proxy class that receives the underlying File or Image object as well as the $filename.

Every method call that isn't overridden would be proxied to the asset object.

Unsolved issue: The overridden methods in the proxy won't be available to the underlying class, so we would need to override many of the underlying methods in the proxy even though their code would be unchanged.

Conclusion

We are a bit stuck at the moment.

The second potential solution for TmpFile looks to be the most promising so far, but it's not perfect either.

Edit 30.12.2022: Updated summary to the state after merging #4943

lukasbestle commented 1 year ago

We partly fixed the issue for 3.9.0. The uploaded file is cleaned from the tmp dir on successful uploads, but so far the file stays in tmp on failed uploads (see the issue summary above).

distantnative commented 1 month ago

I think with https://github.com/getkirby/kirby/pull/6590 we have covered the majority of cases and all that we can cover without a lot of effort. Closing this/