expressjs / multer

Node.js middleware for handling `multipart/form-data`.
MIT License
11.61k stars 1.06k forks source link

Should req.files be an array? #133

Closed LinusU closed 9 years ago

LinusU commented 9 years ago

I'm really unsure of this myself, but I wanted to get the discussion started. Should req.files be an array?

It would certainly help #59 since there then would be only one array to check. I also think that the most common case is having one file upload which then would be really easy to find, req.files[0].

The downside would be pages that has multiple files with different fieldnames, it would be a bit more complicated to find a specific file.

To mitigate that we could expose a byFieldname(fieldname, multiple) function. fieldname would obviously be the filename and multiple would be a boolean on wether to allow multiple files or not. It would default to false.

If multiple is false and there is more than one file uploaded to that fieldname, I think that it should throw an error.

On top of this we could also expose a byMimetype(mimetype, multiple) function. That could help with sites that allows uploading any files, and then want to treat music, pictures or movies differently.

Although I feel that we are maybe providing a bit too much with the functions. If req.files is just a normal array, it's very easy to do this with both built in functions, and already popular libraries.

examples

// Find all the new photos that should be added to the album
req.files.filter(function (file) {
  return (file.fieldname === 'photo')
})
// Check if the user has uploaded a new profile picture
_.find(req.files, function (file) {
  return (file.fieldname === 'profile-picture')
})
// Convert all audio files to ogg/vorbis
req.files.filter(function (file) {
  return is.is(file.mimetype, ['audio/*'])
})
jpfluger commented 9 years ago

I like the function ideas.

Re req.files as array--- Maybe have both? Maybe keep the array but then add a pointer to it within the actual json req.body. eg. if appendField(req.body, fieldname, FILE-INDEX-POINTER) were called within the busboy.on('file'.... portion of code. Doesn't this then get fieldnames with file types in the right position according to the w3c specs? (I was just looking at Example 9 - not certain of the details).

The reason I suggest this is that it's always been a little cumbersome to consume ORM-generated forms/data, when the form's fields have been moved from its normal positioning (eg within req.body) to an outside structure (eg req.files)

LinusU commented 9 years ago

Hmm, that would be quite cool actually!

I kind of like the name, type, body as well, although that only works with MemoryStorage. Not super relevant to this topic but I think we should rename buffer to body in the MemoryStorage.

Although the thing I don't like is almost following standards. And it might open up to possible attack vector, imagine the following data:

form.append('profilepicture', file('test.jpg'))
form.append('profilepicture[path]', '/etc/passwd')
LinusU commented 9 years ago

What do you think of the following?

jpfluger commented 9 years ago

Let's try it.

LinusU commented 9 years ago

Pushed as 6ef893b :)

jpfluger commented 9 years ago

Ok, I've tried req.files and don't like it as an array. It just seems more intuitive and less breaking to do req.files[fieldName], where req.files[fieldName] = array of field objects associated to that fieldName.
The fileFilter function handles issue #59, but in addition to it I would need to call multer.many(req.files, 'fieldName') instead of simply req.files[fieldName]. Busboy's LIMIT_FILE_COUNT already restricts total number of files so no need to check req.files.length. In the code, the req.files array isn't used for any operations other than pushing files onto it. Do you see any benefit anymore to having req.files as a single array? (hmmm... native array.filters is useful but would req.files get more use that way than as an object? )

In case other folks want to provide an opinion, below is output relevant to this discussion.

The new way puts all the files in a single array, regardless if the field name is the same.

[ { fieldname: 'fldFiles1',
    originalname: 'launcher.png',
    encoding: '7bit',
    mimetype: 'image/png',
    destination: '/tmp',
    filename: '937d0c1c6793f763fd81bf2b5a435855',
    path: '/tmp/937d0c1c6793f763fd81bf2b5a435855',
    size: 7506 },
  { fieldname: 'fldFiles2',
    originalname: 'launcher.png',
    encoding: '7bit',
    mimetype: 'image/png',
    destination: '/tmp',
    filename: '208f9ec346a26f2a67ddc73f19846a00',
    path: '/tmp/208f9ec346a26f2a67ddc73f19846a00',
    size: 7506 },
  { fieldname: 'fldFiles2',
    originalname: 'login_logo.png',
    encoding: '7bit',
    mimetype: 'image/png',
    destination: '/tmp',
    filename: '17f078b88b3094e89f3b38eb6410cbce',
    path: '/tmp/17f078b88b3094e89f3b38eb6410cbce',
    size: 36822 } ]

In order to retrieve fields by name, then use multer.many

multer.many(req.files, 'fldFiles1');
multer.many(req.files, 'fldFiles2');

Here is how current req.files appears with both single and multiple fields, when putSingleFilesInArray: true.

{ fldFiles1: 
   [ { fieldname: 'fldFiles1',
     originalname: 'launcher.png',
     name: 'fd9cca5463d018e94c3b1b74d4b85d29.png',
     encoding: '7bit',
     mimetype: 'image/png',
     path: '/tmp/fd9cca5463d018e94c3b1b74d4b85d29.png',
     extension: 'png',
     size: 7506,
     truncated: false,
     buffer: null } ],
  fldFiles2: 
   [ { fieldname: 'fldFiles2',
       originalname: 'launcher.png',
       name: 'df2782730e9030730d72d38e9b69cf07.png',
       encoding: '7bit',
       mimetype: 'image/png',
       path: '/tmp/df2782730e9030730d72d38e9b69cf07.png',
       extension: 'png',
       size: 7506,
       truncated: false,
       buffer: null },
     { fieldname: 'fldFiles2',
       originalname: 'login_logo.png',
       name: '3f63546f6b63dc6d3db916083ce5ef62.png',
       encoding: '7bit',
       mimetype: 'image/png',
       path: '/tmp/3f63546f6b63dc6d3db916083ce5ef62.png',
       extension: 'png',
       size: 36822,
       truncated: false,
       buffer: null } ] }

In order to retrieve fields by name, just lookup by fieldName.

req.files('fldFiles1')
req.files('fldFiles2')
jpfluger commented 9 years ago

I do like where you are going with filtering functions. Of course, if req.arrays were not a straight function, couldn't just use the native array filter function. Maybe multer.filter(req.files, filterFunction) or multer.filesToArray(req.files)? Using object map?

jpfluger commented 9 years ago

I am wrong, fileFilter does not handle #59. When fileFilter is called, req.files contains no data.

How about adding the res parameter to fileFilter? (e.g. fileFilter( req, res, file, cb) Then the calling function can use res.locals to track counts? Here's a sample usage that I tested to handle #59. Multi-files were submitted but only 1 file was saved to disk.

// the number represents the file limit
var allowedFields = {"fldName1" : 1, "fldName2": 1};

var fileFilter = function(req, res, file, cb) {

    var fileCounts = res.locals.fileCounts || {};
    fileCounts[file.fieldname] = (fileCounts[file.fieldname] ? fileCounts[file.fieldname] + 1 : 1);

    if (allowedFields && allowedFields[file.fieldname]) {
        if (fileCounts[file.fieldname] <= allowedFields[file.fieldname]) {
            res.locals.fileCounts = fileCounts;
            return cb(null, true); // true == ALLOW
        }
    }
    res.locals.fileCounts = fileCounts;
    return cb(null, false)
};
jpfluger commented 9 years ago

Also abort works when fileFilter callsback an error. Cleanup in done works. Then middleware tosses me to my 500 handler. Nice.

jpfluger commented 9 years ago

How about an option called options.filesReturnMode?

hacksparrow commented 9 years ago

@LinusU @jpfluger req.files[FIELD_NAME] feels more intuitive. If you are uploading a file with a fieldname, you'd want to be able to refer to it using the fieldname.

Let me spend some time on this issue.

jpfluger commented 9 years ago

@hacksparrow fyi, this conversation picked up again at the end of thread #131.

hacksparrow commented 9 years ago

Thanks @jpfluger

hacksparrow commented 9 years ago

Guys @LinusU @jpfluger, to maintain a developer-friendly API, let's continue using the current structure for containing the uploaded files. For internal operations, we could maintain an array to help iterate over the files, if required.

As library developers, we should strive to keep the API as intuitive and friendly as possible, and do the necessary heavy lifting underneath to make that possible.

Can you point out what the biggest challenges are in using an object instead of an array for storing the uploaded file objects?

LinusU commented 9 years ago

@hacksparrow My point when posting this is that it would be easier for the consumer, not internally. For us the only different is maybe 5 lines of code.

As a consumer of multer I personally feels like it would be easier to get all files as an array. Especially considering that I don't want people to be able to upload any file to my site, which I think we should assume that no one wants.

The options as I see is then to 1) specify a fileFilter which examines file.fieldname or 2) iterate over Object.keys(req.files) to make sure that I process all of the files.

The more I think about it the more I'm liking the proposal I wrote in #131. Since the use cases for this module is quite different depending on how your forms looks like, I think it would make sense to use it like this. It would also promote a more tight control on what files gets uploaded (aka. more security).

Another thing from #59 is that I don't think that we should ever encourage adding multer as a middleware to the entire application. It should be added in front of the route that will handle the file/files.

Maybe we could take these ideas and put them together to something like this? Or maybe this would add unnecessary complex for the user. The upside would be no more global file upload handlers, I guess that this would pretty much solve #59?

var app = express()
var upload = multer({ ... })

app.post('/profile', upload.single('avatar'), function (req, res, next) {
  // req.file is the `avatar` file
})

app.post('/photos/upload', upload.array('photos'), function (req, res, next) {
  // req.files is array of `photos` files
})

app.post('/upload', upload.any(), function (req, res, next) {
  // req.files is an object (String -> Array) where fieldname is the key, and the value is array of files
})
LinusU commented 9 years ago

From the security perspective this is what users currently do that I think that we want to discourage.

1) Adding multer as a global handler

app.use(multer({ ... }))

This is bad because the application will accept files to any endpoint even though the user only handles it in a few of them.

app.use(multer({ ... }))

app.post('/upload-picture', function (req, res, next) {
  req.files['pic'].forEach(function (file) {
    // Move the file to the users library
  })
})

// A malicious user could now upload any files and stick them into the
// temporary directory by making a POST request to any invalid url.
//
// e.g. POST a huge file to /blah

2) Not handling every uploaded file

If the user adds multer only to one route, but doesn't specify a file name filter, it's again possible to upload files that won't be handled.

var upload = multer({ ... })

app.post('/upload-picture', upload, function (req, res, next) {
  req.files['pic'].forEach(function (file) {
    // Move the file to the users library
  })
})

// A malicious user could now upload any files and stick them into the
// temporary directory by making a POST request with an unexpected fieldname.
//
// e.g. POST a huge file with the fieldname blah to /upload-picture

I feel that it's our responsibility to be proactive here and help the users not to fall into these pitfalls.

gabeio commented 9 years ago

@LinusU The first security flaw there is quiet interesting, I had not known that this was an issue. Is there anyway to check with express or other libraries to see if the route exists before allowing the upload to advance? (Thereby at minimum mitigating allowing uploads to invalid urls.)

LinusU commented 9 years ago

@gabeio That wouldn't really solve anything since the user would then need to check for files in every route anyhow. I guess we could narrow that anymore by restricting it to only POST but then we would create other problems and also not guarantee anything. I feel that this needs a fresh approach.

gabeio commented 9 years ago

It couldn't just be post either patch & put can contain files ie: edited files

LinusU commented 9 years ago

Yes, I thinking of that as other problem, my point really was that it's not proactive at all. I think that the only solution to this problem is allowing the user to express with greater clarity what the intent is. One way to do that is as I proposed.

jpfluger commented 9 years ago

Regarding @hacksparrow's suggestion to keep the API as intuitive and friendly as possible.

What's the alternative? What's less-breaking? Maybe (1) pass-in an option for allowedFields to a new multer instance and (2) then req.files defaults to an object pointing to an array (String->Array). Also, wouldn't allowedFields need to be an object with properties for name and count?

var upload = multer({  allowedFields:{ name: 'avatar', maxCount: 1 });
var upload2 = multer({  allowedFields:{ name: 'avatar', maxCount: 10 } });
var upload3 = multer({  allowedFields: [...as-array...] });

// in route handler
req.files['avatar'] -> array
req.files['photos'] -> array

The underlying implementation would still be able to restrict by field and get the benefits of security that @LinusU proposed via functions: upload.single('avatar') and upload.array('photos'), which never specified a count restriction on the field. Counts by fieldname are important (See my post re fileFilter in #131)

Optionally, for users wanting req.files as an array, then pass in an option for returnType to define if req.files should be returned as an array or object.

What is missing in this approach? Users will not be forced to seek out the latest docs to read how to implement the new multer. And the suggestions in the new docs are security-focused, which if followed, take care of some gaping security holes as mentioned more fully in #59, some in this thread and additional suggestions at the end of #131.

hacksparrow commented 9 years ago

Valid points @LinusU. I like the interface suggested by @jpfluger.

Is there a situation where one would want the files to be returned as an array? If not, we should try to keep the options to a minimum and avoid introducing returnType. By "returned as an array", I mean all the files (with different fields and multiple uploads for the same fields) are returned as a big one-dimensional array.

LinusU commented 9 years ago

@jpfluger Interesting approach! The part I believe is missing hides in this part.

Users will not be forced to seek out the latest docs to read how to implement the new multer. And the suggestions in the new docs are security-focused, which if followed, take care of some gaping security holes as mentioned more fully in #59

Especially the "which if followed". I believe that it's better to bite the bullet now and require people that upgrades to have a quick look at the docs. Most users will always choose the easiest path to get up and running quickly, and legacy code sticks. I think it's important that we are as "safe by default" as we can be.

Users that don't take care to instantiate one multer per route will be stuck with "gaping security holes".

Your point about count per field is very valid, and I think that it could be incorporated into an api where we provide security by default though. Actually, we can't really provide proper security without this addition :)

This could be a way to incorporate that:

var multer = require('multer')
var upload = multer({ dest: '/tmp' })

// equivalent to `upload.fields([{ name: 'avatar', maxCount: 1 }])`
app.post('/profile', upload.single('avatar'), function (req, res, next) {
  // req.files['avatar'][0] is the `avatar` file
})

// equivalent to `upload.fields([{ name: 'photos', maxCount: 12 }])`
app.post('/photos/upload', upload.array('photos', 12), function (req, res, next) {
  // req.files['photos'] is array of `photos` files
})

var cpUpload = upload.fields([{ name: 'avatar', maxCount: 1 }, { name: 'gallery', maxCount: 8 }])
app.post('/cool-profile', cpUpload, function (req, res, next) {
  // req.files is an object (String -> Array) where fieldname is the key, and the value is array of files
})

A bit unrelated but I do believe that we could make the single case a bit easier on the user. Compare these two examples:

if (req.files['avatar'].length > 1) {
  updateAvatar(req.files['avatar'][0])
}
if (req.file) {
  updateAvatar(req.file)
}
LinusU commented 9 years ago

upload.fields() could also be used to specify maxSizes on files:

upload.fileds([
  { name: 'avatar', maxCount: 1, maxSize: '4MB' },
  { name: 'gallery', maxCount: 8, maxSize: '32MB' }
])
hacksparrow commented 9 years ago

maxSize is good addition, while we are at it.

jpfluger commented 9 years ago

All good ideas and this is close... but there is still a divide, yes? And isn't it a question of forcing more upgrade pain now rather than possibly more pain later? On the one side, in order for users to upgrade multer with ease, make allowedFields an option. On the other side, use functions like upload.fileds or upload.single (where upload is an instance of multer). The 2nd way forces all users to break implementations. The 1st way might not. Is that what we're really talking about here?

LinusU commented 9 years ago

@jpfluger Why I'm pushing for this is to push our users into securing up there apps. Most likely everyone needs to go thru and make some small tweaks for it to work with this new version. We have removed almost every option that was available before and replaced it with a much simpler api.

So since we will have breaking changes either way, I think it's a great opportunity to make sure that we are "secure by default". Not leaving our users vulnerable by just using the module with the default settings.

The reason why I don't want to have allowedFields optional is that I think that many users will miss it. People writing tutorials or people answering questions on StackOverflow will want to show code that works with as few lines as possible, and it's often implied that it might not have been reviewed from a security perspective.

I think that there are two questions that we should try and answer.

1) Will the user be required to make changes when upgrading.

As it is now this is a yes since the options object have changed very much. If we wan't backwards compatibility we have a bit more to go, but the point that I had when making this branch was to introduce breaking changes.

2) Do we want the library to be secured by default and let the user open up what functionality it need, or should the library be open by default and it's the users responsibility to tighten the security?

Given that javascript is a very accessible programming language there will probably be many users of this library that are quite new to this. I think that we should take it upon ourselves to make multer as secure as possible out of the box.

Using the functions that I provide (and I'm open to any other suggestions) we will provide a solution where it is as easy to be secure as it is to allow all. I think that this is they key to helping everyone to achieve good usage.

The problem with optional security is that it's easy to forget or to think that "we will fix that before we release".

gabeio commented 9 years ago

@LinusU Why not have a way to put something into allowedFields have a allowedFields.("all") but it will always spit out a big warning when using it. But this would be super useful when developing as when you are just starting an app you really have no idea what fields are going to be populated right out of the gate...

gabeio commented 9 years ago

Side question: is it still "bad practice" to load multer globally (app.use(multer)) but allow no files on the global one then on a per route basis allow files?

LinusU commented 9 years ago

@gabeio With the syntax I proposed you would be able to do upload.any() to accept any and all files. I'm not so sure that it should even output a warning since there might be legit use cases for it. As long as the user is aware of it, all files can be checked.

Side question: What benefit would app.use(multer) do if it didn't accept any files? It would actually catch the stream before any other multer and then deny it because it dosen't accept any files.

jpfluger commented 9 years ago

@hacksparrow Even if allowedFields were adopted, too many changes to the original options have happened in the breaking-changes branch that would cause most, if not all, multer implementations to break. One of these was the announcement in the 0.1.8 README that we were going to replace req.files with an object that points to an array completely (putSingleFilesInArray). If anyone does anything special with error-handling functions, then it would break as well.

By letting multer functions multer.any(...allowedFields...), multer.single(...allowedFields...), multer.array(...allowedFields...) return the express middleware function instead of multer(...), we get more flexibility on the req.files return type and reusability on the core instance without having to muddle options with returnType or allowedFields, which are function specific. Also it will break every multer implementation completely but unless someone else steps forward with concrete suggestions on why it shouldn't happen or alternatives to code, I find @LinusU's arguments compelling.

Anyone else have alternatives? Or can we move forward on this?

hacksparrow commented 9 years ago

@jpfluger @LinusU let's move ahead with it.

Also Linus, the proposal for req.file as a reference to req.files['avatar'][0] in case of single() is a neat suggestion.

hacksparrow commented 9 years ago

@LinusU @jpfluger can you go ahead with the implementation?

LinusU commented 9 years ago

Yes, I'm on it, just been a bit busy lately. Hopefully I can get this done within the week :)

hacksparrow commented 9 years ago

Great!

LinusU commented 9 years ago

@hacksparrow @jpfluger Just pushed the implementation :)

Still working on writing up the documentation thought, expect that to come soon.

Usage

var express = require('express')
var multer  = require('multer')
var upload = multer({ dest: 'uploads/' })

var app = express()

app.post('/profile', upload.single('avatar'), function (req, res, next) {
  // req.file is the `avatar` file
})

app.post('/photos/upload', upload.array('photos', 12), function (req, res, next) {
  // req.files is array of `photos` files
})

var cpUpload = upload.fields([{ name: 'avatar', maxCount: 1 }, { name: 'gallery', maxCount: 8 }])
app.post('/cool-profile', cpUpload, function (req, res, next) {
  // req.files is an object (String -> Array) where fieldname is the key, and the value is array of files
  //
  // e.g.
  //  req.files['avatar'][0] -> File
  //  req.files['gallery'] -> Array
})

You can access post data fields as body on the request object:

console.log(req.body)

IMPORTANT: Multer will only process forms which is of the type multipart/form-data.

hacksparrow commented 9 years ago

@LinusU that's fantastic!

@jpfluger it would be great if you can work with @LinusU on #131 and close it as well. The world deserves a more secure and efficient multer :grinning:.

jpfluger commented 9 years ago

@LinusU @hacksparrow Builds just fine. I'll test some more tonight, integrating in one of my projects... which bring me back to #131-- Trying to think through how to handle errors more smartly? What about having an option called options.noNextErr, which adds the err to req (eg req.errMulter = err), then calls next() instead of next(err)? This would be optional. I prefer the results of multer (even if aborted) to proceed to my next step of middleware in which fields are validated and then either page rendered/ajax returned. Have sense?

jpfluger commented 9 years ago

Nice abstractions in the code @LinusU. Well done. I integrated into one of my sites but I can't use it until I get error-handling working right (see comment above). Thoughts? Want me to add to breaking-changes? Or is there a better way? I should think my situation pretty common-- Here's an example: I have an input field of type "file" and named "FIELD1" and a maxCount of 2. If three files are submitted, then multer detects FIELD1's count has been exceeded and multer aborts, then calling next(err). Of course next(err) will drop to the middleware that handles errors, in my case returning status 500. But in my app, I'd rather assign the err to req.errMulter and call next(). The next piece of middleware validates all fields and then handles the page or ajax response.

LinusU commented 9 years ago

Yes we absolutely need to support this.

Oh, I just realised a bigger issue here; if I have a limit of 10 files, an attacker could upload 10 huge files and 1 small file. The small file would trigger an error and I wouldn't get a chance to remove the 10 first files.

I think that we need to introduce two modes.

Automatic error handling When an error is reached, all the uploaded files are deleted and the error is than passed on to express to be handled by the standard error handling.

Manual error handling When an error is reached, we will immediately pass everything on to the next handler with the uploadError (or similar) property set, on the request object, to the error that occurred. The user is now responsible for cleaning up any files that might have been uploaded prior to the error.

To implement this we need to:

I'll start working on this so that we can see how it feels, I would appreciate help with the naming :+1:

jpfluger commented 9 years ago

Good options. Thinking this through.... How about the option to auto-delete files within multer and then continue next() with uploadError? But maybe not. I guess it will depend on how this is coded. I have a large project, where the validation middleware is in a different file than where I create multer. In the validation middleware, I would need to var storageEngine = require('multer's storage engine') to get the cached instance and then use the remove functionality from there? (eg storageEngine.removeFiles(req.files)) That's about right? Oh wait, I'd need the instance of the storage engine b/c of connectivity.... (Again, thinking this through) Re. names. Probably use uploadError over multerError. Maybe removeFiles instead of remove.

LinusU commented 9 years ago

Hmm, I actually never intended the removeFile (much better than remove) function to be used by the end user, but rather just be for internal usage. So that the api from the storage engine that multer can use is handleFile(), removeFile().

Does it make sense to expose any functions from the storage engine publicly to the user? I'm thinking that it actually creates more hassle than it's worth. Cause then we probably should support other file operations like moving, reading, etc.

Actually I think that it should be internal only, and we should probably rename them to _handleFile and _removeFile in the same spirit as Node's API for Stream Implementors works.

jpfluger commented 9 years ago

I concur, internal only. Exposing functions from the storage engine publicly will only confuse things and it will be easy to delete files later down the middleware chain. Re. renaming to _handleFile and _removeFile: if you do it, do it now and not after release. :)

LinusU commented 9 years ago

Absolutely, I'll do it right away :)

LinusU commented 9 years ago

I have now renamed it to _handleFile and added _removeFile.

I have also built the functionality to remove uploaded files when aborting due to errors. The code is in #131.

Manual error handling is not done yet thought.

LinusU commented 9 years ago

Okay, manual error handling is now implemented :)

@jpfluger Would you mind testing this out? I have added documentation and tests for it :+1:

jpfluger commented 9 years ago

Hey @LinusU, I'll test tonight within my bigger project.... I just uploaded to breaking-changes small tweaks to the current documentation. Been familiarizing myself with code. Great work on all the mocha tests.

jpfluger commented 9 years ago

Can this thread be retired already? ;-)

LinusU commented 9 years ago

This is now fixed in multer 1.0.0

Shawful commented 6 years ago

I am having the biggest problem using angular 1.6 to plug into multer in a way so I can upload multiple files and the documentation is seriously lacking for me to follow unfortunately. Are there plans to add documentation on multiple files?