Closed LinusU closed 9 years ago
I like the function ideas.
Re req.files as array--- Maybe have both? Maybe keep the array but then add a pointer to it within the actual json req.body. eg. if appendField(req.body, fieldname, FILE-INDEX-POINTER)
were called within the busboy.on('file'....
portion of code. Doesn't this then get fieldnames with file types in the right position according to the w3c specs? (I was just looking at Example 9 - not certain of the details).
The reason I suggest this is that it's always been a little cumbersome to consume ORM-generated forms/data, when the form's fields have been moved from its normal positioning (eg within req.body) to an outside structure (eg req.files)
Hmm, that would be quite cool actually!
I kind of like the name
, type
, body
as well, although that only works with MemoryStorage. Not super relevant to this topic but I think we should rename buffer
to body
in the MemoryStorage.
Although the thing I don't like is almost following standards. And it might open up to possible attack vector, imagine the following data:
form.append('profilepicture', file('test.jpg'))
form.append('profilepicture[path]', '/etc/passwd')
What do you think of the following?
req.files
be an arraymulter.one(req.files, 'fieldname')
which returns a file or null
multer.many(req.files, 'fieldname')
which returns an array, that might be emptyLet's try it.
Pushed as 6ef893b :)
Ok, I've tried req.files and don't like it as an array. It just seems more intuitive and less breaking to do req.files[fieldName], where req.files[fieldName] = array of field objects associated to that fieldName.
The fileFilter
function handles issue #59, but in addition to it I would need to call multer.many(req.files, 'fieldName')
instead of simply req.files[fieldName].
Busboy's LIMIT_FILE_COUNT
already restricts total number of files so no need to check req.files.length.
In the code, the req.files array isn't used for any operations other than pushing files onto it.
Do you see any benefit anymore to having req.files as a single array? (hmmm... native array.filters is useful but would req.files get more use that way than as an object? )
In case other folks want to provide an opinion, below is output relevant to this discussion.
The new way puts all the files in a single array, regardless if the field name is the same.
[ { fieldname: 'fldFiles1',
originalname: 'launcher.png',
encoding: '7bit',
mimetype: 'image/png',
destination: '/tmp',
filename: '937d0c1c6793f763fd81bf2b5a435855',
path: '/tmp/937d0c1c6793f763fd81bf2b5a435855',
size: 7506 },
{ fieldname: 'fldFiles2',
originalname: 'launcher.png',
encoding: '7bit',
mimetype: 'image/png',
destination: '/tmp',
filename: '208f9ec346a26f2a67ddc73f19846a00',
path: '/tmp/208f9ec346a26f2a67ddc73f19846a00',
size: 7506 },
{ fieldname: 'fldFiles2',
originalname: 'login_logo.png',
encoding: '7bit',
mimetype: 'image/png',
destination: '/tmp',
filename: '17f078b88b3094e89f3b38eb6410cbce',
path: '/tmp/17f078b88b3094e89f3b38eb6410cbce',
size: 36822 } ]
In order to retrieve fields by name, then use multer.many
multer.many(req.files, 'fldFiles1');
multer.many(req.files, 'fldFiles2');
Here is how current req.files
appears with both single and multiple fields, when putSingleFilesInArray: true
.
{ fldFiles1:
[ { fieldname: 'fldFiles1',
originalname: 'launcher.png',
name: 'fd9cca5463d018e94c3b1b74d4b85d29.png',
encoding: '7bit',
mimetype: 'image/png',
path: '/tmp/fd9cca5463d018e94c3b1b74d4b85d29.png',
extension: 'png',
size: 7506,
truncated: false,
buffer: null } ],
fldFiles2:
[ { fieldname: 'fldFiles2',
originalname: 'launcher.png',
name: 'df2782730e9030730d72d38e9b69cf07.png',
encoding: '7bit',
mimetype: 'image/png',
path: '/tmp/df2782730e9030730d72d38e9b69cf07.png',
extension: 'png',
size: 7506,
truncated: false,
buffer: null },
{ fieldname: 'fldFiles2',
originalname: 'login_logo.png',
name: '3f63546f6b63dc6d3db916083ce5ef62.png',
encoding: '7bit',
mimetype: 'image/png',
path: '/tmp/3f63546f6b63dc6d3db916083ce5ef62.png',
extension: 'png',
size: 36822,
truncated: false,
buffer: null } ] }
In order to retrieve fields by name, just lookup by fieldName.
req.files('fldFiles1')
req.files('fldFiles2')
I do like where you are going with filtering functions. Of course, if req.arrays were not a straight function, couldn't just use the native array filter function. Maybe multer.filter(req.files, filterFunction)
or multer.filesToArray(req.files)? Using object map?
I am wrong, fileFilter
does not handle #59. When fileFilter
is called, req.files
contains no data.
How about adding the res
parameter to fileFilter? (e.g. fileFilter( req, res, file, cb)
Then the calling function can use res.locals
to track counts? Here's a sample usage that I tested to handle #59. Multi-files were submitted but only 1 file was saved to disk.
// the number represents the file limit
var allowedFields = {"fldName1" : 1, "fldName2": 1};
var fileFilter = function(req, res, file, cb) {
var fileCounts = res.locals.fileCounts || {};
fileCounts[file.fieldname] = (fileCounts[file.fieldname] ? fileCounts[file.fieldname] + 1 : 1);
if (allowedFields && allowedFields[file.fieldname]) {
if (fileCounts[file.fieldname] <= allowedFields[file.fieldname]) {
res.locals.fileCounts = fileCounts;
return cb(null, true); // true == ALLOW
}
}
res.locals.fileCounts = fileCounts;
return cb(null, false)
};
Also abort
works when fileFilter
callsback an error. Cleanup in done
works. Then middleware tosses me to my 500 handler. Nice.
How about an option called options.filesReturnMode
?
options.filesReturnMode === 0
, then return req.files as an array.options.filesReturnMode => 1
, then req.files is an object whose keys points to an array of file objects. This is very similar to how multer/master currently functions. options.filesReturnMode === 2
, then in done
and not an error, req.files
shallow copies into req.body
. Typically when an ORM object consumes field values, I pass in req.body
and the ORM validates field values based on data type. The reason the shallow copy happens in done
and not directly to req.body
during field parsing is to prevent the attack vector @LinusU mentioned above. Of course, I can do the shallow copy outside of multer but it would be nice to tuck it inside. @LinusU @jpfluger req.files[FIELD_NAME]
feels more intuitive. If you are uploading a file with a fieldname, you'd want to be able to refer to it using the fieldname.
Let me spend some time on this issue.
@hacksparrow fyi, this conversation picked up again at the end of thread #131.
Thanks @jpfluger
Guys @LinusU @jpfluger, to maintain a developer-friendly API, let's continue using the current structure for containing the uploaded files. For internal operations, we could maintain an array to help iterate over the files, if required.
As library developers, we should strive to keep the API as intuitive and friendly as possible, and do the necessary heavy lifting underneath to make that possible.
Can you point out what the biggest challenges are in using an object instead of an array for storing the uploaded file objects?
@hacksparrow My point when posting this is that it would be easier for the consumer, not internally. For us the only different is maybe 5 lines of code.
As a consumer of multer I personally feels like it would be easier to get all files as an array. Especially considering that I don't want people to be able to upload any file to my site, which I think we should assume that no one wants.
The options as I see is then to 1) specify a fileFilter
which examines file.fieldname
or 2) iterate over Object.keys(req.files)
to make sure that I process all of the files.
The more I think about it the more I'm liking the proposal I wrote in #131. Since the use cases for this module is quite different depending on how your forms looks like, I think it would make sense to use it like this. It would also promote a more tight control on what files gets uploaded (aka. more security).
Another thing from #59 is that I don't think that we should ever encourage adding multer as a middleware to the entire application. It should be added in front of the route that will handle the file/files.
Maybe we could take these ideas and put them together to something like this? Or maybe this would add unnecessary complex for the user. The upside would be no more global file upload handlers, I guess that this would pretty much solve #59?
var app = express()
var upload = multer({ ... })
app.post('/profile', upload.single('avatar'), function (req, res, next) {
// req.file is the `avatar` file
})
app.post('/photos/upload', upload.array('photos'), function (req, res, next) {
// req.files is array of `photos` files
})
app.post('/upload', upload.any(), function (req, res, next) {
// req.files is an object (String -> Array) where fieldname is the key, and the value is array of files
})
From the security perspective this is what users currently do that I think that we want to discourage.
1) Adding multer
as a global handler
app.use(multer({ ... }))
This is bad because the application will accept files to any endpoint even though the user only handles it in a few of them.
app.use(multer({ ... }))
app.post('/upload-picture', function (req, res, next) {
req.files['pic'].forEach(function (file) {
// Move the file to the users library
})
})
// A malicious user could now upload any files and stick them into the
// temporary directory by making a POST request to any invalid url.
//
// e.g. POST a huge file to /blah
2) Not handling every uploaded file
If the user adds multer only to one route, but doesn't specify a file name filter, it's again possible to upload files that won't be handled.
var upload = multer({ ... })
app.post('/upload-picture', upload, function (req, res, next) {
req.files['pic'].forEach(function (file) {
// Move the file to the users library
})
})
// A malicious user could now upload any files and stick them into the
// temporary directory by making a POST request with an unexpected fieldname.
//
// e.g. POST a huge file with the fieldname blah to /upload-picture
I feel that it's our responsibility to be proactive here and help the users not to fall into these pitfalls.
@LinusU The first security flaw there is quiet interesting, I had not known that this was an issue. Is there anyway to check with express or other libraries to see if the route exists before allowing the upload to advance? (Thereby at minimum mitigating allowing uploads to invalid urls.)
@gabeio That wouldn't really solve anything since the user would then need to check for files in every route anyhow. I guess we could narrow that anymore by restricting it to only POST
but then we would create other problems and also not guarantee anything. I feel that this needs a fresh approach.
It couldn't just be post
either patch
& put
can contain files ie: edited files
Yes, I thinking of that as other problem, my point really was that it's not proactive at all. I think that the only solution to this problem is allowing the user to express with greater clarity what the intent is. One way to do that is as I proposed.
Regarding @hacksparrow's suggestion to keep the API as intuitive and friendly as possible.
What's the alternative? What's less-breaking
? Maybe (1) pass-in an option for allowedFields
to a new multer instance and (2) then req.files
defaults to an object pointing to an array (String->Array). Also, wouldn't allowedFields
need to be an object with properties for name and count?
var upload = multer({ allowedFields:{ name: 'avatar', maxCount: 1 });
var upload2 = multer({ allowedFields:{ name: 'avatar', maxCount: 10 } });
var upload3 = multer({ allowedFields: [...as-array...] });
// in route handler
req.files['avatar'] -> array
req.files['photos'] -> array
The underlying implementation would still be able to restrict by field and get the benefits of security that @LinusU proposed via functions: upload.single('avatar')
and upload.array('photos')
, which never specified a count restriction on the field. Counts by fieldname are important (See my post re fileFilter
in #131)
Optionally, for users wanting req.files as an array, then pass in an option for returnType
to define if req.files
should be returned as an array or object.
What is missing in this approach? Users will not be forced to seek out the latest docs to read how to implement the new multer. And the suggestions in the new docs are security-focused, which if followed, take care of some gaping security holes as mentioned more fully in #59, some in this thread and additional suggestions at the end of #131.
Valid points @LinusU. I like the interface suggested by @jpfluger.
Is there a situation where one would want the files to be returned as an array? If not, we should try to keep the options to a minimum and avoid introducing returnType
. By "returned as an array", I mean all the files (with different fields and multiple uploads for the same fields) are returned as a big one-dimensional array.
@jpfluger Interesting approach! The part I believe is missing hides in this part.
Users will not be forced to seek out the latest docs to read how to implement the new multer. And the suggestions in the new docs are security-focused, which if followed, take care of some gaping security holes as mentioned more fully in #59
Especially the "which if followed". I believe that it's better to bite the bullet now and require people that upgrades to have a quick look at the docs. Most users will always choose the easiest path to get up and running quickly, and legacy code sticks. I think it's important that we are as "safe by default" as we can be.
Users that don't take care to instantiate one multer per route will be stuck with "gaping security holes".
Your point about count per field is very valid, and I think that it could be incorporated into an api where we provide security by default though. Actually, we can't really provide proper security without this addition :)
This could be a way to incorporate that:
var multer = require('multer')
var upload = multer({ dest: '/tmp' })
// equivalent to `upload.fields([{ name: 'avatar', maxCount: 1 }])`
app.post('/profile', upload.single('avatar'), function (req, res, next) {
// req.files['avatar'][0] is the `avatar` file
})
// equivalent to `upload.fields([{ name: 'photos', maxCount: 12 }])`
app.post('/photos/upload', upload.array('photos', 12), function (req, res, next) {
// req.files['photos'] is array of `photos` files
})
var cpUpload = upload.fields([{ name: 'avatar', maxCount: 1 }, { name: 'gallery', maxCount: 8 }])
app.post('/cool-profile', cpUpload, function (req, res, next) {
// req.files is an object (String -> Array) where fieldname is the key, and the value is array of files
})
A bit unrelated but I do believe that we could make the single
case a bit easier on the user. Compare these two examples:
if (req.files['avatar'].length > 1) {
updateAvatar(req.files['avatar'][0])
}
if (req.file) {
updateAvatar(req.file)
}
upload.fields()
could also be used to specify maxSizes on files:
upload.fileds([
{ name: 'avatar', maxCount: 1, maxSize: '4MB' },
{ name: 'gallery', maxCount: 8, maxSize: '32MB' }
])
maxSize
is good addition, while we are at it.
All good ideas and this is close... but there is still a divide, yes? And isn't it a question of forcing more upgrade pain now rather than possibly more pain later? On the one side, in order for users to upgrade multer with ease, make allowedFields
an option. On the other side, use functions like upload.fileds
or upload.single
(where upload is an instance of multer). The 2nd way forces all users to break implementations. The 1st way might not. Is that what we're really talking about here?
@jpfluger Why I'm pushing for this is to push our users into securing up there apps. Most likely everyone needs to go thru and make some small tweaks for it to work with this new version. We have removed almost every option that was available before and replaced it with a much simpler api.
So since we will have breaking changes either way, I think it's a great opportunity to make sure that we are "secure by default". Not leaving our users vulnerable by just using the module with the default settings.
The reason why I don't want to have allowedFields
optional is that I think that many users will miss it. People writing tutorials or people answering questions on StackOverflow will want to show code that works with as few lines as possible, and it's often implied that it might not have been reviewed from a security perspective.
I think that there are two questions that we should try and answer.
1) Will the user be required to make changes when upgrading.
As it is now this is a yes since the options object have changed very much. If we wan't backwards compatibility we have a bit more to go, but the point that I had when making this branch was to introduce breaking changes.
2) Do we want the library to be secured by default and let the user open up what functionality it need, or should the library be open by default and it's the users responsibility to tighten the security?
Given that javascript is a very accessible programming language there will probably be many users of this library that are quite new to this. I think that we should take it upon ourselves to make multer as secure as possible out of the box.
Using the functions that I provide (and I'm open to any other suggestions) we will provide a solution where it is as easy to be secure as it is to allow all. I think that this is they key to helping everyone to achieve good usage.
The problem with optional security is that it's easy to forget or to think that "we will fix that before we release".
@LinusU Why not have a way to put something into allowedFields have a allowedFields.("all")
but it will always spit out a big warning when using it. But this would be super useful when developing as when you are just starting an app you really have no idea what fields are going to be populated right out of the gate...
Side question: is it still "bad practice" to load multer globally (app.use(multer)
) but allow no files on the global one then on a per route basis allow files?
@gabeio With the syntax I proposed you would be able to do upload.any()
to accept any and all files. I'm not so sure that it should even output a warning since there might be legit use cases for it. As long as the user is aware of it, all files can be checked.
Side question: What benefit would app.use(multer)
do if it didn't accept any files? It would actually catch the stream before any other multer and then deny it because it dosen't accept any files.
@hacksparrow Even if allowedFields
were adopted, too many changes to the original options
have happened in the breaking-changes
branch that would cause most, if not all, multer implementations to break. One of these was the announcement in the 0.1.8 README that we were going to replace req.files
with an object that points to an array completely (putSingleFilesInArray
). If anyone does anything special with error-handling functions, then it would break as well.
By letting multer functions multer.any(...allowedFields...)
, multer.single(...allowedFields...)
, multer.array(...allowedFields...)
return the express middleware function instead of multer(...)
, we get more flexibility on the req.files
return type and reusability on the core instance without having to muddle options
with returnType
or allowedFields
, which are function specific. Also it will break every multer implementation completely but unless someone else steps forward with concrete suggestions on why it shouldn't happen or alternatives to code, I find @LinusU's arguments compelling.
Anyone else have alternatives? Or can we move forward on this?
@jpfluger @LinusU let's move ahead with it.
Also Linus, the proposal for req.file
as a reference to req.files['avatar'][0]
in case of single()
is a neat suggestion.
@LinusU @jpfluger can you go ahead with the implementation?
Yes, I'm on it, just been a bit busy lately. Hopefully I can get this done within the week :)
Great!
@hacksparrow @jpfluger Just pushed the implementation :)
Still working on writing up the documentation thought, expect that to come soon.
var express = require('express')
var multer = require('multer')
var upload = multer({ dest: 'uploads/' })
var app = express()
app.post('/profile', upload.single('avatar'), function (req, res, next) {
// req.file is the `avatar` file
})
app.post('/photos/upload', upload.array('photos', 12), function (req, res, next) {
// req.files is array of `photos` files
})
var cpUpload = upload.fields([{ name: 'avatar', maxCount: 1 }, { name: 'gallery', maxCount: 8 }])
app.post('/cool-profile', cpUpload, function (req, res, next) {
// req.files is an object (String -> Array) where fieldname is the key, and the value is array of files
//
// e.g.
// req.files['avatar'][0] -> File
// req.files['gallery'] -> Array
})
You can access post data fields as body
on the request
object:
console.log(req.body)
IMPORTANT: Multer will only process forms which is of the type multipart/form-data
.
@LinusU that's fantastic!
@jpfluger it would be great if you can work with @LinusU on #131 and close it as well. The world deserves a more secure and efficient multer
:grinning:.
@LinusU @hacksparrow Builds just fine. I'll test some more tonight, integrating in one of my projects... which bring me back to #131-- Trying to think through how to handle errors more smartly? What about having an option called options.noNextErr
, which adds the err to req (eg req.errMulter = err
), then calls next()
instead of next(err)
? This would be optional. I prefer the results of multer (even if aborted) to proceed to my next step of middleware in which fields are validated and then either page rendered/ajax returned. Have sense?
Nice abstractions in the code @LinusU. Well done. I integrated into one of my sites but I can't use it until I get error-handling working right (see comment above). Thoughts? Want me to add to breaking-changes? Or is there a better way? I should think my situation pretty common-- Here's an example: I have an input field of type "file" and named "FIELD1" and a maxCount of 2. If three files are submitted, then multer detects FIELD1's count has been exceeded and multer aborts, then calling next(err)
. Of course next(err)
will drop to the middleware that handles errors, in my case returning status 500. But in my app, I'd rather assign the err to req.errMulter
and call next()
. The next piece of middleware validates all fields and then handles the page or ajax response.
Yes we absolutely need to support this.
Oh, I just realised a bigger issue here; if I have a limit of 10 files, an attacker could upload 10 huge files and 1 small file. The small file would trigger an error and I wouldn't get a chance to remove the 10 first files.
I think that we need to introduce two modes.
Automatic error handling When an error is reached, all the uploaded files are deleted and the error is than passed on to express to be handled by the standard error handling.
Manual error handling When an error is reached, we will immediately pass everything on to the next handler with the uploadError
(or similar) property set, on the request object, to the error that occurred. The user is now responsible for cleaning up any files that might have been uploaded prior to the error.
To implement this we need to:
remove
function to the storage engineserrorHandling: 'manual'
and 'automatic'
)multerError
or uploadError
)I'll start working on this so that we can see how it feels, I would appreciate help with the naming :+1:
Good options. Thinking this through.... How about the option to auto-delete files within multer and then continue next()
with uploadError
? But maybe not. I guess it will depend on how this is coded. I have a large project, where the validation middleware is in a different file than where I create multer. In the validation middleware, I would need to var storageEngine = require('multer's storage engine')
to get the cached instance and then use the remove
functionality from there? (eg storageEngine.removeFiles(req.files)
) That's about right? Oh wait, I'd need the instance of the storage engine b/c of connectivity.... (Again, thinking this through) Re. names. Probably use uploadError
over multerError
. Maybe removeFiles
instead of remove
.
Hmm, I actually never intended the removeFile
(much better than remove
) function to be used by the end user, but rather just be for internal usage. So that the api from the storage engine that multer can use is handleFile()
, removeFile()
.
Does it make sense to expose any functions from the storage engine publicly to the user? I'm thinking that it actually creates more hassle than it's worth. Cause then we probably should support other file operations like moving, reading, etc.
Actually I think that it should be internal only, and we should probably rename them to _handleFile
and _removeFile
in the same spirit as Node's API for Stream Implementors works.
I concur, internal only. Exposing functions from the storage engine publicly will only confuse things and it will be easy to delete files later down the middleware chain. Re. renaming to _handleFile
and _removeFile
: if you do it, do it now and not after release. :)
Absolutely, I'll do it right away :)
I have now renamed it to _handleFile
and added _removeFile
.
I have also built the functionality to remove uploaded files when aborting due to errors. The code is in #131.
Manual error handling is not done yet thought.
Okay, manual error handling is now implemented :)
@jpfluger Would you mind testing this out? I have added documentation and tests for it :+1:
Hey @LinusU, I'll test tonight within my bigger project.... I just uploaded to breaking-changes small tweaks to the current documentation. Been familiarizing myself with code. Great work on all the mocha tests.
Can this thread be retired already? ;-)
This is now fixed in multer 1.0.0
I am having the biggest problem using angular 1.6 to plug into multer in a way so I can upload multiple files and the documentation is seriously lacking for me to follow unfortunately. Are there plans to add documentation on multiple files?
I'm really unsure of this myself, but I wanted to get the discussion started. Should
req.files
be an array?It would certainly help #59 since there then would be only one array to check. I also think that the most common case is having one file upload which then would be really easy to find,
req.files[0]
.The downside would be pages that has multiple files with different fieldnames, it would be a bit more complicated to find a specific file.
To mitigate that we could expose a
byFieldname(fieldname, multiple)
function.fieldname
would obviously be the filename andmultiple
would be a boolean on wether to allow multiple files or not. It would default tofalse
.If
multiple
isfalse
and there is more than one file uploaded to that fieldname, I think that it should throw an error.On top of this we could also expose a
byMimetype(mimetype, multiple)
function. That could help with sites that allows uploading any files, and then want to treat music, pictures or movies differently.Although I feel that we are maybe providing a bit too much with the functions. If
req.files
is just a normal array, it's very easy to do this with both built in functions, and already popular libraries.examples