balderdashy / sails

Realtime MVC Framework for Node.js
https://sailsjs.com
MIT License

Skipper not behaving according to documentation #4748

Open thesaxonedone opened 5 years ago

thesaxonedone commented 5 years ago

Node version: 10.15.3
Sails version (sails): 1.1.0
ORM hook version (sails-hook-orm):
Sockets hook version (sails-hook-sockets):
Organics hook version (sails-hook-organics):
Grunt hook version (sails-hook-grunt):
Uploads hook version (sails-hook-uploads):
DB adapter & version (e.g. sails-mysql@5.55.5):
Skipper adapter & version (e.g. skipper-s3@5.55.5): 0.9.0-4


This is a skipper issue

I don't know if this is the appropriate place to put this, but given that skipper's issues are not viewable, and given that it is a balderdashy-maintained library, I figured I'd post it here.

The following documentation in skipper seems to be inaccurate with regard to multipart forms (specifically, the bolded part):

When the request ends, or a significant (configurable) period of time has passed since a new file upload has shown up on "foo", the upstream is considered to be complete. If you're using req.file('foo').upload(function(err,uploadedFiles){ ... }), the callback will be triggered with uploadedFiles as an array of metadata objects describing the files that were uploaded (including the at-rest file descriptor). If an error occurred, the callback will be triggered with err instead. (Note that, if you're using the raw upstream directly, you can listen for the "finish" and "error" events to achieve the same effect.)

In general, when possible, you should put req.file() outside of asynchronous callbacks in your code. However this isn't always possible-- Skipper tolerates this case "holding on" to unrecognized file streams (e.g. "bar") for a configurable period of time. If you don't use req.file("bar") right away in your code, Skipper will "hold on" to any files received on that parameter ("bar"). However it won't pump bytes from those files until you use req.file('bar') in your code. It does this without buffering to disk/memory by properly implementing Node's streams2 interface, which applies TCP backpressure and actually slows down, or even pauses, the upload. If you never use req.file('bar') in your code, any unused pending file streams received on "bar" will be discarded when the request ends.

My example controller code is below:

var path = require('path');
var Promise = require('bluebird');

module.exports = async function (req, res) {

  var file1 = readUploadedFileASync(req, 'file1');
  var file2 = readUploadedFileASync(req, 'file2');

  var resolvedPromises;
  try {
    resolvedPromises = await Promise.all([file1, file2]);
  }
  catch (err) {
    if (err.code === 'E_EXCEEDS_UPLOAD_LIMIT') {
      return res.badRequest('File size limit exceeded (' + sails.config.constants.MAX_UPLOAD_SIZE + ')');
    }
    else {
      console.log(err);
      throw err;
    }
  }

  // Insert into DB (index into resolvedPromises, not the original promises)
  var newFilePair = await FilePair.create({
    path1: resolvedPromises[0][0].fd,
    path2: resolvedPromises[1][0].fd
  }).fetch();

  return res.ok(newFilePair);
};

//Promisify the sails upload functionality, which isn't directly bluebird promisify-able
var readUploadedFileASync = function (req, name) {
  return new Promise(function(resolve, reject) {
    req.file(name).upload({
      maxBytes: sails.config.constants.MAX_UPLOAD_SIZE,
      dirname: path.resolve(sails.config.appPath, 'FILES/reads'),
    }, function (error, files) {
      return error ? reject(error) : resolve(files);
    });
  });
};

It is expected in my app that users will submit very large files. Consequently, my MAX_UPLOAD_SIZE constant is set very high (3 GB). However, if my user submits a multipart form with a file1, file2, and a file3 (I am implementing an API, so I can't necessarily control what gets submitted via views or other self-implemented GUI elements), file3 is unexpected input. The fact that I'm not setting up a receiver/upstream for it would imply (per the documentation) that once both of the following have happened:

  1. maxTimeToBuffer has been reached for file3

  2. either the req finishes or a significant (configurable) period of time has passed since a new file upload has shown up on "file1" and "file2",

any ongoing upload of file3 would be disregarded (per #1), and file1/file2 would be considered 'complete' (per the second clause of #2). This would further imply that my Promise.all() statement would resolve, the response would be sent, and the ongoing request that's still submitting file3 would be disregarded or terminated by the sails engine, since I have sent the response.

However, the previous is NOT what happens. In my example, I submit a 1 KB file1, a 2 KB file2, and a 2.5 GB file3 (in that order in the multipart form).

Wireshark captures and debug statements in sails show that file3 is uploaded to completion (all 2.5 GB of it), at which point the Promise.all() statement resolves, even though I'm not actually listening for file3, nor am I awaiting any promise that depends on file3 in any way. Consequently, it appears that the client-submitted request MUST complete before the skipper file-upload functionality will let go (contrary to #2 above). This seems like a significant oversight, as it limits the ability to do any sort of in-flight input validation beyond a generic request-size maximum, which is insufficient for my purposes given that large requests are expected. More importantly, the observed behavior disagrees with the documentation: the documentation indicates that any file parameter not explicitly listened for is disregarded (after a timeout), whereas my Wireshark captures and observed behavior show the entire transaction completing (it takes 30-45 seconds over a 1 Gbps connection to my NAS).

I think it would be nice to have better in-flight input validation for sufficiently large requests, and so I'm hoping the solution here isn't just a change to the documentation.

sailsbot commented 5 years ago

@thesaxonedone Thanks for posting! We'll take a look as soon as possible.

In the meantime, there are a few ways you can help speed things along:

Please remember: never post in a public forum if you believe you've found a genuine security vulnerability. Instead, disclose it responsibly.

For help with questions about Sails, click here.

rachaelshaw commented 5 years ago

@thesaxonedone thanks for the detailed info. You did post this in the right place: we recently disabled issues in a lot of Sails-related repos and transferred existing issues here (easier to keep track of when they're not all spread out), but we still need to update the contribution guide to clarify that new issues should go here.

I don't exactly have enough context to try and reproduce the behavior you're seeing, but it's definitely supposed to work as documented. Would you be able to reproduce this in a PR to the skipper repo with a failing test?

thesaxonedone commented 5 years ago

Regarding a test... I'm not sure. I don't have much experience contributing to public repos, and I don't know test frameworks well beyond a rudimentary knowledge of mocha. That being said, I'll try my best to describe how to re-create the problem more succinctly than in my admittedly verbose initial post.

  1. In any controller, expect a multipart form with multiple "file" fields. Call them what you want (I used file1 and file2 in my previous example). Assume that these files may be very large, because the application demands it.

  2. Use the documented req.file(EXPECTED_FORM_PARAMETER_NAME) functionality to set up an upstream/listener for file1 and file2. Note that you will have to do this asynchronously, because if file1 is sufficiently large, the file2 listener will not be created before the maxTimeToBuffer timeout is hit. Additionally, you will have to configure skipper with a large maxTimeToWaitForFirstFile (I used the max value of a 32-bit signed int), because the file2 listener will sit there getting no data while file1 uploads, again assuming file1 is large enough.

  3. await a Promise.all([file1, file2]) to ensure that the uploads complete before firing off your subsequent app logic to send the response. I included my DB interactions in my previous code, but a simple res.ok('test complete') after the Promise.all() statement should suffice.

  4. On the client (I used Postman), submit a multipart form request to the route associated with the previously mentioned controller, and submit file1, file2, and an unexpected file3 (in that order). Additionally, make file3 some egregiously large file (I used a 3 GB file).

  5. Witness that sails/skipper waits for the entirety of the request to complete (including the upload of the unexpected third file). This can be confirmed by seeing that Postman sits there waiting for a response, and by using Wireshark to see the network traffic. This happens despite the fact that I have no upstream/listener configured for the unexpected third file, and despite the fact that my Promise.all() is only awaiting the completion of file1/file2.

This appears to violate the documented behavior of "When the request ends, or a significant (configurable) period of time has passed since a new file upload has shown up on 'foo', the upstream is considered to be complete." This should theoretically mean that even though the request is ongoing (i.e. data is being uploaded to file3 at this point), because no new data has been seen on file1 and file2, those streams are considered complete. I would think this would mean that my Promise.all() awaiting the completion of file1 and file2 would resolve with the correct values, which it eventually does, but ONLY AFTER file3 has been uploaded in its entirety (and subsequently disregarded, because I'm not actually looking for it).

Additionally, this appears to violate the documented behavior in this paragraph: "In general, when possible, you should put req.file() outside of asynchronous callbacks in your code. However this isn't always possible-- Skipper tolerates this case 'holding on' to unrecognized file streams (e.g. 'bar') for a configurable period of time. If you don't use req.file('bar') right away in your code, Skipper will 'hold on' to any files received on that parameter ('bar'). However it won't pump bytes from those files until you use req.file('bar') in your code. It does this without buffering to disk/memory by properly implementing Node's streams2 interface, which applies TCP backpressure and actually slows down, or even pauses, the upload. If you never use req.file('bar') in your code, any unused pending file streams received on 'bar' will be discarded when the request ends." Skipper appears to be "holding on" insofar as the entire network transaction does in fact take place (no cancellation or TCP backpressure), so I can only assume that if Skipper is correct in saying it's not writing to disk, then the file exists in memory at a minimum.

thesaxonedone commented 5 years ago

As a followup, I'm not entirely sure what the best solution is... interrupting an in-flight request to send a 200 OK response sounds like it's not necessarily the best thing.

Perhaps the solution is to have skipper throw an error saying that an unexpected file/parameter was submitted (if a listener isn't set up for it), and let the developer handle it in code; they can choose to send a 200 OK at their own peril. As it stands, an "Error: EMAXBUFFER: An upstream (file3) timed out before it was plugged into a receiver" gets generated, but no error is actually thrown.

thesaxonedone commented 5 years ago

Ok, I've looked into this some more and tried to simplify my example even further. I apologize for possibly wasting your time with extraneous information; all my previous messages can be disregarded in favor of this one.

Ultimately, my original complaint was based on the fact that despite documentation indicating that if a specific 'file' wasn't actually being listened for via req.file(FILENAME) the data would be disregarded, I could still see the data going over the network, which implied to me that at a minimum skipper was handling it and allocating memory to it. In other words, if my controller is expecting a multipart form with files "file1" and "file2", but my client also sends a "file3", the entire transaction still completes despite the fact that "file3" was unexpected. This appeared to be a problem, especially if file3 was several GB in size, because it would imply that skipper is allocating 3 GB of memory or disk in some fashion to deal with it.

Upon some digging into the skipper code, it appears that it is in fact dealing with it, but basically writing each HTTP chunk that comes off the wire for this unlistened-for "file3" into a blackhole writestream, where the data is just lost into the computational ether (I like that phrasing better than "garbage-collected" :) ), provided my reading of that code is correct. Can you confirm this? If this is true, then this answers/addresses what I thought was a much bigger problem.

That all being said, this presents two less-problematic issues:

  1. The client-perceived behavior is that their entire request completes, and we have no way to inform them that their request contained an extraneous file that is irrelevant to the application. In other words, we have no way to return a res.badRequest() or interrupt the client request if an extraneous parameter/file is encountered, which would be useful functionality in a transaction that is expected to be large.

  2. Dovetailing with the above, the network bandwidth is still expended to upload this large file, which is ultimately discarded. So while the blackhole writestream takes care of the unnecessary CPU/memory footprint on the server, the server still has to receive the traffic.

I feel that both of these problems could be solved by exposing functionality to validate the parameters being sent (i.e. what Joi does for JSON requests), or by throwing a 400 to the client if and when an extraneous file parameter is encountered. I'm not entirely sure how this would be implemented, but perhaps someone more well-versed in skipper than I could come up with something clever. At a minimum, I feel that the limit option should be implemented. As it stands, I can only impose limits via the maxBytes option, which in turn only applies to a given, expected file. Imposing a global limit on the POST body doesn't seem to be possible for a multipart form.
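One possible shape for such validation, sketched as a hypothetical skipper configuration. To be clear, allowedFileParams and onDisallowedFileParam are invented names purely for illustration; no such options exist in skipper today:

```javascript
// HYPOTHETICAL sketch only: these option names do not exist in skipper.
// The idea is a whitelist of multipart field names, plus a hook that lets
// the app fail fast (e.g. a 400) when anything else shows up mid-request.
var skipper = require('skipper');

module.exports.http = {
  bodyParser: skipper({
    allowedFileParams: ['file1', 'file2'],                   // invented option
    onDisallowedFileParam: function (fieldName, req, res) {  // invented hook
      res.status(400).send('Unexpected file parameter: ' + fieldName);
    }
  })
};
```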

Steps to reproduce:

Put this code in your controller

var path = require('path');
var Promise = require('bluebird');

module.exports = async function (req, res) {

  var file1 = readUploadedFileASync(req, 'file1');
  var file2 = readUploadedFileASync(req, 'file2');

  var resolvedPromises;

  try {
    resolvedPromises = await Promise.all([file1, file2]);
  }
  catch (err) {
    console.log(err);
    throw err;
  }

  //App logic for once the file1 and file2 uploads complete goes here

  return res.ok('end test');
};

//Promisify the sails upload functionality, which isn't directly bluebird promisify-able

var readUploadedFileASync = function (req, name) {
  return new Promise(function(resolve, reject) {
    req.file(name).upload({
      maxBytes: 4000000000,
      dirname: path.resolve(sails.config.appPath, 'FILES/reads'),
    }, function (error, files) {
      return error ? reject(error) : resolve(files);
    });
  });
};

Using Postman, submit a multipart form with a small "file1", a small "file2", and a 3 GB extraneous "file3". Note that the request takes some time because it is transferring the 3 GB file, even though that file is extraneous. Additionally, if you want, run a Wireshark trace and note that the transfer is indeed happening.

Repeat the Postman request without file3, and note that things function as expected.

raqem commented 5 years ago

Hi @thesaxonedone, since you are trying to validate the files being sent, maybe moving your uploads into an actions2 action that uses sails-hook-uploads could work. There is an example of how to set up your actions2 file in Ration (an example app built by Mike for a tutorial).

thesaxonedone commented 5 years ago

Sorry for the slow response. I will take a look at this and let you know what I find.

thesaxonedone commented 5 years ago

So your suggestion did not alleviate the issue. I have a relatively simple fix for it that warrants a PR to the skipper repository.

Is there someone over there with whom I should converse before opening a PR? I have a solution, but someone steeped in skipper knowledge might have a more graceful one, or some wisdom to share.

raqem commented 5 years ago

Hi @thesaxonedone, any help is always appreciated. If you would please go ahead and make a pull request, one of the senior devs will review it.

thesaxonedone commented 5 years ago

Is there a process I need to go through to get the PR some attention, or is it just a waiting game? (I'm very experienced with JavaScript, but not as experienced with contributing to the open-source community.)

raqem commented 5 years ago

@thesaxonedone just letting us know when it's ready for review on this issue thread will be most helpful. And then a senior dev will review it as soon as possible.

thesaxonedone commented 5 years ago

Here is my proposed PR. https://github.com/balderdashy/skipper/pull/191

whichking commented 5 years ago

Hi, @thesaxonedone—

Sorry for the long silence on this issue!

What you describe as data being lost to the computational æther (I, too, prefer this to the more prosaic "garbage-collection") is expected/intended behavior.

Your second point is absolutely correct: the network bandwidth would indeed be expended to upload some large hypothetical file. A valid feature request might involve aborting the TCP connection if the request exceeds a certain number of bytes. This may also be an existing Express feature that we’re not currently exposing.

Finally, we'd love to know more about your use case!

thesaxonedone commented 5 years ago

Madison, apologies for my slow response; I've been on vacation. I have a multipart form that my front-end will submit (and submit gracefully). However, this form endpoint will also be exposed to clients other than my front-end, so I have to do input validation. Outside of that, I think the use-case details are covered above, but I have attempted to flesh them out here. Let me know if you have specific questions.

My use case is a multipart form where a user submits 2 related file parameters under unique parameter names ("file1", "file2"). It is expected that these files will be quite large (4 GB or so).

If the user elects to submit a 3rd file (say, as "file3") because they are malicious, and that file is 30 petabytes, I don't want to expend the bandwidth receiving the file and incur costs from AWS or whomever; but as it stands, I cannot enforce this. The file size limit applies only to parameter names I've specifically called out (i.e. "listeners" that I have specifically created for the file parameters that I expect), and because "file3" is not specifically listened for, I cannot interrupt the connection.

Similarly, if the user submits a 3rd file by error (without malicious intent), they should not have to wait for a 12 GB transaction (assuming all 3 files are 4 GB) to complete before they see an error. I should be able to interrupt the request as soon as I see the 3rd parameter. While you proposed an interruption of the underlying TCP connection, I would push for the ability to capture the "error" within sails and act accordingly, i.e. by sending an HTTP 413 or 400 (or something more graceful than a TCP RST) as soon as I see that the payload includes the erroneous 3rd parameter, rather than waiting for the entire upload of that parameter to complete before responding.

mikermcneil commented 5 years ago

@thesaxonedone taking another look now w/ the team, just found a solution that we're pretty sure will work for ya (madison is following up shortly w/ more)

whichking commented 5 years ago

Hi, @thesaxonedone!

We messed around in the config/http.js file of one of our working projects and were able to produce what I believe is your desired behavior. We ended up using—as you mentioned earlier—the limit property, which is inherited by Sails from Express.

In config/http.js, there's a bit near the bottom (commented out by default) where you can configure the body parser. If you uncomment that code and add limit: [desired upper limit of requests] as shown below, you should get a nice client error when an oversized request comes through.

Here's the relevant bit of our config/http.js file:

    /***************************************************************************
    *                                                                          *
    * The body parser that will handle incoming multipart HTTP requests.       *
    *                                                                          *
    * https://sailsjs.com/config/http#?customizing-the-body-parser             *
    *                                                                          *
    ***************************************************************************/

    bodyParser: (function _configureBodyParser(){
      var skipper = require('skipper');
      var middlewareFn = skipper({ strict: true, limit: 2 });
      return middlewareFn;
    })(),

And here's the 413 status code: (screenshot: Screen Shot 2019-09-30 at 3 49 09 PM)

thesaxonedone commented 5 years ago

While functional, this solution is less than ideal. In my example, it is expected that the transactions will be large, and while capping them to prevent them from being "too large" is good, ultimately what I'm after is the ability to validate the incoming submitted file parameters and react accordingly. If my limit parameter is set to 20 GB, but I know right away that the request is malformed because the first parameter the user submitted is not one that I am listening for (i.e. it never gets hooked up to one of skipper's so-called write-stream listeners; in my previous example, "file3"), I should be able to interrupt the request immediately and not incur the cost of moving 20 GB across the internet only for it to end up in a writestream blackhole.


whichking commented 4 years ago

Hey, @thesaxonedone!

Sorry this has dragged out so long. Are you on Gitter, by any chance? We'd love to explore your use case and discover a solution together.