aws / aws-sdk-js

AWS SDK for JavaScript in the browser and Node.js
https://aws.amazon.com/developer/language/javascript/
Apache License 2.0
7.59k stars 1.55k forks source link

browserify not working for node apps #696

Closed doapp-ryanp closed 8 years ago

doapp-ryanp commented 9 years ago

Trying to browserify my node code for use in lambda to help combat the cold start issues (which also raises the ceiling on the hard 50MB zipped limit).

Is browserfying for node not supported? Ultamately I'd like to just use the --node browserify option, but its not working.

Ex: (I wont be hard coding creds, this is just for ease of reproduce.)

var AWS = require('aws-sdk');
AWS.config.update({
  accessKeyId: 'mykey',
  secretAccessKey: 'mysecret'
});
var s3 = new AWS.S3();
s3.listBuckets(function(err, data) {
  console.log(err, data);
});

If I run browserify bundletests.js > bundled.js then run node bundle.js I get XMLHttpRequest is not defined which is expected since I'm not running in the browser.

So I run browserify bundletests.js --dg --no-builtins --no-commondir > bundled.js to not use the builtins and I get

module.js:340
    throw err;
          ^
Error: Cannot find module '_process'
    at Function.Module._resolveFilename (module.js:338:15)
    at Function.Module._load (module.js:280:25)
    at Module.require (module.js:364:17)
    at require (module.js:380:17)
    at s (/Users/ryan/projects/JAWS/tests/bundle/testapp/myLambda/bundled.js:1:176)

Any ideas?

AdityaManohar commented 9 years ago

@doapp-ryanp I'm still a little unclear on how you intend to use your browserify-ed code inside of AWS Lambda. Lambda requires the function to export a top level CommonJS module.

That being said, browserify-ing the SDK for use in Node.js is currently not supported, because we rely on the window object to hang the AWS namespace on. There might be other potential issues like the one you are running into.

Would you be able to explain a use case a little more? I'm curious about why uploading an archive of Node.js modules is not working well for you.

doapp-ryanp commented 9 years ago

@AdityaManohar I have broswerfied code that is working just fine in lambda. You can export top level CommonJS method. The bummer is the only node module I have not gotten to work so far is aws-sdk.

If your curious on how I'm doing this, take a look at the JAWS code I wrote.

I wan't to be able to bundle up aws-sdk so I'm not restricted to the version that is included in lambda (2.1.35 is already backlevel). Because its not browserify compatible, I have to .exclude('aws-sdk') and use the external 2.1.35 version thats included in lambda. If its not obvious why this is problematic let me know and I will elaborate.

As for the use case for bundling - first some background:

I'd like to start out by saying I love lambda+api gateway. It is a game changer. With that said the cold start lag times make it un-useable as a production webservice/api layer. I define cold start as: 1st call after initial creation (brand new lambda, or contaner that has been provisioned due to load) OR 1st call after being frozen from inactivity (currently ~5mins).

I know this is a extremely difficult problem to solve at scale (and the problem is worse in Java lambdas currently). I (and co-workers/friends) have had discussions with lambda and api gateway engineers and I know they are working on improving the cold start issues. I'm aware this issue is not the forum to address the problem - but you must understand the background in order to understand my use case.

With this in mind, the nature of the product (containers launching + grabbing code, containers being unfrozen etc) it is of paramount importance that the following conditions are optimized as much as possible to cut down on cold start times: 1) code size. Smaller size, smaller compressed file, faster code gets from storage into the container. Less code faster the VM starts. 2) Removal of all code that is note directly related to processing the event. [Best practices] from AWS docs (http://docs.aws.amazon.com/lambda/latest/dg/best-practices.html) concurs with this.

So my use case is to leverage build/bundle tooling like browserify/systemjs + uglifyier/minifier.

I want lambda+api gateway to be a successful production grade product. With AWS's track record I think it will become just this. Cutting every ms of lag time off that can be done with a reasonable amount of work, will go a long way in making it successful. IMO most lambda code that uses the aws-sdk will only be using a fraction of its codbase - supporting a popular tool like browserify to remove the un-used code is a reasonable request.

I know its more work, but with AWS's investment in lambda, maybe now is the time to invest more resources in the aws-sdk to separate it into sdk for browser and sdk for node. My hunch is this will not only make it easier for tools like browserify/systemjs to work, but it will also allow for some optimizations that will improve its efficiency in lambda.

Sorry this was so long winded, hard to convey this problem and use case concisely

AdityaManohar commented 9 years ago

@doapp-ryanp Thanks for the explanation! This definitely helps understand your use case better.

I did some investigation and I was able to reproduce the problem with a rather trivial use case:

// log.js

var log = function() {
  console.log(process.memoryUsage());
};

module.exports = log;
// source.js
var log = require('./log');
log();

Browserify-ing the source file still throws the same error

$> browserify source.js --dg --no-builtins --no-commondir > bundle.js
module.js:338
    throw err;
          ^
Error: Cannot find module '_process'
    at Function.Module._resolveFilename (module.js:336:15)
    at Function.Module._load (module.js:278:25)
    at Module.require (module.js:365:17)
    at require (module.js:384:17)
    at s (/Users/aditm/dev/code/bundle.js:1:176)
    at /Users/aditm/dev/code/bundle.js:1:367
    at Object._process (/Users/aditm/dev/code/bundle.js:16:14)
    at s (/Users/aditm/dev/code/bundle.js:1:316)
    at /Users/aditm/dev/code/bundle.js:1:367
    at Object../log (/Users/aditm/dev/code/bundle.js:2:11)

It also looks like we only depend on the window object when it is available. This leads me to believe that there shouldn't be a problem when you try to use the browserify-ed SDK in Node. Presumably there is somethings else going on with the way browserify handles globals.

Hope this helps!

doapp-ryanp commented 9 years ago

Nope it does not work. In my continued digging, turns out there is a way to insert global vars, which gets around the cant find _process module error.

However this exposes another problem. api_loader.js does require()s using __dirname.

Not quite sure why __dirname is needed here (vs using rel paths) but ignoring that., according to https://github.com/aws/aws-sdk-js/issues/383 it looks like you have some special code that runs for your hosted aws sdk for browser. But later in that issue @lsegal says that as of v2.0.20 you no longer need to do this hack.

I asked in that issue a week or so ago for some clarification. What am I missing here?

Here is my browserify code:

browserify({
        basedir: baseDir,
        entries: entries,
        standalone: 'lambda',

        //setup for node app (copy logic of --node in bin/args.js)
        browserField: false,
        builtins: false,
        commondir: false,
        detectGlobals: true,  //default for bare in cli is true, but we dont care if its slower

        //handle process https://github.com/substack/node-browserify/issues/1277
        insertGlobalVars: {
          //__filename: insertGlobals.vars.__filename,
          //__dirname: insertGlobals.vars.__dirname,
          //process: insertGlobals.vars.process,
          process: function() {
            return;
          },
        },
      })

Here is output from lambda (as expected __dirname is replaced during browserify):

{
  "errorMessage": "Cannot find module '/Users/ryan/jawstests/my-project/back/node_modules/aws-sdk/apis/metadata.json'",
  "errorType": "Error",
  "stackTrace": [
    "Function.Module._resolveFilename (module.js:338:15)",
    "Function.Module._load (module.js:280:25)",
    "Module.require (module.js:364:17)",
    "require (module.js:380:17)",
    "i (/var/task/main.js:1:481)",
    "/var/task/main.js:1:672",
    "a (/var/task/main.js:1:14017)",
    "Object.i [as services] (/var/task/main.js:1:14473)",
    "Object.<anonymous> (/var/task/main.js:3:18405)",
    "Object../api_loader (/var/task/main.js:3:18590)"
  ]
}
AdityaManohar commented 9 years ago

I asked in that issue a week or so ago for some clarification. What am I missing here?

The AWS_SERVICES environment variable is used to select the services that you want to build in the browserify-ed distributable of the SDK. You need to set it only when you want to cherry-pick the services you are including support for. If you don't set this environment variable then it will include support for all the services that support CORS.

I hope this helps clarify things!

doapp-ryanp commented 9 years ago

thanks that helps clarify how AWS_SERVICES is used. I want them all and I want to let browserify remove code that is not being used in my code path.

What needs to be done for browserify to correctly include metadata.json (and other json) files in apis? The docs for building via browserify do not work for the AWS SDK for node.

lsegal commented 9 years ago

@doapp-ryanp

I want them all and I want to let browserify remove code that is not being used in my code path.

The problem here is that, due to the way the SDK is loaded (you just require('aws-sdk')), that's not possible. If browserify were to inspect your codepath based on that require call, it would be pulling in the entire SDK, regardless of which service classes you were using. That's why browserifying for Node.js does not really provide any advantage without AWS_SERVICES. Similarly, if we were to correctly parse the metadata file the same way we do in browser environments, we would be pulling in all services.

Is size on disk correlated to cold start time? It's worth noting that although there's a bit of data on disk, that data is not actually loaded until a service is constructed, so it does not impact require() time.

lsegal commented 9 years ago

@doapp-ryanp also, if this really is an issue going forward, you might want to consider relying on the system-wide install of aws-sdk in Lambda, since that's always present and never requires an install-- and is kept fairly up-to-date. It might be worth elaborating why this is a blocker for you; the Lambda team would certainly be interested in knowing.

doapp-ryanp commented 9 years ago

@lsegal thanks for the clear explanation.

Is size on disk correlated to cold start time

This does indeed seem to be the case. Ignoring runtime all together, larger the file the longer it takes to get into the container that Lambda creates. Leads to slower initial provision and provisioning of said container when scaling out. I know we are only talking about 500k here zipped for the aws-sdk, but since most npm mods work with browserify we figured it was a reasonable optimization step to run to shave every ms off of lambda start times that we could.

It might be worth elaborating why this is a blocker for you

Relying on the system-wide install of the aws-sdk, and the single nodejs version currently supported in lambda is a blocker because:

I created this forum post back in May on this subject. Have not been given a clear direction on how Lambda team is going to handle these issues.

lsegal commented 9 years ago

I can't respond to any of these with authority (you might want to ping your forum thread and get an official response from the current SDK maintainers), but from experience I can say that:

Security: if there was some kind of security patch going out with the SDK it's extremely likely that this release would be synchronized with Lambda's update of their version. AWS takes security patches very seriously, you're unlikely to see a significant delay when it comes to this stuff.

Backwards compatibility: The SDK follows Semantic Versioning. Going from 2.1 to 2.2 is not an incompatible update; only 2.x to 3.x. Lambda would NOT upgrade their SDKs from 2.x to 3.x if it involved breaking changes; that would break all of their existing users. I know from experience that AWS takes backwards compatibility very seriously, and so does this SDK.

For the rest you might want to get an official answer. Hope that helps!

doapp-ryanp commented 9 years ago

Thanks for the prompt response.

Re: backwards compat: As it stands today, lambda only supports ONE version of aws-sdk and one version of Node.js. My point was more that this architecture decision in a node runtime is significantly flawed. I will indeed continue to try to take it up with them, however it is also in your best interest for them to support multiple because as it stands now you will never be able to release a v3.

This same logic applies for node.js runtime. It will cripple the ability for the product to move forward at the speed the node ecosystem requires.

Just for completeness sake, I am aware I can bundle my own (non browserified) version of aws-sdk and I can even bundle my own version of node/io.js. Both will increase the bloat (discussed before) and bundling the runtime as it stands today is a hack and has a CGI style interface (wont go into that discussion here).

LiuJoyceC commented 8 years ago

Node.js 4.3 is now available in Lambda so you can leverage ES6 features in your Lambda functions, and Lambda does update its default version of the aws-sdk on a regular basis. While this does not resolve SDK's incompatibility with browserify --node, this should at least ameliorate the issue of "inability to leverage the fast moving ecosystem". Hope this helps!

LiuJoyceC commented 8 years ago

If you still would like to bundle the latest version of the SDK to use in Lambda and not use the default version of the SDK pre-installed in Lambda, you can zip your directory that contains both the code for your Lambda function and a node_modules folder that contains your desired version of the SDK. Then your require('aws-sdk') in your code should reference the aws-sdk module in your zip rather than the the pre-installed version in Lambda. For more info: http://docs.aws.amazon.com/lambda/latest/dg/nodejs-create-deployment-pkg.html

If file size is a concern, there are folders and files you can delete from aws-sdk before you zip it. In the root aws-sdk folder, you can safely delete the dist, dist-tools, and scripts folders without affecting your code. If you are only using a few AWS services in your code, you can also delete most of the files in apis as well. However, if you delete the api files for a service (or a version of a service) you don't use, be sure to remove the service or version from apis/metadata.json, or an error will be thrown when you run your code. If you are not making use of waiters or paginators in your code, you can also delete those files in apis. Deleting these folders and files will cut down the SDK's size by over 4MB (the entire SDK is over 7MB).

chrisradek commented 7 years ago

@doapp-ryanp PR #1123 adds support for using browserify or webpack with node projects. It also lets you require individual services in your code.

Here's an example of a quick and dirty module:

// handler.js
var S3 = require('aws-sdk/clients/s3');

var handler = function() {
    var s3 = new S3();
    s3.headObject({
        Bucket: 'BUCKET',
        Key: 'KEY'
    }, function(err, data) {
        console.log(err, data);
    });
};

module.exports = handler;

Running browserify --node --standalone='lambda' handler.js -o bundle.js created a bundle that is significantly less than the size of the full SDK (in this example, less than 700 KB). That's without minifying the code as well. The same can also be done using webpack.

Let me know if this helps with your development! I'll update again once this feature is in master.

doapp-ryanp commented 7 years ago

@chrisradek this is awesome!! This is a big step forward for NodeJS based Lambda's. Thank you for not forgetting about this issue.

What version release of the SDK do you anticipate this making it into?

@ac360 @flomotlik @eahefnawy heads up ^

chrisradek commented 7 years ago

The PR was just merged and released as part of 2.6.0. We'll work on updating docs next, but you should be able to start taking advantage of this functionality now.

doapp-ryanp commented 7 years ago

sweet. appreciate you getting it in a release so quick. I'll roll this into one of my projects here soon (week or 2) and report back.

doapp-ryanp commented 7 years ago

@chrisradek sorry it took so long for me to get to this, but I have just tested the aws-sdk for JS against browserify using this new componentized functionality and it works AWESOME.

It is producing super small code sizes. This functionality will now allow me to stay up to date with the most recent aws-sdk-js without paying a performance penalty.

Great work!!

ps - this issue can be closed IMO

chrisradek commented 7 years ago

@doapp-ryanp Glad to hear this change is working out for you!

lock[bot] commented 4 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.