JCMais / node-libcurl

libcurl bindings for Node.js
https://npmjs.org/package/node-libcurl
MIT License
664 stars 118 forks source link

Random core dumps on AWS Lambda 10.x #176

Closed tnolet closed 5 years ago

tnolet commented 5 years ago

Hi,

I'm running node-libcurl on AWS Lambda Node.js 10.x and I'm getting random core dumps. So, one request completes completely normal once and then the next time it fails with a dump.

There errors come in two parts:

  1. Some illegal pointer exception.
llpropertylink-mainview\":free(): invalid pointer

or

free(): invalid pointer
  1. The default core dump message
Runtime exited with error: signal: aborted (core dumped)
Runtime.ExitError

I compile node_libcurl.node using a Docker container called lambci/lambda:build-nodejs10.x and just copy the resulting binary to the zip used to deploy to Lambda.

Using some debug statements, I can see that the end or error event are not emitted when there is a dump. The last thing in the logs is the curl.perform() method being called.

The weird thing is, outside of these core dumps, it works perfectly for 90%-95% of all requests.

JCMais commented 5 years ago

can you replicate the same behavior using just the docker image locally?

Could you share the Dockerfile + .js files that reproduce that?

tnolet commented 5 years ago

@JCMais I will whip up an example test case if I can reproduce it consistently.

tnolet commented 5 years ago

@JCMais I think I can actually reproduce it using this package https://github.com/lambci/docker-lambda. Will work on a reproduceable test case. But it seems the .perform() method randomly crashes the full container.

tnolet commented 5 years ago

@JCMais

sorry, not a complete & easy to run reproducable test case yet, but a breakthrough nonetheless. It's the SSL. Doing straight GET request on http://www.google.com never fails. Changing to https:// fails in 50% or so of the cases.

Here's a slightly modified version of one of the examples. I import the CA file from a simple helper object. It's a copy & paste from https://curl.haxx.se/docs/caextract.html. I set it before running the request. Using writeSync of promises does not really make a difference.

const CAcert = require('./src/cacert')
const path = require('path')
const fs = require('fs')

const { Curl } = require('node-libcurl')

function handler (message, context, cb) {
  console.log('*** START HANDLER ***')

  const curl = new Curl()

  curl.setOpt('URL', 'https://google.com')
  curl.setOpt('FOLLOWLOCATION', true)

  // write the CA file to disk
  if (process.platform !== 'darwin') {
    const fullCAPath = path.join('/tmp/', CAcert.filename)

    try {
      fs.statSync(fullCAPath)
    } catch (err) {
      fs.writeFileSync(fullCAPath, CAcert.blob)
    }
    // set CAINFO option
    curl.setOpt(Curl.option.CAINFO, fullCAPath)
  }

  // Do a basic request on HTTPS
  curl.on('end', function (statusCode, data, headers) {
    console.info(statusCode)
    console.info('---')
    console.info(data.length)
    console.info('---')
    console.info(this.getInfo('TOTAL_TIME'))

    console.log('*** STOP HANDLER: SUCCESS')

    this.close()
    cb()
  })

  curl.on('error', function (err) {
    console.log('*** STOP HANDLER: ERROR')
    this.close()
    cb(err)
  })
  curl.perform()
}

module.exports = {
  handler
}

Here's a dump of the console output. You can see four requests being triggered, two fail, two pass.

Tims-MBP:test tim$ ./run_docker_lambda.sh 
START RequestId: 52fdfc07-2182-154f-163f-5f0f9a621d72 Version: $LATEST
2019-06-14T14:24:28.810Z    52fdfc07-2182-154f-163f-5f0f9a621d72    INFO    *** START HANDLER ***
2019-06-14T14:24:32.703Z    52fdfc07-2182-154f-163f-5f0f9a621d72    INFO    200
2019-06-14T14:24:32.703Z    52fdfc07-2182-154f-163f-5f0f9a621d72    INFO    ---
2019-06-14T14:24:32.703Z    52fdfc07-2182-154f-163f-5f0f9a621d72    INFO    12153
2019-06-14T14:24:32.704Z    52fdfc07-2182-154f-163f-5f0f9a621d72    INFO    ---
2019-06-14T14:24:32.705Z    52fdfc07-2182-154f-163f-5f0f9a621d72    INFO    3.884253
2019-06-14T14:24:32.706Z    52fdfc07-2182-154f-163f-5f0f9a621d72    INFO    *** STOP HANDLER: SUCCESS
END RequestId: 52fdfc07-2182-154f-163f-5f0f9a621d72
REPORT RequestId: 52fdfc07-2182-154f-163f-5f0f9a621d72  Duration: 3911.53 ms    Billed Duration: 4000 ms    Memory Size: 1536 MB    Max Memory Used: 58 MB  
null
Tims-MBP:test tim$ ./run_docker_lambda.sh 
START RequestId: 52fdfc07-2182-154f-163f-5f0f9a621d72 Version: $LATEST
2019-06-14T14:24:37.835Z    52fdfc07-2182-154f-163f-5f0f9a621d72    INFO    *** START HANDLER ***
END RequestId: 52fdfc07-2182-154f-163f-5f0f9a621d72
REPORT RequestId: 52fdfc07-2182-154f-163f-5f0f9a621d72  Duration: 1398.40 ms    Billed Duration: 1400 ms    Memory Size: 1536 MB    Max Memory Used: 7 MB   
{
  "errorType": "Runtime.ExitError",
  "errorMessage": "RequestId: 52fdfc07-2182-154f-163f-5f0f9a621d72 Error: Runtime exited without providing a reason"
}
Tims-MBP:test tim$ ./run_docker_lambda.sh 
START RequestId: 52fdfc07-2182-154f-163f-5f0f9a621d72 Version: $LATEST
2019-06-14T14:25:48.661Z    52fdfc07-2182-154f-163f-5f0f9a621d72    INFO    *** START HANDLER ***
END RequestId: 52fdfc07-2182-154f-163f-5f0f9a621d72
REPORT RequestId: 52fdfc07-2182-154f-163f-5f0f9a621d72  Duration: 1635.55 ms    Billed Duration: 1700 ms    Memory Size: 1536 MB    Max Memory Used: 7 MB   
{
  "errorType": "Runtime.ExitError",
  "errorMessage": "RequestId: 52fdfc07-2182-154f-163f-5f0f9a621d72 Error: Runtime exited without providing a reason"
}
Tims-MBP:test tim$ ./run_docker_lambda.sh 
START RequestId: 52fdfc07-2182-154f-163f-5f0f9a621d72 Version: $LATEST
2019-06-14T14:25:53.878Z    52fdfc07-2182-154f-163f-5f0f9a621d72    INFO    *** START HANDLER ***
2019-06-14T14:25:58.025Z    52fdfc07-2182-154f-163f-5f0f9a621d72    INFO    200
2019-06-14T14:25:58.025Z    52fdfc07-2182-154f-163f-5f0f9a621d72    INFO    ---
2019-06-14T14:25:58.025Z    52fdfc07-2182-154f-163f-5f0f9a621d72    INFO    12153
2019-06-14T14:25:58.025Z    52fdfc07-2182-154f-163f-5f0f9a621d72    INFO    ---
2019-06-14T14:25:58.025Z    52fdfc07-2182-154f-163f-5f0f9a621d72    INFO    4.137917
2019-06-14T14:25:58.025Z    52fdfc07-2182-154f-163f-5f0f9a621d72    INFO    *** STOP HANDLER: SUCCESS
END RequestId: 52fdfc07-2182-154f-163f-5f0f9a621d72
REPORT RequestId: 52fdfc07-2182-154f-163f-5f0f9a621d72  Duration: 4161.81 ms    Billed Duration: 4200 ms    Memory Size: 1536 MB    Max Memory Used: 59 MB  
null
JCMais commented 5 years ago

@tnolet I have no experience with lambda, could you provide exact steps to reproduce this? Like the Dockerfile used and the script you are using to run the container? Had no luck trying to make it run.

Thanks for taking the time to look into that!

tnolet commented 5 years ago

@JCMais Here is a repo that should demonstrate the error reliably

https://github.com/tnolet/node-curl-lambda-bug

Just clone it and follow the instructions. As far as I can see, something in the ALPN/SSL/TLS setup has race condition somewhere that sometimes triggers, and sometimes doesn't...

P.S. I'm looking to start using node-libcurl with https://checklyhq.com so I'm pretty invested in solving this bug. Please don't hesitate to ask for more input.

JCMais commented 5 years ago

@tnolet I will be able to take a look into that on Thursday.

The first thing I would do is to make sure the OpenSSL version the addon was built with is the same than the one being used by Node.js. For more details on why this is important check here: https://github.com/JCMais/node-libcurl#important-notes-on-prebuilt-binaries--direct-installation

You can get the version of the libraries Node.js uses by running node -e "console.log(process.versions)".

Another option is to build libcurl with a SSL/TLS backend other than OpenSSL, like GnuTLS.

Since you are copying the .node file, you probably want to build the addon statically, there are more info about that on the same link above.

Let me know if you have any questions

sznowicki commented 5 years ago

@tnolet I suspect this issue might be related to #177

But for now, since you want to use it in a similar manner as I do (agentslug.com) I suggest you to go with last 1.x version which works for me in lambda environment for more than a year with no error at all.

tnolet commented 5 years ago

@sznowicki Great tip! Coincidentally, I had also had issues running Jest tests!

ghost commented 5 years ago

I’m also having issues running this on EC2 with the amazon Linux image. Getting segfaults. Don’t have an easy way to reproduce unfortunately, maybe it’s just time to switch to Ubuntu

ghost commented 5 years ago

So, I just spent the last few hours making a new EC2 image. I switched to Ubuntu 18 and everything is working so much better. The static binary bundled with node-libcurl installed flawlessly on Node v10.16.

I wasted so much time compiling stuff and banging my head against the wall with Amazon Linux. Switch if you can.

tnolet commented 5 years ago

@curiouslybrown that's great news! Regretfully, I'm on AWS Lambda so I can't pick the linux distro.

JCMais commented 5 years ago

@tnolet I have taken a closer look into this issue today, and the main problem is that the prebuilt addon is not compatible with Amazon Linux. I had no time to investigate further why that happens. The solution is to rebuild the addon directly.

The good news is that I've made some adjustments to the build script that is used to create the prebuilt binaries so you can use it more easily. Here are the steps you need to take to build it yourself (those steps are based on the repository you posted above, https://github.com/JCMais/node-libcurl/issues/176#issuecomment-502446705):

  1. Remove the Dockerfile since we are not going to need a separated docker image
  2. Run:
    docker run --rm -it -v "$PWD":/var/task --entrypoint /bin/bash lambci/lambda:build-nodejs10.x
  3. Inside the container, run the following commands sequentially:
    # needed to retrieve dependencies
    yum install wget
    # yarn is used on the build script instead of npm
    #  it will give an error, but just need to export the PATH with the yarn dirs added
    curl -o- -L https://yarnpkg.com/install.sh | bash
    export PATH="$HOME/.yarn/bin:$HOME/.config/yarn/global/node_modules/.bin:$PATH"
    # retrieve this repository, we need to checkout develop because the adjustments are only there for now
    #  there are no changes currently besides that one, so it's safe to use (you probably want to checkout some specific commit tho).
    git clone https://github.com/JCMais/node-libcurl.git && cd node-libcurl && git checkout develop && cd ..
    # built the addon statically with all their dependencies (this can take a few minutes)
    PUBLISH_BINARY=false GIT_TAG="" GIT_COMMIT="" STOP_ON_INSTALL=true PREFIX_DIR="/var/task" ./node-libcurl/scripts/ci/build.sh
    # we are done
    exit
  4. Now you can just run the script you created: ./run.sh https

It worked here, I've run it 100 times and it succeeds on all of them.

tnolet commented 5 years ago

@JCMais that's awesome. Will test & deploy this ASAP. I can run it in "canary mode" and track it with live traffic while measuring any errors.

tnolet commented 5 years ago

I think something is not completely right with your script. I tweaked it a little (the curl command and getting yarn to install) but the building of the addon finishes straightaway, as if it is doing nothing.

I think I'm missing something here...

# wget
yum install wget -y
#yarn
curl -OL https://yarnpkg.com/install.sh
chmod +x install.sh
./install.sh
export PATH="$HOME/.yarn/bin:$HOME/.config/yarn/global/node_modules/.bin:$PATH"
# repo
git clone https://github.com/JCMais/node-libcurl.git && cd node-libcurl && git checkout develop && cd ..
# build. This finishes directly.
PUBLISH_BINARY=false GIT_TAG="" GIT_COMMIT="" STOP_ON_INSTALL=true PREFIX_DIR="/var/task" ./node-libcurl/scripts/ci/build.sh
# we are done
exit
JCMais commented 5 years ago

Weird, it finishes with any error or there is no output?

I probably missed some dependency while writing the commands down. Edit the script and remove the output redirection for when building some of the dependencies, to be more exact, those are the lines you want to look:

https://github.com/JCMais/node-libcurl/blob/8eb18a8a1029a46f281801fb9b88d2bb751264de/scripts/ci/build.sh#L59

https://github.com/JCMais/node-libcurl/blob/8eb18a8a1029a46f281801fb9b88d2bb751264de/scripts/ci/build.sh#L68

https://github.com/JCMais/node-libcurl/blob/8eb18a8a1029a46f281801fb9b88d2bb751264de/scripts/ci/build.sh#L92

tnolet commented 5 years ago

Aha, found it. I was actually not executing it from the node-libcurl director due to the last && cd .. I thought this was on purpose, but the build.sh had some references to ./scripts/ci/<file> which where not found. Errors where swallowed by the /dev/null

JCMais commented 5 years ago

Oh, sorry, I'm horrible at copy/paste 😄

let me know if it works

tnolet commented 5 years ago

@JCMais

got the build scenario working. Just running it now gives

START RequestId: 52fdfc07-2182-154f-163f-5f0f9a621d72 Version: $LATEST
2019-07-02T20:23:45.900Z    52fdfc07-2182-154f-163f-5f0f9a621d72    INFO    *** START HANDLER ***
END RequestId: 52fdfc07-2182-154f-163f-5f0f9a621d72
REPORT RequestId: 52fdfc07-2182-154f-163f-5f0f9a621d72  Duration: 24.00 ms  Billed Duration: 100 ms Memory Size: 1536 MB    Max Memory Used: 53 MB  
{
  "errorType": "TypeError",
  "errorMessage": "Cannot read property 'CURLE_OK' of undefined"
}

am I correct that the built artefact is in node-libcurl/lib/binding/node_libcurl.node and that that is the one I should be using inside the Lambda contianer?

JCMais commented 5 years ago

@tnolet you are probably hitting one of the breaking changes in v2, Curl.code constants were moved to their own exports, CurlCode, CurlMultiCode and CurlShareCode.

In fact, you made me see that the Changelog was incorrect, I just fixed it: https://github.com/JCMais/node-libcurl/commit/4a1bde612318d3374c19c56883965fe8d43e5e77

Thanks 😄

tnolet commented 5 years ago

Ok, got it now. For some reason I'm only able to build a correct binary using the develop branch. If I check out the tagged v2.0.1 branch and follow the exact same procedure, I get an error on not finding HMAC_CTX_init during compilation.

I will try to run a shadow version now of the binary I built from the develop branch on our production environment.

JCMais commented 5 years ago

@tnolet you can use the commit 4a1bde612318d3374c19c56883965fe8d43e5e77, which is the last one that includes the fixes to the build script and does not have any extra breaking changes that can appear on the develop branch.

I recommend you to do that until I create a new release with those changes.

tnolet commented 5 years ago

@JCMais awesome, thanks for the great help sofar

tnolet commented 5 years ago

This seems to be fixed when building a dedicated binary using the dev branch or commit 4a1bde612318d3374c19c56883965fe8d43e5e77