adobe / helix-deploy

A multi-cloud deployment tool for serverless functions running on AWS Lambda, Adobe I/O Runtime, Azure Functions, and Google Cloud Functions. Write once, run everywhere.
Apache License 2.0
13 stars 16 forks source link

AWS: investigate 2 seconds delay before response is returned #150

Open tripodsan opened 3 years ago

tripodsan commented 3 years ago

image

could be:

tripodsan commented 3 years ago

it's probably a http request that is not traced. non of the functions in AWS show http traces, when helix-fetch is the client.

/cc @ranrib

ranrib commented 3 years ago

Hi @tripodsan - is this using http2 or 1? Are you using webpack or any other method to bundle the functions?

tripodsan commented 3 years ago

Hi @tripodsan - is this using http2 or 1?

I think in this case we try to use http/2. but if the server supports it, i don't know

Are you using webpack or any other method to bundle the functions?

yes. we are using webpack. but no obfuscation / minification.

stefan-guggisberg commented 3 years ago

Hi @tripodsan - is this using http2 or 1?

I think in this case we try to use http/2. but if the server supports it, i don't know

The protocol is negotiated via ALPN, i.e. we don't 'try' http/2. The negotiated protocol is most certainly http/2.

ranrib commented 3 years ago

I've tried to re-create the following scenario:

const epsagon = require('epsagon');
const { fetch } = require('@adobe/helix-fetch');
epsagon.init({
  token: 'test',
  appName: 'Helix Test',
  metadataOnly: false,
});
async function test() {
    const resp = await fetch('https://httpbin.org/get');
    console.log(resp.ok);
}
const testFunction = epsagon.nodeWrapper(test);
testFunction()

And it works for me (I've verified that it's indeed HTTP/2). Do you have any idea or insight what is this different than your code?

stefan-guggisberg commented 3 years ago

Hi @ranrib, I tried your example but I am struggling to find the generated trace in the Epsagon UI. I used helix-fetch Test for appName but don't see that app on Epsagon. Pointers welcome.

tripodsan commented 3 years ago

@stefan-guggisberg when you set process.env.EPSAGON_DEBUG='true'; so you see the trace sent to the epsagon server?

stefan-guggisberg commented 3 years ago

@tripodsan yes

here's the full output (I also enabled helix-fetch debug output):

[EPSAGON] patching module: aws-sdk
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: aws-sdk
[EPSAGON] patching module: aws-sdk
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: aws-sdk
[EPSAGON] patching module: aws-sdk
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: aws-sdk
[EPSAGON] patching module: fetch-h2/dist/lib/context-http1
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: fetch-h2/dist/lib/context-http1
[EPSAGON] patching module: simple-oauth2/lib/client.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: simple-oauth2/lib/client.js
[EPSAGON] patching module: simple-oauth2/lib/client/client.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: simple-oauth2/lib/client/client.js
[EPSAGON] Patching http2 module
[EPSAGON] patching module: pg
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: pg
[EPSAGON] patching module: pg-pool
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: pg-pool
[EPSAGON] patching module: mysql2
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: mysql2
[EPSAGON] patching module: mysql2
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: mysql2
[EPSAGON] patching module: mysql/lib/Connection.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: mysql/lib/Connection.js
[EPSAGON] patching module: redis
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: redis
[EPSAGON] patching module: ioredis
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: ioredis
[EPSAGON] patching module: dy-ioredis/lib/redis.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: dy-ioredis/lib/redis.js
[EPSAGON] patching module: mongodb/lib/core/wireprotocol/index.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: mongodb/lib/core/wireprotocol/index.js
[EPSAGON] patching module: mongodb/lib/core/wireprotocol/index.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: mongodb/lib/core/wireprotocol/index.js
[EPSAGON] patching module: mongodb/lib/core/wireprotocol/index.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: mongodb/lib/core/wireprotocol/index.js
[EPSAGON] patching module: mongodb/lib/core/wireprotocol/index.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: mongodb/lib/core/wireprotocol/index.js
[EPSAGON] patching module: mongodb/lib/core/wireprotocol/index.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: mongodb/lib/core/wireprotocol/index.js
[EPSAGON] patching module: mongodb/lib/core/wireprotocol/index.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: mongodb/lib/core/wireprotocol/index.js
[EPSAGON] patching module: amazon-dax-client
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: amazon-dax-client
[EPSAGON] patching module: amazon-dax-client
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: amazon-dax-client
[EPSAGON] patching module: openwhisk/lib/actions.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: openwhisk/lib/actions.js
[EPSAGON] patching module: @google-cloud/common/
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: @google-cloud/common/
[EPSAGON] patching module: @google-cloud/pubsub/
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: @google-cloud/pubsub/
[EPSAGON] patching module: @google-cloud/pubsub/
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: @google-cloud/pubsub/
[EPSAGON] patching module: nats
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: nats
[EPSAGON] patching module: mqtt
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: mqtt
[EPSAGON] patching module: kafkajs
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: kafkajs
[EPSAGON] patching module: kafkajs/src/cluster/index.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: kafkajs/src/cluster/index.js
[EPSAGON] patching module: kafka-node
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: kafka-node
[EPSAGON] patching module: bunyan
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: bunyan
[EPSAGON] patching module: pino/lib/tools
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: pino/lib/tools
[EPSAGON] patching module: @azure/storage-blob
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: @azure/storage-blob
[EPSAGON] patching module: @azure/storage-blob
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: @azure/storage-blob
[EPSAGON] patching module: @azure/cosmos
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: @azure/cosmos
[EPSAGON] hooking winston-cloudwatch
[EPSAGON] hooked winston-cloudwatch
[EPSAGON] patching module: winston/lib/winston/logger
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: winston/lib/winston/logger
[EPSAGON] patching module: amqplib/lib/channel.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: amqplib/lib/channel.js
[EPSAGON] patching module: amqp/lib/exchange.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: amqp/lib/exchange.js
[EPSAGON] patching module: ldapjs
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: ldapjs
[EPSAGON] patching module: cassandra-driver/lib/client
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: cassandra-driver/lib/client
[EPSAGON] loading azure metadata, url: (http://169.254.169.254/metadata/instance?api-version=2019-06-01)
[EPSAGON] Loading EC2 metadata
[EPSAGON] [http] captured call 169.254.169.254
[EPSAGON] [http] filtered ignored hostname 169.254.169.254
[EPSAGON] [http] captured call 169.254.169.254
[EPSAGON] [http] filtered ignored hostname 169.254.169.254
[EPSAGON] [http] captured call 169.254.169.254
[EPSAGON] [http] filtered ignored hostname 169.254.169.254
[EPSAGON] [http] captured call 169.254.169.254
[EPSAGON] [http] filtered ignored hostname 169.254.169.254
[EPSAGON] [http] captured call 169.254.169.254
[EPSAGON] [http] filtered ignored hostname 169.254.169.254
[EPSAGON] [http] captured call 169.254.169.254
[EPSAGON] [http] filtered ignored hostname 169.254.169.254
  helix-fetch:core established TLS connection: #1 (httpbin.org) +0ms
  helix-fetch:core httpbin.org -> h2 +1ms
  helix-fetch:h2 reusing socket #1 (httpbin.org) +0ms
  helix-fetch:h2 GET httpbin.org/get +2ms
  helix-fetch:h2 session https://httpbin.org established +2ms
  helix-fetch:h2 caching session https://httpbin.org +0ms
  helix-fetch:h2 session https://httpbin.org remoteSettings: {"headerTableSize":4096,"enablePush":true,"initialWindowSize":65536,"maxFrameSize":16777215,"maxConcurrentStreams":128,"maxHeaderListSize":4294967295,"maxHeaderSize":4294967295,"enableConnectProtocol":false} +1ms
  helix-fetch:h2 session https://httpbin.org localSettings: {"headerTableSize":4096,"enablePush":true,"initialWindowSize":65535,"maxFrameSize":16384,"maxConcurrentStreams":4294967295,"maxHeaderListSize":4294967295,"maxHeaderSize":4294967295,"enableConnectProtocol":false} +107ms
true
[EPSAGON] Sending trace async...
  helix-fetch:core resetting context +0ms
  helix-fetch:h2 resetContext: destroying session (socket #1, httpbin.org) +10ms
[EPSAGON] [QUEUE] Trace size 1606 Bytes pushed to queue
[EPSAGON] [QUEUE] Queue size: 1 traces, total size of 1606 Bytes
  helix-fetch:h2 session https://httpbin.org closed +4ms
  helix-fetch:h2 discarding cached session https://httpbin.org +0ms
[EPSAGON] Could not load azure metadata
[EPSAGON] Could not load EC2 metadata for instance-id
[EPSAGON] Could not load EC2 metadata for instance-type
[EPSAGON] Could not load EC2 metadata for local-ipv4
[EPSAGON] Could not load EC2 metadata for public-hostname
[EPSAGON] Could not load EC2 metadata for public-ipv4
stefan-guggisberg commented 3 years ago

The above output was generated by running this code with DEBUG=helix-fetch*:

require('dotenv').config();

const epsagon = require('epsagon');
const { fetch, reset } = require('@adobe/helix-fetch');

const { EPSAGON_TOKEN } = process.env;

epsagon.init({
  token: EPSAGON_TOKEN,
  appName: 'helix-fetch Test',
  metadataOnly: false,
});
async function test() {
    const resp = await fetch('https://httpbin.org/get');
    console.log(resp.ok);
}
const testFunction = epsagon.nodeWrapper(test);

testFunction()
  .catch(console.error)
  .finally(reset);
stefan-guggisberg commented 3 years ago

Hi @ranrib, care to comment?

ranrib commented 3 years ago

Hi @stefan-guggisberg ! Sorry for the late reply. Please add as an init parameter the following: sendBatch: false

You should be able to see the trace afterwards

stefan-guggisberg commented 3 years ago

Thanks @ranrib, that worked. I can see the traces now in epsagon. It would be nice if the http protocol version would be visible in the trace.

Now that we've confirmed that traces of http/2 requests generally work we still have to figure out why not a single http/2 request (there are tons!) is captured in our production env (aws lambda)...

tripodsan commented 3 years ago

sendBatch: false

should update our epsagon instrumentation to always disable batches?

ranrib commented 3 years ago

@tripodsan yes - since you're running on ephemeral envs, batching is less relevant. so please set it for all.

tripodsan commented 3 years ago

@tripodsan yes - since you're running on ephemeral envs, batching is less relevant. so please set it for all.

I just check and we already set it.

tripodsan commented 3 years ago

@ranrib I did some more testing with a simpler function: https://github.com/adobe/helix-version-picker

I did the same as yesterday, i.e. deployed the function manually, but this time I didn't see the request. https://app.epsagon.com/spans/0eb9412a-5550-e4e5-4be1-c045f5e5dbc6?tab=graph&start_time=1618850102

then I deliberately disabled HTTP2, redeployed and then I see the http request to github in the trace: https://app.epsagon.com/spans/0eb94143-c87d-8db6-fe52-bfecd3be9a20?tab=graph&start_time=1618850268

I also created a completely new function, but still no request: https://app.epsagon.com/spans/0eb94167-a131-d5db-23b9-65730b74bda7?tab=graph&start_time=1618850502

btw, here's how to invoke the function

 curl -sD - "https://lqmig3v5eb.execute-api.us-east-1.amazonaws.com/helix-services/version-picker-test/1.4.3" -H "x-owner: tripodsan" -H "x-repo: helix-pages-test" -H "x-ref: picker-unit-test"

but then, for the next invocation, there is a request: https://app.epsagon.com/spans/0eb941b1-efb8-bc9d-8825-37c8c4e9e3e9?tab=graph&start_time=1618850989

but for one later, no: https://app.epsagon.com/spans/0eb941d3-a84d-bce9-ec4d-868244f801ca?tab=graph&start_time=1618851210

could it be that epsagon doesn't catch requests on http2 keep alive connections?


then I did a test by creating a new fetch context for every invocation.... but same result. no requests recorded. and playing more with it, I get mixed results. sometimes after a deploy, the traces to github are there, sometimes they aren't. I think it's related to some timing / initialization order.


then I created a version with EPSAGON_DEBUG enabled. interrestingly, the traces show the request to github: https://app.epsagon.com/spans/0eb94369-4b0e-ddb4-1be6-03fa53c769e3?tab=graph&start_time=1618853869

but the cloudwatch logs don't mention it.

so, I have no clue so far, what could be the reason.


I also updated to node 14, but with the same result. freshly deployed, no traces: https://app.epsagon.com/spans/0eb9445d-8c93-62ed-ec2d-e0fed9418c92?tab=graph&start_time=1618855469

and then with enabled debug, also no capture: https://app.epsagon.com/spans/0eb94474-49dd-544d-6688-8c171f76f17d?tab=graph&start_time=1618855618

btw, I noticed those INFO in the debug output:

[EPSAGON] require.resolve.paths is not a function

might this be a problem?