Open gyss opened 2 months ago
Hiya, status update on where I've got to so far. This is hard! I'm encountering two issues:
I set up a test app locally which makes requests via both Undici and node-fetch – these are the two most common ways we make client requests from our apps. I can get the Undici request to log but I don't seem to be able to hook into the HTTP (node-fetch) request at all. How is this working?? @sjparkinson this is making me question everything - are we definitely getting metrics from node-fetch? 😂
Here's what I've been trying (I've also tried every other option under @opentelemetry/instrumentation-http
):
getNodeAutoInstrumentations({
'@opentelemetry/instrumentation-http': {
requestHook: (span, request) => {
console.log('HTTP REQUEST HOOK', request.url);
}
},
'@opentelemetry/instrumentation-undici': {
requestHook: (span, request) => {
console.log('UNDICI REQUEST HOOK', request.path);
}
}
// ...
})
When I make the following requests:
const nFetch = require('node-fetch');
const bulbasaur = await fetch('https://pokeapi.co/api/v2/pokemon/bulbasaur'); // native fetch
const squirtle = await nFetch('https://pokeapi.co/api/v2/pokemon/squirtle'); // node-fetch
I get only this log:
UNDICI REQUEST HOOK /api/v2/pokemon/bulbasaur
I might be missing something, maybe there's a different instrumentation library that handles HTTP client requests? Seems unlikely though.
Turns out that the instrumentations for Node.js are quite strict. Ignoring the HTTP issue I decided to try and add custom metrics to native fetches. Attributes would get added to the http.client.request.duration
metric.
If I try adding a custom attribute that isn't explicitly supported or even if I add the one experimental attribute (url.template
) then I see nothing. So adding the following:
getNodeAutoInstrumentations({
// ...
'@opentelemetry/instrumentation-undici': {
startSpanHook(request) {
return {
['url.template']: '/hello-world'
};
}
}
// ...
})
The metric that gets exported is like this:
{
// ...
attributes: [
{
'http.response.status_code': 200,
'http.request.method': 'GET',
'server.address': 'pokeapi.co',
'server.port': 443,
'url.scheme': 'https'
}
]
}
However if I reuse one of the metrics that's already exported then it works:
getNodeAutoInstrumentations({
// ...
'@opentelemetry/instrumentation-undici': {
startSpanHook(request) {
return {
['server.address']: 'why.wont.you.work'
};
}
}
// ...
})
This exports the metric:
{
// ...
attributes: [
{
'http.response.status_code': 200,
'http.request.method': 'GET',
'server.address': 'why.wont.you.work',
'server.port': 443,
'url.scheme': 'https'
}
]
}
Very frustrating. If this is a hard limitation at the moment then I'm not sure what we can do to get paths into our metrics.
If I try adding a custom attribute that isn't explicitly supported or even if I add the one experimental attribute (
url.template
) then I see nothing.
I wonder if that is true for attributes in the semantic conventions like peer.service
? Or just no new attributes to the map.
@sjparkinson this is making me question everything - are we definitely getting metrics from node-fetch? 😂
No idea! Though I have wondered if some systems are a bit light on client request metrics, worth a browse of https://grafana.ft.com/d/HzGduSwIz/node-js?orgId=1&var-workspace=I7o8Aa1Sz&var-job=next-myft-api&var-environment=production&var-cloud_provider=heroku&var-cloud_region=All&var-http_method=All&var-http_route=All&var-http_errors=All&viewPanel=13 for a system with well known dependencies.
But my understanding for node-fetch is that https://github.com/open-telemetry/opentelemetry-js/blob/main/experimental/packages/opentelemetry-instrumentation-http/src/http.ts is covering it, as it's a wrapper.
It's a bit light on detail but this issue suggests that load order might be relevant.
Some thoughts on this (assuming it is possible).
There's a helpful list of API endpoints under Tyk at https://apigateway.in.ft.com/check-apis, 179 of them today.
I think there are two options for adding the detail we're looking for:
Using a map of URLs to system codes, add a peer.service
attribute to requests
e.g. a client request to https://api.ft.com/enrichedcontent/...
would have peer.service
set to enriched-content-read-api
.
Update the net.peer.name
(at some point this becomes server.address
FYI) attribute to include more of the listen path
e.g. a client request to https://api.ft.com/content/a21abcbe-ae52-40c0-8e8d-858372065e2a
would have net.peer.name
updated to api.ft.com/content/{id}
.
I don't really have a strong list of pros and cons, but my gut feeling is 2 would be most intuitive, and perhaps a little easier to maintain (let's try not to reinvent the next-metrics list right?).
@sjparkinson I've opened a draft PR here which includes a bunch of manual edits and a test app to illustrate the issue: https://github.com/Financial-Times/dotcom-reliability-kit/pull/1202.
Still not getting anything from node-fetch, very interested if you can find a way!
Setting net.peer.name
doesn't seem to work
Still not getting anything from node-fetch, very interested if you can find a way!
Reading through how the SDK wraps and how node-fetch is written, I wonder if it's related to node-fetch using node:http
and node:https
as imports?
(I'm sort of seeing this as maybe a good opportunity on figuring out how to write an auto-instrumentation package.)
Or, reading https://github.com/open-telemetry/opentelemetry-js/blob/main/experimental/packages/opentelemetry-instrumentation/README.md#limitations, is it related to node-fetch using ESM imports?
possibly for the latest node-fetch but I'm explicitly using a v2
version that uses CommonJS 😄
Still not getting anything from node-fetch, very interested if you can find a way!
I am also finding the same with a vanilla express app. Traces are captured for the POST request by the @opentelemetry/instrumentation-net
package, but nothing from @opentelemetry/instrumentation-http
.
Not sure why though. Something to fix somewhere!
I have asked in the #otel-js CNCF channel, join link if you're interested. Can't say I always get support, but worth asking.
I have however been able to use node-fetch@v2 fine with the http example at https://github.com/open-telemetry/opentelemetry-js/tree/main/examples/http. Bit of a fiddle but I installed node-fetch in the client and was able to get the expected trace.
diff --git a/examples/http/client.js b/examples/http/client.js
index 168d43392..0cf4c675f 100644
--- a/examples/http/client.js
+++ b/examples/http/client.js
@@ -2,7 +2,7 @@
const api = require('@opentelemetry/api');
const tracer = require('./tracer')('example-http-client');
-const http = require('http');
+const fetch = require('node-fetch');
/** A function which makes requests and handles response. */
function makeRequest() {
@@ -10,18 +10,11 @@ function makeRequest() {
// the span, which is created to track work that happens outside of the
// request lifecycle entirely.
tracer.startActiveSpan('makeRequest', (span) => {
- http.get({
- host: 'localhost',
- port: 8080,
- path: '/helloworld',
- }, (response) => {
- const body = [];
- response.on('data', (chunk) => body.push(chunk));
- response.on('end', () => {
- console.log(body.toString());
+ fetch('http://localhost:8080/helloworld')
+ .then((response) => {
+ response.text();
span.end();
});
- });
});
// The process must live for at least the interval past any traces that
diff --git a/examples/http/package.json b/examples/http/package.json
index 64a05ed43..3d8638469 100644
--- a/examples/http/package.json
+++ b/examples/http/package.json
@@ -5,10 +5,10 @@
"description": "Example of HTTP integration with OpenTelemetry",
"main": "index.js",
"scripts": {
- "zipkin:server": "cross-env EXPORTER=zipkin node ./server.js",
- "zipkin:client": "cross-env EXPORTER=zipkin node ./client.js",
- "jaeger:server": "cross-env EXPORTER=jaeger node ./server.js",
- "jaeger:client": "cross-env EXPORTER=jaeger node ./client.js",
+ "zipkin:server": "EXPORTER=zipkin node ./server.js",
+ "zipkin:client": "EXPORTER=zipkin node ./client.js",
+ "jaeger:server": "EXPORTER=jaeger node ./server.js",
+ "jaeger:client": "EXPORTER=jaeger node ./client.js",
"align-api-deps": "node ../../scripts/align-api-deps.js"
},
"repository": {
@@ -37,7 +37,8 @@
"@opentelemetry/resources": "1.26.0",
"@opentelemetry/sdk-trace-base": "1.26.0",
"@opentelemetry/sdk-trace-node": "1.26.0",
- "@opentelemetry/semantic-conventions": "1.27.0"
+ "@opentelemetry/semantic-conventions": "1.27.0",
+ "node-fetch": "^2.7.0"
},
"homepage": "https://github.com/open-telemetry/opentelemetry-js/tree/main/examples/http",
"devDependencies": {
So it does seem like an issue with auto-instrimentation as the issue Jon linked to suggests?
"HTTP vs Undici" from my original comment is resolved in v2.0.11 of the package but we still need to work out how we want to augment the metrics with our paths. "Custom attributes on existing metrics" is still blocking and we'll either need to find a workaround or try to get the OpenTelemetry maintainers to support custom attributes.
We'd like to have added a hook in the Opentelemetry plugin that adds specificity to the
api.ft.com
net_peer_name
attribute. So we can select the metrics specific to an API that lives inside the API GatewayWhat problem does this feature solve?
Due to the ongoing Graphite migration, we need to replace some of the system's Graphite based alerts with Opentelemetry.
Some of these alerts do need to trigger when a specific service is not working as it should. For example, let's see this alert in
next-myft-api
Graphite health check
The metrics inside
next.heroku.myft-api.*.fetch.insights-api-topic-recommendations
correspond to the requests done to the service in this URLhttps://api.ft.com/snr/v1/insights/topic-recommendations/
, which lives inside of the API Gateway. But when creating the Opentelemetry Grafana panel for this alert, we find thatnet_peer_name
doesn't allow us to selecttopic-recommendations
, onlyapi.ft.com
because this label is domain based.Link to Grafana panel
This means that if
next-myft-api
uses many services that are inside API Gateway (which it does), all the metrics are bundled together and we can't tell where the 5xx or 4xx are coming fromIdeal solution
As suggested by Sam Parkinson here, we could add a hook in the Opentelemetry plugin to add the first part of the path to the
net_peer_name
attribute. In this case it would beapi.ft.com/snr
, which works for us becausenext-myft-api
doesn't use any other API fromsnr
. But maybe we could consider adding the last part of the path, eg :api.ft.com/topic-recommendations
Anyways, this is a problem shared in many repositories (and probably teams), and it'd be amazing to have this solved in the Opentelemetry plugin
Alternatives
An alternative is to code this OTel hook in every system that needs this granularity.