Azure / azure-sdk-for-js

This repository is for active development of the Azure SDK for JavaScript (NodeJS & Browser). For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/javascript/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-js.
MIT License

ENOTFOUND westeurope-5.in.applicationinsights.azure.com for statsbeat feature #31000

Open · vandung3101 opened this issue 1 month ago

vandung3101 commented 1 month ago

Describe the bug: We have set up and instrumented our application running on a Function App, and telemetry is sent to our Application Insights resource through an Azure Monitor Private Link Scope (AMPLS). As described in https://learn.microsoft.com/en-us/azure/azure-monitor/app/statsbeat?tabs=eu-java%2Cnode, there is a feature (statsbeat) that sends data to Microsoft's own Application Insights resource. It seems to be this one: https://github.com/Azure/azure-sdk-for-js/blob/ecff408fad7001bba7dc8612eb70435ac04f51de/sdk/monitor/monitor-opentelemetry-exporter/src/export/statsbeat/types.ts#L79

Even after we opened the firewall, the error still appears in the traces:

{"stack":"Error: PeriodicExportingMetricReader: metrics export failed (error RestError: getaddrinfo ENOTFOUND westeurope-5.in.applicationinsights.azure.com)\n    at doExport (/home/site/wwwroot/node_modules/@opentelemetry/sdk-metrics/build/src/export/PeriodicExportingMetricReader.js:76:23)\n    at runMicrotasks (<anonymous>)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at async PeriodicExportingMetricReader._doRun (/home/site/wwwroot/node_modules/@opentelemetry/sdk-metrics/build/src/export/PeriodicExportingMetricReader.js:84:13)\n    at async PeriodicExportingMetricReader._runOnce (/home/site/wwwroot/node_modules/@opentelemetry/sdk-metrics/build/src/export/PeriodicExportingMetricReader.js:55:13)","message":"PeriodicExportingMetricReader: metrics export failed (error RestError: getaddrinfo ENOTFOUND westeurope-5.in.applicationinsights.azure.com)","name":"Error"} []

And another one:

Export took longer than [ 30000 ] milliseconds and timed out.

To Reproduce: Steps to reproduce the behavior:

  1. Azure setup with a Node.js Function App, Application Insights, a hub-and-spoke network, an Azure Monitor Private Link Scope, and a deny-by-default firewall
  2. Open the firewall for westeurope-5.in.applicationinsights.azure.com
  3. Instrument the application with @azure/monitor-opentelemetry
  4. Check the traces in Application Insights

Expected behavior: A way to disable statsbeat, or a fix for the error.

github-actions[bot] commented 1 month ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @AzmonActionG @AzmonAlerts @AzMonEssential @AzmonLogA @dadunl @SameergMS.

JacksonWeber commented 1 month ago

@vandung3101 If you're experiencing issues with statsbeat sending to the ingestion endpoint, please use the APPLICATION_INSIGHTS_NO_STATSBEAT environment variable to disable statsbeat.
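A minimal sketch of applying that suggestion in code, assuming the exporter reads `APPLICATION_INSIGHTS_NO_STATSBEAT` when it is constructed (so the variable must already be set when `useAzureMonitor` runs) and that a non-empty value such as `"true"` is enough to opt out. On a Function App the simplest route is probably an application setting with the same name; the code only shows the in-process equivalent. `disableStatsbeat` and `startTelemetry` are hypothetical helper names, not SDK APIs.

```typescript
// Environment variable named in the comment above; the value "true" is an assumption.
const STATSBEAT_OPT_OUT = "APPLICATION_INSIGHTS_NO_STATSBEAT";

// Package name kept in a variable so the SDK is only loaded at runtime, after the opt-out.
const SDK_MODULE = "@azure/monitor-opentelemetry";

// Hypothetical helper: marks statsbeat as disabled in the given environment map.
function disableStatsbeat(env: Record<string, string | undefined> = process.env): void {
  env[STATSBEAT_OPT_OUT] = "true";
}

// Hypothetical helper: sets the opt-out first, then loads and starts Azure Monitor,
// so no exporter can be constructed before the variable exists.
async function startTelemetry(connectionString: string): Promise<void> {
  disableStatsbeat();
  const { useAzureMonitor } = await import(SDK_MODULE);
  useAzureMonitor({
    azureMonitorExporterOptions: { connectionString },
  });
}
```

For example, `startTelemetry(process.env.APPLICATIONINSIGHTS_CONNECTION_STRING ?? "")` would be called once at the top of the app's entry point. Loading the SDK lazily keeps the ordering explicit regardless of when the exporter version in use reads the variable.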

Ronbabious commented 1 month ago


@JacksonWeber - We will try this. Thank you.

I'm confused as to why this isn't documented here: https://learn.microsoft.com/en-us/azure/azure-monitor/app/statsbeat?tabs=eu-java%2Cnode#configure-statsbeat

JacksonWeber commented 1 month ago

@Ronbabious Have you also been encountering issues with the statsbeat feature? Would you mind letting us know what environment you're running in?

As for the documentation, I'll make sure that gets updated. Thank you for the call out.

Ronbabious commented 1 month ago

@JacksonWeber - We've only encountered these issues with the azure-sdk-for-js. As mentioned, it looks like the SDK is sending statsbeat data to an Application Insights instance hosted in West Europe, and that's what was causing the issues. We have similar implementations for both .NET and Java without any issues.

We are running in a private VNet with a hub-and-spoke infrastructure hosted in North Europe. All external traffic needs to be whitelisted. Even whitelisting the ingestion endpoint westeurope-5.in.applicationinsights.azure.com did not resolve the issue. I suspect it might have something to do with the Azure network backbone, as discussed in the thread "How to fix getaddrinfo ENOTFOUND DNS issue on Node.js Azure Functions App?"

We're not entirely sure what causes the issue, but disabling statsbeat resolved it for us, and we are going with that workaround.
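`getaddrinfo ENOTFOUND` is reported during name resolution, before any connection is attempted, so an outbound firewall rule on its own cannot clear it. One way to confirm this from inside the Function App is to call Node's `dns.promises.lookup`, which goes through the operating system's `getaddrinfo`, the same path Node's HTTP stack uses by default. The sketch below is a diagnostic aid only, not part of the SDK; `checkResolution` and `LookupResult` are hypothetical names.

```typescript
import { promises as dns } from "node:dns";

// Outcome of one name lookup: either a resolved address or the error code.
interface LookupResult {
  host: string;
  ok: boolean;
  address?: string;
  code?: string; // e.g. "ENOTFOUND" when the name cannot be resolved
}

// Hypothetical diagnostic helper: resolves `host` with dns.lookup, which uses
// the operating system's getaddrinfo, the resolver behind the error in this issue.
async function checkResolution(host: string): Promise<LookupResult> {
  try {
    const { address } = await dns.lookup(host);
    return { host, ok: true, address };
  } catch (err) {
    // Node attaches a string `code` (ENOTFOUND, EAI_AGAIN, ...) to DNS errors.
    const code = (err as { code?: string }).code;
    return { host, ok: false, code };
  }
}
```

Run from the Function App (for example from a temporary HTTP-triggered function), `checkResolution("westeurope-5.in.applicationinsights.azure.com").then(console.log)` would report `ok: false` with `code: "ENOTFOUND"` if the host name itself cannot be resolved in that network, which points at DNS configuration rather than firewall rules.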