apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0
14.27k stars 3.59k forks source link

FunctionsStatsGenerator - Failed to collect metrics #10459

Open devinbost opened 3 years ago

devinbost commented 3 years ago

While doing some testing on master (2.8.0-SNAPSHOT), this message started flooding my broker logs as soon as I started the brokers:

Apr 30 21:25:40 fab08 77c203532c67: 2021-05-01T03:25:40,933 [prometheus-stats-47-1] WARN  org.apache.pulsar.functions.worker.FunctionsStatsGenerator - Failed to collect metrics for function instance myTenant/myNamespace/function-router:0
Apr 30 21:25:40 fab08 77c203532c67: java.net.ConnectException: Connection refused (Connection refused)
Apr 30 21:25:40 fab08 77c203532c67: #011at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:1.8.0_282]
Apr 30 21:25:40 fab08 77c203532c67: #011at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[?:1.8.0_282]
Apr 30 21:25:40 fab08 77c203532c67: #011at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[?:1.8.0_282]
Apr 30 21:25:40 fab08 77c203532c67: #011at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[?:1.8.0_282]
Apr 30 21:25:40 fab08 77c203532c67: #011at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.8.0_282]
Apr 30 21:25:40 fab08 77c203532c67: #011at java.net.Socket.connect(Socket.java:607) ~[?:1.8.0_282]
Apr 30 21:25:40 fab08 77c203532c67: #011at java.net.Socket.connect(Socket.java:556) ~[?:1.8.0_282]
Apr 30 21:25:40 fab08 77c203532c67: #011at sun.net.NetworkClient.doConnect(NetworkClient.java:180) ~[?:1.8.0_282]
Apr 30 21:25:40 fab08 77c203532c67: #011at sun.net.www.http.HttpClient.openServer(HttpClient.java:463) ~[?:1.8.0_282]
Apr 30 21:25:40 fab08 77c203532c67: #011at sun.net.www.http.HttpClient.openServer(HttpClient.java:558) ~[?:1.8.0_282]
Apr 30 21:25:40 fab08 77c203532c67: #011at sun.net.www.http.HttpClient.<init>(HttpClient.java:242) ~[?:1.8.0_282]
Apr 30 21:25:40 fab08 77c203532c67: #011at sun.net.www.http.HttpClient.New(HttpClient.java:339) ~[?:1.8.0_282]
Apr 30 21:25:40 fab08 77c203532c67: #011at sun.net.www.http.HttpClient.New(HttpClient.java:357) ~[?:1.8.0_282]
Apr 30 21:25:40 fab08 77c203532c67: #011at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1226) ~[?:1.8.0_282]
Apr 30 21:25:40 fab08 77c203532c67: #011at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162) ~[?:1.8.0_282]
Apr 30 21:25:40 fab08 77c203532c67: #011at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056) ~[?:1.8.0_282]
Apr 30 21:25:40 fab08 77c203532c67: #011at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:990) ~[?:1.8.0_282]
Apr 30 21:25:40 fab08 77c203532c67: #011at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1570) ~[?:1.8.0_282]
Apr 30 21:25:40 fab08 77c203532c67: #011at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498) ~[?:1.8.0_282]
Apr 30 21:25:40 fab08 77c203532c67: #011at org.apache.pulsar.functions.runtime.RuntimeUtils.getPrometheusMetrics(RuntimeUtils.java:447) ~[org.apache.pulsar-pulsar-functions-runtime-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
Apr 30 21:25:40 fab08 77c203532c67: #011at org.apache.pulsar.functions.runtime.process.ProcessRuntime.getPrometheusMetrics(ProcessRuntime.java:321) ~[org.apache.pulsar-pulsar-functions-runtime-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
Apr 30 21:25:40 fab08 77c203532c67: #011at org.apache.pulsar.functions.worker.FunctionsStatsGenerator.generate(FunctionsStatsGenerator.java:71) ~[org.apache.pulsar-pulsar-functions-worker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
Apr 30 21:25:40 fab08 77c203532c67: #011at org.apache.pulsar.functions.worker.PulsarWorkerService.generateFunctionsStats(PulsarWorkerService.java:171) ~[org.apache.pulsar-pulsar-functions-worker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
Apr 30 21:25:40 fab08 77c203532c67: #011at org.apache.pulsar.broker.stats.prometheus.PrometheusMetricsGenerator.generate(PrometheusMetricsGenerator.java:106) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
Apr 30 21:25:40 fab08 77c203532c67: #011at org.apache.pulsar.broker.stats.prometheus.PrometheusMetricsServlet.lambda$doGet$0(PrometheusMetricsServlet.java:72) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
Apr 30 21:25:40 fab08 77c203532c67: #011at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [org.apache.pulsar-managed-ledger-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
Apr 30 21:25:40 fab08 77c203532c67: #011at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.13.0.jar:4.13.0]
Apr 30 21:25:40 fab08 77c203532c67: #011at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_282]
Apr 30 21:25:40 fab08 77c203532c67: #011at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_282]
Apr 30 21:25:40 fab08 77c203532c67: #011at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_282]
Apr 30 21:25:40 fab08 77c203532c67: #011at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_282]
Apr 30 21:25:40 fab08 77c203532c67: #011at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
Apr 30 21:25:40 fab08 77c203532c67: #011at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
Apr 30 21:25:40 fab08 77c203532c67: #011at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.60.Final.jar:4.1.60.Final]
Apr 30 21:25:40 fab08 77c203532c67: #011at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]

That function was created in a previous version (Pulsar 2.4.0), and the brokers were upgraded to master.

devinbost commented 3 years ago

This exception does not show up in the Pulsar build from 2-3 days ago that was used as mentioned in this issue: https://github.com/apache/pulsar/issues/10460 So, the issues could be related.

devinbost commented 3 years ago

I bet that in this issue if functions aren't starting correctly (or are restarting), then FunctionsStatsGenerator can't connect or keeps getting disconnected due to the function restarting. So, I think this issue is a symptom of #10460 .

codelipenghui commented 2 years ago

The issue had no activity for 30 days, mark with Stale label.