elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats
Other
12.15k stars 4.91k forks source link

Recovered from panic while fetching sql/query #31577

Closed IanMoroney closed 1 year ago

IanMoroney commented 2 years ago

When using the metricbeat sql module against Azure SQL using sql authentication, I am able to get it to push an event into elasticsearch, and then the driver seems to panic. No further events are pushed. If I restart the container, again the first event makes it into elastic, then metricbeat panics again and no more events go through.

panic details:

2022-05-10T15:03:06.021Z    ERROR   go-mssqldb@v0.0.0-20200206145737-bbfc9a55622e/net.go:167    recovered from panic while fetching 'sql/query' for host 'mydatabasename.database.windows.net:1433'. Recovering, but please report this.    {"panic": "Not implemented", "stack": "github.com/elastic/beats/v7/libbeat/logp.Recover\n\t/go/src/github.com/elastic/beats/libbeat/logp/global.go:102\nruntime.gopanic\n\t/usr/local/go/src/runtime/panic.go:1038\ngithub.com/denisenkom/go-mssqldb.passthroughConn.SetWriteDeadline\n\t/go/pkg/mod/github.com/denisenkom/go-mssqldb@v0.0.0-20200206145737-bbfc9a55622e/net.go:167\ncrypto/tls.(*Conn).SetWriteDeadline\n\t/usr/local/go/src/crypto/tls/conn.go:151\ncrypto/tls.(*Conn).closeNotify\n\t/usr/local/go/src/crypto/tls/conn.go:1361\ncrypto/tls.(*Conn).Close\n\t/usr/local/go/src/crypto/tls/conn.go:1331\ngithub.com/denisenkom/go-mssqldb.(*Conn).Close\n\t/go/pkg/mod/github.com/denisenkom/go-mssqldb@v0.0.0-20200206145737-bbfc9a55622e/mssql.go:361\ndatabase/sql.(*driverConn).finalClose.func2\n\t/usr/local/go/src/database/sql/sql.go:646\ndatabase/sql.withLock\n\t/usr/local/go/src/database/sql/sql.go:3396\ndatabase/sql.(*driverConn).finalClose\n\t/usr/local/go/src/database/sql/sql.go:644\ndatabase/sql.(*DB).Close\n\t/usr/local/go/src/database/sql/sql.go:904\ngithub.com/elastic/beats/v7/x-pack/metricbeat/module/sql/query.(*MetricSet).Fetch\n\t/go/src/github.com/elastic/beats/x-pack/metricbeat/module/sql/query/query.go:98\ngithub.com/elastic/beats/v7/metricbeat/mb/module.(*metricSetWrapper).fetch\n\t/go/src/github.com/elastic/beats/metricbeat/mb/module/wrapper.go:263\ngithub.com/elastic/beats/v7/metricbeat/mb/module.(*metricSetWrapper).startPeriodicFetching\n\t/go/src/github.com/elastic/beats/metricbeat/mb/module/wrapper.go:224\ngithub.com/elastic/beats/v7/metricbeat/mb/module.(*metricSetWrapper).run\n\t/go/src/github.com/elastic/beats/metricbeat/mb/module/wrapper.go:208\ngithub.com/elastic/beats/v7/metricbeat/mb/module.(*Wrapper).Start.func1\n\t/go/src/github.com/elastic/beats/metricbeat/mb/module/wrapper.go:147"}

Configuration:

- module: sql
      metricsets:
        - query
      period: 30s
      hosts: ["sqlserver://${SQL_USERNAME}:${SQL_PASSWORD}@${SQL_HOST}:1433?database=mydatabasename"]
      driver: "mssql"
      sql_query: 'SELECT count (*) AS count FROM schema.tablename where columnname = 0'
      sql_response_format: table

In Kibana, I see a single event that shows sql.query sql.driver sql.metrics.numeric.count etc, so the data comes through just fine, if it weren't for the panic shutting down further events from coming through.

Note, the reason for using ?database=mydatabasename as part of the connection string, is because of connecting to Azure SQL.

IanMoroney commented 2 years ago

@kaiyan-sheng, there is already a discussion open on the forum, but this is a bug in the code.

AliakseiKiryanau1 commented 2 years ago

Hi! We have similar issue on one of our environments, but it's not related to specifying DB name in connection string. But we're getting the same exceptions in metricbeat logs as reported in this issue:

2022-08-04T14:31:57.842Z    ERROR   go-mssqldb@v0.0.0-20200206145737-bbfc9a55622e/net.go:167    recovered from panic while fetching 'sql/query' for host 'agln_13475.WFLAB.IO:1433'. Recovering, but please report this.    {"panic": "Not implemented", "stack": "github.com/elastic/beats/v7/libbeat/logp.Recover\n\t/go/src/github.com/elastic/beats/libbeat/logp/global.go:102\nruntime.gopanic\n\t/usr/local/go/src/runtime/panic.go:1038\ngithub.com/denisenkom/go-mssqldb.passthroughConn.SetWriteDeadline\n\t/go/pkg/mod/github.com/denisenkom/go-mssqldb@v0.0.0-20200206145737-bbfc9a55622e/net.go:167\ncrypto/tls.(*Conn).SetWriteDeadline\n\t/usr/local/go/src/crypto/tls/conn.go:151\ncrypto/tls.(*Conn).closeNotify\n\t/usr/local/go/src/crypto/tls/conn.go:1361\ncrypto/tls.(*Conn).Close\n\t/usr/local/go/src/crypto/tls/conn.go:1331\ngithub.com/denisenkom/go-mssqldb.(*Conn).Close\n\t/go/pkg/mod/github.com/denisenkom/go-mssqldb@v0.0.0-20200206145737-bbfc9a55622e/mssql.go:361\ndatabase/sql.(*driverConn).finalClose.func2\n\t/usr/local/go/src/database/sql/sql.go:646\ndatabase/sql.withLock\n\t/usr/local/go/src/database/sql/sql.go:3396\ndatabase/sql.(*driverConn).finalClose\n\t/usr/local/go/src/database/sql/sql.go:644\ndatabase/sql.(*DB).Close\n\t/usr/local/go/src/database/sql/sql.go:904\ngithub.com/elastic/beats/v7/x-pack/metricbeat/module/sql/query.(*MetricSet).Fetch\n\t/go/src/github.com/elastic/beats/x-pack/metricbeat/module/sql/query/query.go:98\ngithub.com/elastic/beats/v7/metricbeat/mb/module.(*metricSetWrapper).fetch\n\t/go/src/github.com/elastic/beats/metricbeat/mb/module/wrapper.go:263\ngithub.com/elastic/beats/v7/metricbeat/mb/module.(*metricSetWrapper).startPeriodicFetching\n\t/go/src/github.com/elastic/beats/metricbeat/mb/module/wrapper.go:224\ngithub.com/elastic/beats/v7/metricbeat/mb/module.(*metricSetWrapper).run\n\t/go/src/github.com/elastic/beats/metricbeat/mb/module/wrapper.go:208\ngithub.com/elastic/beats/v7/metricbeat/mb/module.(*Wrapper).Start.func1\n\t/go/src/github.com/elastic/beats/metricbeat/mb/module/wrapper.go:147"}

Example query from metricbeat config:

- module: sql
  metricsets:
    - query
  period: 300s
  hosts: ["sqlserver://${MSSQL_PM_USER}:${MSSQL_PM_PASS}@${MSSQL_HOST}:${MSSQL_PORT}"]
  driver: "mssql"
  sql_response_format: table
  sql_query: "select
    name as mssql_config_name,
    cast(value as int) as mssql_config_value
  from sys.database_scoped_configurations;"

Could you advise where to look at? The weird thing is that everything working without issues on many other our environments which have similar setup. ELK version used: 7.16.3 OS version - Centos 7. Also tried to update metricbeat to 7.17.5 but it didn't help.

Could you advise where to look at. The behaviour is the same as was reported initially. First attempt to run query is executed successfully, and data appears in ELK index. But further executions are not happening after "panic" from go-mssqldb driver.

IanMoroney commented 2 years ago

@AliakseiKiryanau1, until this issue is solved i've been forced to use a logstash configuration which actually works consistently.

If you need the code, let me know and i'll paste the yaml for it 👍 Beats still doesn't work for me, and continues to crash.

AliakseiKiryanau1 commented 2 years ago

@IanMoroney, yeah logstash pipelines are working without any issues using jdbc input and elasticsearch output.. Still we don't want to overcomplicated logstash configuration as it's already pretty complex as we have dozens of logstash pipelines including jdbc inputs. The bad thing here is that we have over 30 monitoring queries, and moving them to logstash engine would introduce significant load on logstash.

For now we've opened support case in elasticsearch, we had small discussion with ELK team, they've escalated this issue, so hopefully it will be resolved soon.

IanMoroney commented 2 years ago

exactly yep.

IanMoroney commented 2 years ago

Any updates from that elastic ticket @AliakseiKiryanau1 ?

vedard commented 1 year ago

Hi,

I'm still having this issue with Metricbeat 8.8.2:

{
    "log.level": "error",
    "@timestamp": "2023-07-05T11:18:38.069-0400",
    "log.origin": {
        "file.name": "go-mssqldb@v0.9.0/net.go",
        "file.line": 167
    },
    "message": "recovered from panic while fetching 'sql/query' for host 'XYZ'. Recovering, but please report this.",
    "service.name": "metricbeat",
    "panic": "Not implemented",
    "stack": "github.com/elastic/elastic-agent-libs/logp.Recover\n\tgithub.com/elastic/elastic-agent-libs@v0.3.3/logp/global.go:102\nruntime.gopanic\n\truntime/panic.go:884\ngithub.com/denisenkom/go-mssqldb.passthroughConn.SetWriteDeadline\n\tgithub.com/denisenkom/go-mssqldb@v0.9.0/net.go:167\ncrypto/tls.(*Conn).SetWriteDeadline\n\tcrypto/tls/conn.go:152\ncrypto/tls.(*Conn).closeNotify\n\tcrypto/tls/conn.go:1399\ncrypto/tls.(*Conn).Close\n\tcrypto/tls/conn.go:1369\ngithub.com/denisenkom/go-mssqldb.(*Conn).Close\n\tgithub.com/denisenkom/go-mssqldb@v0.9.0/mssql.go:361\ndatabase/sql.(*driverConn).finalClose.func2\n\tdatabase/sql/sql.go:645\ndatabase/sql.withLock\n\tdatabase/sql/sql.go:3439\ndatabase/sql.(*driverConn).finalClose\n\tdatabase/sql/sql.go:643\ndatabase/sql.(*DB).Close\n\tdatabase/sql/sql.go:903\ngithub.com/elastic/beats/v7/x-pack/metricbeat/module/sql/query.(*MetricSet).Fetch\n\tgithub.com/elastic/beats/v7/x-pack/metricbeat/module/sql/query/query.go:179\ngithub.com/elastic/beats/v7/metricbeat/mb/module.(*metricSetWrapper).fetch\n\tgithub.com/elastic/beats/v7/metricbeat/mb/module/wrapper.go:260\ngithub.com/elastic/beats/v7/metricbeat/mb/module.(*metricSetWrapper).startPeriodicFetching\n\tgithub.com/elastic/beats/v7/metricbeat/mb/module/wrapper.go:225\ngithub.com/elastic/beats/v7/metricbeat/mb/module.(*metricSetWrapper).run\n\tgithub.com/elastic/beats/v7/metricbeat/mb/module/wrapper.go:209\ngithub.com/elastic/beats/v7/metricbeat/mb/module.(*Wrapper).Start.func1\n\tgithub.com/elastic/beats/v7/metricbeat/mb/module/wrapper.go:149",
    "ecs.version": "1.6.0"
}

It looks like metricbeat is still using an old version of jmoiron/sqlx that is using the version 9.0.0 of go-mssqldb : https://github.com/elastic/beats/blob/8597ca78aba6742961d4e235a565d87b7d21fc67/go.mod#L114

The bug seems to have been fixed in the underlying library go-mssqldb: https://github.com/denisenkom/go-mssqldb/issues/639#issuecomment-817335787

So I believe that updating jmoiron/sqlx should fix the issue!

shmsr commented 1 year ago

The related package was upgraded in this PR: https://github.com/elastic/beats/pull/35289 and the fix is available beats@v8.9.0+

Let us know if the fix addresses the issue. Meanwhile, I will close this issue as the fix is already released.

vedard commented 1 year ago

Hello @shmsr,

I can confirm the issue is now fixed in 8.9.0, metricbeat can now correctly connect to a Microsoft SQL database and send results to Elasticsearch.

Thank you!