Pinging @elastic/kibana-reporting-services (Team:Reporting Services)
Hi @mateskoo, are there any more details in the Kibana server logs?
It will be a bit tricky to say without seeing the server logs for the exact error message. One thing I can think to check is whether setting the max size bytes to 100+ MB could itself be causing issues. What happens when it is left at the default?
[EDIT]
The default being 10 MB. If that turns out to be too small, perhaps increase the size gradually.
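For reference, a minimal kibana.yml sketch of what a gradual increase could look like (the 20 MB value is only an example, not a recommendation):

# kibana.yml - raise the CSV size cap in steps rather than jumping straight to 100+ MB
xpack.reporting.csv.maxSizeBytes: 20971520   # 20 MB; the default is 10485760 (10 MB)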
Pinging @elastic/kibana-app-services (Team:AppServicesUx)
Another possibility is that the larger export is too large for Elasticsearch to index. Check whether the ES setting http.max_content_length is large enough for ES to accept the data coming from Kibana Reporting.
https://www.elastic.co/guide/en/elasticsearch/reference/7.12/modules-network.html
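For example, a sketch of the corresponding elasticsearch.yml setting on each node (the 200mb value is illustrative; it needs to cover the full size of the report payload that Kibana sends):

# elasticsearch.yml - allow request bodies larger than the 100mb default
http.max_content_length: 200mb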
Hello all,
I've tried increasing the limit in kibana.yml to 200mb, but report generation is still failing with the exact same error.
There is nothing in the logs related to that error - I've enabled quiet logging in Kibana to display only errors and reproduced the issue. I will also look into the Elasticsearch logs.
@tsullivan Thank you for the hint, I'll try playing with http.max_content_length and let you know.
Hi all,
I tried raising http.max_content_length to 500mb. I've also been playing with the timeouts in the Kibana config. Unfortunately, the problem still persists, so it didn't have much of an effect.
I also checked the logs, which took some time to analyze. After enabling the debug option in Kibana, I saw this:
{"type":"log","@timestamp":"2022-02-08T18:01:13+01:00","tags":["debug","plugins","security","basic","basic"],"pid":30752,"message":"Trying to authenticate user request to /api/reporting/jobs/count."}
{"type":"log","@timestamp":"2022-02-08T18:01:13+01:00","tags":["debug","plugins","security","basic","basic"],"pid":30752,"message":"Trying to authenticate via state."}
{"type":"log","@timestamp":"2022-02-08T18:01:13+01:00","tags":["debug","elasticsearch","query","data"],"pid":30752,"message":"200\nGET /_security/_authenticate"}
{"type":"log","@timestamp":"2022-02-08T18:01:13+01:00","tags":["debug","plugins","security","basic","basic"],"pid":30752,"message":"Request has been authenticated via state."}
{"type":"log","@timestamp":"2022-02-08T18:01:13+01:00","tags":["debug","elasticsearch","query","data"],"pid":30752,"message":"200\nPOST /.reporting-*/_count\n{\"query\":{\"constant_score\":{\"filter\":{\"bool\":{\"must\":[{\"terms\":{\"jobtype\":[\"csv\",\"csv_from_savedobject\"]}},{\"term\":{\"created_by\":\"elastic\"}}]}}}}}"}
{"type":"log","@timestamp":"2022-02-08T18:01:05+01:00","tags":["debug","http","server","response"],"pid":30752,"ecs":{"version":"1.7.0"},"client":{"ip":"170.110.215.2"},"http":{"request":{"method":"GET","mime_type":null,"referrer":"http://elk-server:9660/app/discover","headers":{"host":"elk-server:9660","user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0","accept":"*/*","accept-language":"en-US,en;q=0.5","accept-encoding":"gzip, deflate","referer":"http://elk-server:9660/app/discover","content-type":"application/json","kbn-version":"7.12.1","connection":"keep-alive"}},"response":{"body":{"bytes":4300},"status_code":200,"headers":{"content-type":"application/json; charset=utf-8","kbn-name":"SAM-Kibana","kbn-license-sig":"449034386b26badde35383d6450f3dbbbac7a38a3a0373c18a9a89f26f739556","cache-control":"private, no-cache, no-store, must-revalidate","vary":"accept-encoding","content-encoding":"gzip"},"responseTime":43}},"url":{"path":"/api/reporting/jobs/list","query":"page=0&ids=kzed5vhr0nq8ba6c7d09058s"},"user_agent":{"original":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0"},"message":"GET /api/reporting/jobs/list?page=0&ids=kzed5vhr0nq8ba6c7d09058s 200 43ms - 4.2KB"}
{"type":"log","@timestamp":"2022-02-08T18:01:06+01:00","tags":["warning","plugins","reporting","esqueue","queue-worker"],"pid":30752,"message":"kzed2wwd0nq8ba6c7dedr8n7 - Timeout processing job kzed5vhr0nq8ba6c7d09058s"}
{"type":"log","@timestamp":"2022-02-08T18:01:06+01:00","tags":["warning","plugins","reporting","esqueue","queue-worker"],"pid":30752,"message":"kzed2wwd0nq8ba6c7dedr8n7 - Timeout on job kzed5vhr0nq8ba6c7d09058s"}
{"type":"log","@timestamp":"2022-02-08T18:01:08+01:00","tags":["debug","http","server","response"],"pid":30752,"ecs":{"version":"1.7.0"},"client":{"ip":"170.110.215.2"},"http":{"request":{"method":"GET","mime_type":null,"referrer":"http://elk-server:9660/app/management/insightsAndAlerting/reporting","headers":{"host":"elk-server:9660","user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0","accept":"*/*","accept-language":"en-US,en;q=0.5","accept-encoding":"gzip, deflate","referer":"http://elk-server:9660/app/management/insightsAndAlerting/reporting","content-type":"application/json","kbn-version":"7.12.1","kbn-system-request":"true","connection":"keep-alive"}},"response":{"body":{"bytes":2},"status_code":200,"headers":{"content-type":"text/plain; charset=utf-8","kbn-name":"SAM-Kibana","kbn-license-sig":"449034386b26badde35383d6450f3dbbbac7a38a3a0373c18a9a89f26f739556","cache-control":"private, no-cache, no-store, must-revalidate","content-length":2,"accept-ranges":"bytes"},"responseTime":29}},"url":{"path":"/api/reporting/jobs/count","query":""},"user_agent":{"original":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0"},"message":"GET /api/reporting/jobs/count 200 29ms - 2.0B"}
{"type":"log","@timestamp":"2022-02-08T18:01:09+01:00","tags":["debug","elasticsearch","query","data"],"pid":30752,"message":"200\nPOST /.reporting-*/_search\n{\"seq_no_primary_term\":true,\"_source\":{\"excludes\":[\"output.content\"]},\"query\":{\"bool\":{\"filter\":{\"bool\":{\"minimum_should_match\":1,\"should\":[{\"term\":{\"status\":\"pending\"}},{\"bool\":{\"must\":[{\"term\":{\"status\":\"processing\"}},{\"range\":{\"process_expiration\":{\"lte\":\"2022-02-08T17:01:09.468Z\"}}}]}}]}}}},\"sort\":[{\"priority\":{\"order\":\"asc\"}},{\"created_at\":{\"order\":\"asc\"}}],\"size\":1}"}
{"type":"log","@timestamp":"2022-02-08T18:01:09+01:00","tags":["debug","plugins","reporting","esqueue","queue-worker"],"pid":30752,"message":"kzed2wwd0nq8ba6c7dedr8n7 - 1 outstanding jobs returned"}
{"type":"log","@timestamp":"2022-02-08T18:01:09+01:00","tags":["warning","plugins","reporting","esqueue","queue-worker"],"pid":30752,"message":"kzed2wwd0nq8ba6c7dedr8n7 - Failing job kzed5vhr0nq8ba6c7d09058s"}
{"type":"log","@timestamp":"2022-02-08T18:01:09+01:00","tags":["debug","elasticsearch","query","data"],"pid":30752,"message":"200\nPOST /.reporting-2022-02-06/_update/kzed5vhr0nq8ba6c7d09058s?if_seq_no=13&if_primary_term=1\n{\"doc\":{\"status\":\"failed\",\"completed_at\":\"2022-02-08T17:01:09.477Z\",\"output\":{\"content\":\"Max attempts reached (3)\",\"content_type\":false}}}"}
{"type":"log","@timestamp":"2022-02-08T18:01:09+01:00","tags":["info","plugins","reporting","esqueue","queue-worker"],"pid":30752,"message":"kzed2wwd0nq8ba6c7dedr8n7 - Job marked as failed: /.reporting-2022-02-06/_doc/kzed5vhr0nq8ba6c7d09058s"}
{"type":"log","@timestamp":"2022-02-08T18:01:09+01:00","tags":["debug","plugins","reporting","esqueue","queue-worker"],"pid":30752,"message":"kzed2wwd0nq8ba6c7dedr8n7 - Found no claimable jobs out of 1 total"}
{"type":"log","@timestamp":"2022-02-08T18:01:09+01:00","tags":["debug","elasticsearch","query","data"],"pid":30752,"message":"200\nGET /.kibana_7.12.1/_doc/config%3A7.12.1"}
{"type":"log","@timestamp":"2022-02-08T18:01:09+01:00","tags":["debug","plugins","monitoring","monitoring","kibana-monitoring"],"pid":30752,"message":"not sending [kibana_settings] monitoring document because [undefined] is null or invalid."}
{"type":"log","@timestamp":"2022-02-08T18:01:09+01:00","tags":["debug","plugins","monitoring","monitoring","kibana-monitoring"],"pid":30752,"message":"Uploading bulk stats payload to the local cluster"}
{"type":"log","@timestamp":"2022-02-08T18:01:09+01:00","tags":["debug","plugins","upgradeAssistant","reindex_worker"],"pid":30752,"message":"Polling for reindex operations"}
{"type":"log","@timestamp":"2022-02-08T18:01:09+01:00","tags":["debug","elasticsearch","query","data"],"pid":30752,"message":"200\nPOST /_monitoring/bulk?system_id=kibana&system_api_version=7&interval=10000ms\n{\"index\":{\"_type\":\"kibana_stats\"}}\n{\"kibana\":{\"uuid\":\"2486d811-9b86-480e-89b9-591baeff2daa\",\"name\":\"SAM-Kibana\",\"index\":\".kibana\",\"host\":\"6.151.106.223\",\"transport_address\":\"elk-server:9660\",\"version\":\"7.12.1\",\"snapshot\":false,\"status\":\"green\"},\"os\":{\"platform\":\"linux\",\"platformRelease\":\"linux-3.10.0-1160.15.2.el7.x86_64\",\"load\":{\"1m\":0.16,\"5m\":0.19,\"15m\":0.51},\"memory\":{\"total_in_bytes\":25012850688,\"free_in_bytes\":5252194304,\"used_in_bytes\":19760656384},\"uptime_in_millis\":10301750000,\"distro\":\"Red Hat Linux\",\"distroRelease\":\"Red Hat Linux-7.9\",\"cpuacct\":{\"control_group\":\"/system.slice/kibana.service\",\"usage_nanos\":53972501309},\"cpu\":{\"control_group\":\"/system.slice/kibana.service\",\"cfs_period_micros\":100000,\"cfs_quota_micros\":-1,\"stat\":{\"number_of_elapsed_periods\":0,\"number_of_times_throttled\":0,\"time_throttled_nanos\":0}}},\"response_times\":{\"average\":114.66666666666667,\"max\":271},\"concurrent_connections\":2,\"process\":{\"memory\":{\"heap\":{\"total_in_bytes\":224079872,\"used_in_bytes\":189593440,\"size_limit\":4345298944},\"resident_set_size_in_bytes\":275841024},\"event_loop_delay\":0.9637470245361328,\"uptime_in_millis\":515693.215704},\"requests\":{\"disconnects\":0,\"total\":3},\"timestamp\":\"2022-02-08T17:01:07.066Z\"}\n{\"index\":{\"_type\":\"kibana_settings\"}}\n{\"kibana\":{\"uuid\":\"2486d811-9b86-480e-89b9-591baeff2daa\",\"name\":\"SAM-Kibana\",\"index\":\".kibana\",\"host\":\"6.151.106.223\",\"locale\":\"en\",\"port\":\"9661\",\"transport_address\":\"elk-server:9660\",\"version\":\"7.12.1\",\"snapshot\":false,\"status\":\"green\"}}\n"}
There are no errors in the logs after the job fails. Also, I considerably raised the timeout in Kibana, which only resulted in waiting an hour for the job to fail.
Hi, are there multiple instances of Kibana in the cluster? Do they all have the same configuration?
Is there a way you could share the configuration (with any sensitive information redacted)?
Is Kibana staying online throughout the entire report job, or is it restarting / crashing during the job execution?
The log messages you provided show what happens after a timeout occurs, but we're not seeing any log messages from the start of the job execution.
Hi @tsullivan ,
No, just one Kibana instance is running (v7.12.1).
Kibana config:
server.host: IP
server.port: PORT
elasticsearch.hosts:
- https://elk1:8080
- https://elk2:8080
- https://elk3:8080
server.name: "M-Kibana"
elasticsearch.username: "kibana"
elasticsearch.password: "password"
elasticsearch.ssl.verificationMode: none
logging.dest: /tmp/kibana.log
logging.quiet: true
telemetry.enabled: false
xpack.reporting.queue.timeout: 9999999
xpack.reporting.kibanaServer.port: port
xpack.reporting.kibanaServer.protocol: http
xpack.reporting.kibanaServer.hostname: elk1
xpack.reporting.csv.maxSizeBytes: 838860800
xpack.reporting.capture.browser.type: chromium
xpack.reporting.capture.browser.chromium.disableSandbox: true
Full log file here: kibana_grep.log
Hi @mateskoo, thank you for the full debug logs: they really helped. They show that report execution is active for 2 minutes and then stops because of a logged timeout.
The logs you posted show a job that started at 2022-02-08T17:55:00+01:00 and timed out at 2022-02-08T17:57:00+01:00, but right up until the timeout, search results were still coming in (executing scroll request). The remedy is to increase the xpack.reporting.queue.timeout value (the default is 2 minutes), and the config that you posted 12 days ago shows that you have increased it to 2 hours and 46 minutes.
In my analysis, it looks like the config doesn't match what the logs are saying. Are you still having an issue with CSV jobs timing out?
NOTE: If you had pending jobs waiting to be executed before changing the config setting for the timeout, those pending jobs may have had the original timeout coded into the report job parameters. So they could have been using the old 2 minute timeout, and only new jobs created after the config change would have the new timeout.
Hello @tsullivan. Thank you for your time going through the logs :)
Yes, I did it in a confusing manner - I uploaded logs where xpack.reporting.queue.timeout is at its default (2 mins), as you stated. The reason is simple: with verbose logging enabled in Kibana and letting it run for 2 hours, we would have thousands of lines of logs, which would be impossible to read in a reasonable time.
I have been playing with xpack.reporting.queue.timeout, setting it to 2 hours and 46 minutes, but the result was the same: the reports were still not generating.
Do you want the logs with xpack.reporting.queue.timeout set to 2 hours and 46 minutes? I can do that, but it will be a pain to analyze.
Hi, since you are using 7.12.1, getting all the data you need might not be possible. In this version, export works by storing the entire contents of the CSV in Kibana's memory and then uploading it to Elasticsearch as a field of a single document. This technique has a lot of limitations, because filling up memory with CSV contents can make Kibana stop performing well. It can cause its own health checks to stop working and network requests to get dropped.
Do you want the logs when the xpack.reporting.queue.timeout is set to 2 hours and 46 minutes? I can do it but it will be a pain to analyze.
The question is, what is Kibana doing right before the timeout happens? Is it still accumulating search data? Has Kibana become unstable? Are the events in the logs getting further and further apart? When that starts to be the case, adding more timeout time usually won't help, because processing has stopped being productive.
The unfortunate solution is to export less data with each export. Some customers have addressed this by reducing the time filter selection by half and making 2 exports instead of 1. I think many other customers have implemented their own CSV export using scripts that query Elasticsearch directly and don't involve Kibana, as sketched below.
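As a rough illustration of that last approach, here is a minimal Python sketch (not an official tool) that pages through an index with the scroll API and writes a CSV, bypassing Kibana Reporting entirely. The URL, credentials, index pattern, fields, and time range are placeholders that would need to match your environment:

import csv
import requests

ES_URL = "https://elk1:8080"            # placeholder: any Elasticsearch node from kibana.yml
AUTH = ("elastic", "password")          # placeholder: a user with read access to the index
INDEX = "my-index-*"                    # placeholder: the index pattern behind the saved search
FIELDS = ["@timestamp", "message"]      # placeholder: the columns wanted in the CSV

query = {
    "size": 10000,
    "_source": FIELDS,
    "query": {"range": {"@timestamp": {"gte": "now-24h"}}},  # placeholder time range
}

with open("export.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(FIELDS)

    # Open a scroll context and fetch the first page of hits.
    resp = requests.post(f"{ES_URL}/{INDEX}/_search", params={"scroll": "2m"},
                         json=query, auth=AUTH, verify=False).json()
    scroll_id = resp["_scroll_id"]
    hits = resp["hits"]["hits"]

    # Keep paging until a scroll request returns no more hits.
    while hits:
        for hit in hits:
            source = hit["_source"]
            writer.writerow([source.get(field, "") for field in FIELDS])
        resp = requests.post(f"{ES_URL}/_search/scroll",
                             json={"scroll": "2m", "scroll_id": scroll_id},
                             auth=AUTH, verify=False).json()
        scroll_id = resp["_scroll_id"]
        hits = resp["hits"]["hits"]

# Release the scroll context once the export is finished.
requests.delete(f"{ES_URL}/_search/scroll", json={"scroll_id": scroll_id},
                auth=AUTH, verify=False)

Each page is written straight to disk, so memory use stays flat regardless of how many hits the time range matches.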
In 7.16, we added enhancements that split up the CSV contents into multiple documents, which made the feature much more reliable.
Thank you for the reply.
Kibana is not freezing or misbehaving during the CSV extraction. As you mentioned, the built-in Kibana CSV exporter is not reliable, and setting up a Python script which extracts the data through the API is much better.
I may try whether the default Kibana exporter works better on a higher version of Elastic.
Thanks for the support @tsullivan
Closing this as the issue was filed against 7.12 and the implementation has changed drastically since 7.16.
Kibana version: 7.12.1
Elasticsearch version: 7.12.1
Server OS version: Oracle Linux Server 7.9
Browser version: Google Chrome 94.0.4606.71 (Official Build) (64-bit)
Browser OS version: Windows 10 Enterprise 1909
Original install method (e.g. download page, yum, from source, etc.): rpm installation for elastic and kibana
Describe the bug: CSV reports fail to be created from saved searches. Error: Unable to generate report, Max attempts reached (3)
Expected behavior: CSV Report creating from the targeted data.
Screenshots (if relevant):
Provide logs and/or server output (if relevant): Error: Unable to generate report, Max attempts reached (3)
Any additional context: When trying to create a CSV from recent data (last 15 minutes, last 1 hour), the report completes. With anything above 1 hour, the error occurs. 15 mins = approx. 22,906 hits; 1 hour = approx. 86,533 hits.
Changes to kibana.yml (based on what I've found online) that didn't do much: