dominiqueclarke opened this issue 2 years ago (status: Open)
I don't think this is elastic-package's or Elasticsearch's error. It looks like the host's disk is full; I used to observe this whenever I didn't clean up my Docker images. Have you checked disk capacity?
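If it helps, this is roughly what I mean (just a sketch; `docker system prune` removes unused images, containers and volumes, so only run it if you don't need them):

```bash
# Check overall disk usage on the root filesystem
df -h /

# See how much space Docker itself is consuming (images, containers, volumes, build cache)
docker system df

# Reclaim space from unused Docker resources (destructive for unused images and volumes)
docker system prune --all --volumes --force
```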
These are Ubuntu 18 agents. Checking the daily test we run on those agents, they have about 22 GB of free space before running anything:
[2022-05-09T05:04:07.097Z] Filesystem Size Used Avail Use% Mounted on
[2022-05-09T05:04:07.097Z] udev 7.4G 0 7.4G 0% /dev
[2022-05-09T05:04:07.097Z] tmpfs 1.5G 900K 1.5G 1% /run
[2022-05-09T05:04:07.097Z] /dev/sda1 146G 125G 22G 86% /
[2022-05-09T05:04:07.097Z] tmpfs 7.4G 0 7.4G 0% /dev/shm
[2022-05-09T05:04:07.097Z] tmpfs 5.0M 0 5.0M 0% /run/lock
[2022-05-09T05:04:07.097Z] tmpfs 7.4G 0 7.4G 0% /sys/fs/cgroup
[2022-05-09T05:04:07.098Z] /dev/loop0 295M 295M 0 100% /snap/google-cloud-sdk/239
[2022-05-09T05:04:07.098Z] /dev/loop1 56M 56M 0 100% /snap/core18/2344
[2022-05-09T05:04:07.098Z] /dev/loop2 45M 45M 0 100% /snap/snapd/15534
[2022-05-09T05:04:07.098Z] /dev/sda15 105M 4.4M 100M 5% /boot/efi
@dominiqueclarke You may want to compare those stats with the ones taken at the moment the elastic-package stack fails.
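One way to capture that (a rough sketch, assuming you can background a helper process in the CI step) is to log a timestamped `df` periodically while the stack is running and correlate it with the failure time:

```bash
# Hypothetical helper: sample disk usage every 30s in the background during the test run
(
  while true; do
    echo "=== $(date -u +%FT%TZ) ==="
    df -h /
    sleep 30
  done
) >> disk-usage.log 2>&1 &
DISK_MONITOR_PID=$!

# ... run the tests ...

kill "$DISK_MONITOR_PID"
```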
@mtojek @kuisathaverat
So the error reported today is actually https://github.com/elastic/uptime-dev/issues/99. Nothing on my end has changed between the error reported in the linked issue and the error reported in this issue. I was able to reproduce this error on Friday, and now I'm only able to reproduce https://github.com/elastic/uptime-dev/issues/99 with the same script.
Disk usage reported after the Kibana container reported an unhealthy status.
Test run: https://apm-ci.elastic.co/job/apm-agent-rum/job/e2e-synthetics-mbp/view/change-requests/job/PR-499/28/console (you can search for "Disk usage" in the console output):
14:52:20 Filesystem Size Used Avail Use% Mounted on
14:52:20 udev 7.4G 0 7.4G 0% /dev
14:52:20 tmpfs 1.5G 1.1M 1.5G 1% /run
14:52:20 /dev/sda1 146G 132G 15G 91% /
14:52:20 tmpfs 7.4G 0 7.4G 0% /dev/shm
14:52:20 tmpfs 5.0M 0 5.0M 0% /run/lock
14:52:20 tmpfs 7.4G 0 7.4G 0% /sys/fs/cgroup
14:52:20 /dev/loop0 295M 295M 0 100% /snap/google-cloud-sdk/239
14:52:20 /dev/loop1 45M 45M 0 100% /snap/snapd/15534
14:52:20 /dev/loop2 56M 56M 0 100% /snap/core18/2344
14:52:20 /dev/sda15 105M 4.4M 100M 5% /boot/efi
Could you add these commands to the end of the execution?
docker ps -a
docker stats --no-stream --no-trunc
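Something like this at the end of the script (or in an EXIT trap so it also runs on failure) would do; just a sketch, assuming the setup script is plain bash:

```bash
# Hypothetical diagnostics hook for setup_integration.sh
dump_docker_state() {
  echo "--- docker ps -a ---"
  docker ps -a
  echo "--- docker stats ---"
  docker stats --no-stream --no-trunc
  echo "--- disk usage ---"
  df -h /
}

# Run the diagnostics even when the script exits with an error
trap dump_docker_state EXIT
```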
Dominique, I pulled all stack logs from the Integrations repository and ran grep -rni watermark against them. I didn't find any occurrences in any of those logs (different stack versions).
I'm afraid it might be a problem related specifically to the Synthetics integration, and you may want to start digging there.
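If you want to run the same check against the stack used by the Synthetics tests, something along these lines should work (assuming the `elastic-package stack dump` command and its `--output` flag are available in the version you're using):

```bash
# Dump logs from the running stack containers to a local directory
elastic-package stack dump --output /tmp/elastic-stack-dump

# Look for disk watermark messages across all dumped logs
grep -rni "watermark" /tmp/elastic-stack-dump
```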
elastic-package version: v0.48.0
Stack versions: 8.2.0-SNAPSHOT, 8.3.0-SNAPSHOT
Pipeline affected: https://apm-ci.elastic.co/job/apm-agent-rum/job/e2e-synthetics-mbp/
Tests that run on this pipeline: https://github.com/elastic/synthetics/blob/main/__tests__/e2e/synthetics.journey.ts
Documentation for these tests: https://github.com/elastic/synthetics/tree/main/__tests__/e2e
Script where elastic-package is invoked: https://github.com/elastic/synthetics/blob/main/__tests__/e2e/scripts/setup_integration.sh
The cluster spun up for Elastic Synthetics e2e tests is reporting low disk watermark exceeded, even when no Synthetics data is indexed.
The cluster is brought up with elastic-package stack up --version {version}.
This error was uncovered by extracting the ES logs, both with the Synthetics tests enabled and disabled. When the tests are disabled, ES still reports low disk watermark exceeded.
Full logs: https://apm-ci.elastic.co/job/apm-agent-rum/job/e2e-synthetics-mbp/view/change-requests/job/PR-499/25/consoleFull (search for "Fetching ES logs").
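For reference, the watermark state can also be checked directly against the running cluster; this is just a sketch, assuming the default elastic-package endpoint and credentials (https://localhost:9200, elastic/changeme, self-signed certificate):

```bash
# Per-node disk usage as Elasticsearch sees it (disk.percent vs. the watermarks)
curl -sk -u elastic:changeme "https://localhost:9200/_cat/allocation?v"

# Current disk-based allocation settings, including any watermark overrides
curl -sk -u elastic:changeme "https://localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true" \
  | tr ',' '\n' | grep -i "watermark"
```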
When tests are enabled, shards are not allocated for the Synthetics data, resulting in NoShardAvailableActionException.
Full logs: https://apm-ci.elastic.co/job/apm-agent-rum/job/e2e-synthetics-mbp/view/change-requests/job/PR-499/24/consoleFull (search for no_shard_available_action_exception or NoShardAvailableActionException).
This has caused the Elastic Synthetics e2e tests to fail consistently for the last few days.
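For anyone digging into the unassigned Synthetics shards, the allocation explain API should say exactly why they are not assigned (same assumptions about the local endpoint and credentials as above):

```bash
# List shards that are not allocated and the short reason
curl -sk -u elastic:changeme "https://localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason" \
  | grep -i unassigned

# Detailed explanation for the first unassigned shard the cluster finds
curl -sk -u elastic:changeme "https://localhost:9200/_cluster/allocation/explain?pretty"
```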