feldera / feldera

Feldera Continuous Analytics Platform
https://feldera.com

Running the benchmark task from Earthly results in out of disk space message #1950

Open aehmttw opened 1 week ago

aehmttw commented 1 week ago

To reproduce, run earthly --verbose -P +benchmark

The SQL benchmarks that run with storage enabled are the ones that run out of disk space. Does anyone know where these files are stored? They do not appear in my $TMPDIR.

According to docker exec earthly-buildkitd df -h, disk usage grows from 85.1G to 251G while query q9 runs.
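
One way to narrow down where the space goes is to rank the entries under a suspected directory by size before and during the run. The following is a minimal sketch, not something from the Feldera or Earthly tooling: the function names are hypothetical, and the root directory to inspect is whatever location you suspect (for example a path inside the buildkit container). It sums apparent file sizes, so the totals can differ somewhat from what df reports.

```python
import os


def dir_sizes(root):
    """Return {path of each immediate child of root: total bytes beneath it}.

    Sizes are apparent file sizes (st_size), not allocated disk blocks.
    Symlinks are not followed.
    """
    sizes = {}
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            total = 0
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    try:
                        total += os.lstat(os.path.join(dirpath, name)).st_size
                    except OSError:
                        # Files can vanish while a benchmark is running.
                        pass
            sizes[entry.path] = total
        elif entry.is_file(follow_symlinks=False):
            sizes[entry.path] = entry.stat(follow_symlinks=False).st_size
    return sizes


def largest_entries(root, n=5):
    """Return the n largest children of root as (path, bytes), biggest first."""
    ranked = sorted(dir_sizes(root).items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]
```

Calling largest_entries on the same root twice, once before the benchmark and once while q9 is running, should show which subdirectory is growing.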

+benchmark | 2024-06-26 01:57:37 ERROR [pipeline-0190522a-1bf5-74ab-b0c3-fe2778f34a07] error on output endpoint 'q9': 0.0.0.0:9092/0: Disconnected (after 270400ms in state UP)
          +benchmark | 2024-06-26 01:57:37 ERROR [pipeline-0190522a-1bf5-74ab-b0c3-fe2778f34a07] error on output endpoint 'q9': 2/2 brokers are down
          +benchmark | 2024-06-26 01:57:37 ERROR [pipeline-0190522a-1bf5-74ab-b0c3-fe2778f34a07] error on output endpoint 'q9': 0.0.0.0:9092/0: Disconnected while requesting ApiVersion: might be caused by incorrect security.protocol configuration (connecting to a SSL listener?) or broker version is < 0.10 (see api.version.request) (after 16ms in state APIVERSION_QUERY)
          +benchmark | 2024-06-26 01:57:37 ERROR [pipeline-0190522a-1bf5-74ab-b0c3-fe2778f34a07] error on output endpoint 'q9': localhost:9092/bootstrap: Disconnected while requesting ApiVersion: might be caused by incorrect security.protocol configuration (connecting to a SSL listener?) or broker version is < 0.10 (see api.version.request) (after 0ms in state APIVERSION_QUERY)
          +benchmark | 2024-06-26 01:57:37 ERROR [pipeline-0190522a-1bf5-74ab-b0c3-fe2778f34a07] error on input endpoint 'bid': 0.0.0.0:9092/0: Disconnected (after 271423ms in state UP)
          +benchmark | 2024-06-26 01:57:37 ERROR [pipeline-0190522a-1bf5-74ab-b0c3-fe2778f34a07] error on input endpoint 'bid': GroupCoordinator: 0.0.0.0:9092: Disconnected (after 271526ms in state UP)
          +benchmark | 2024-06-26 01:57:37 ERROR [pipeline-0190522a-1bf5-74ab-b0c3-fe2778f34a07] error on input endpoint 'bid': 3/3 brokers are down
          +benchmark | 2024-06-26 01:57:37 ERROR [pipeline-0190522a-1bf5-74ab-b0c3-fe2778f34a07] error on input endpoint 'bid': 0.0.0.0:9092/0: Disconnected while requesting ApiVersion: might be caused by incorrect security.protocol configuration (connecting to a SSL listener?) or broker version is < 0.10 (see api.version.request) (after 31ms in state APIVERSION_QUERY)
             ongoing | ongoing TODO
          +benchmark | 2024-06-26 01:57:37 ERROR [pipeline-0190522a-1bf5-74ab-b0c3-fe2778f34a07] error on input endpoint 'bid': GroupCoordinator: 0.0.0.0:9092: Disconnected (after 22ms in state APIVERSION_QUERY, 1 identical error(s) suppressed)
          +benchmark | 2024-06-26 01:57:37 ERROR [pipeline-0190522a-1bf5-74ab-b0c3-fe2778f34a07] error on input endpoint 'bid': localhost:9092/bootstrap: Disconnected while requesting ApiVersion: might be caused by incorrect security.protocol configuration (connecting to a SSL listener?) or broker version is < 0.10 (see api.version.request) (after 3ms in state APIVERSION_QUERY)
          +benchmark | 2024-06-26 01:57:37 ERROR [pipeline-0190522a-1bf5-74ab-b0c3-fe2778f34a07] error on input endpoint 'bid': localhost:9092/bootstrap: Disconnected while requesting ApiVersion: might be caused by incorrect security.protocol configuration (connecting to a SSL listener?) or broker version is < 0.10 (see api.version.request) (after 0ms in state APIVERSION_QUERY, 1 identical error(s) suppressed)
          +benchmark | 2024-06-26 01:57:37 ERROR [pipeline-0190522a-1bf5-74ab-b0c3-fe2778f34a07] error on output endpoint 'q9': localhost:9092/bootstrap: Disconnected while requesting ApiVersion: might be caused by incorrect security.protocol configuration (connecting to a SSL listener?) or broker version is < 0.10 (see api.version.request) (after 92ms in state APIVERSION_QUERY, 1 identical error(s) suppressed)
          +benchmark | 2024-06-26 01:57:37 ERROR [pipeline-0190522a-1bf5-74ab-b0c3-fe2778f34a07] error on output endpoint 'q9': 0.0.0.0:9092/0: Disconnected while requesting ApiVersion: might be caused by incorrect security.protocol configuration (connecting to a SSL listener?) or broker version is < 0.10 (see api.version.request) (after 0ms in state APIVERSION_QUERY, 1 identical error(s) suppressed)
          +benchmark | 2024-06-26 01:57:37 ERROR [pipeline-0190522a-1bf5-74ab-b0c3-fe2778f34a07] error on input endpoint 'bid': 0.0.0.0:9092/0: Disconnected while requesting ApiVersion: might be caused by incorrect security.protocol configuration (connecting to a SSL listener?) or broker version is < 0.10 (see api.version.request) (after 0ms in state APIVERSION_QUERY, 1 identical error(s) suppressed)
          +benchmark | thread 'thread 'thread 'dbsp-background-2dbsp-background-8' panicked at dbsp-worker-12' panicked at ' panicked at /dbsp/crates/dbsp/src/trace/ord/file/wset_batch.rs/dbsp/crates/dbsp/src/trace/ord/file/wset_batch.rs/dbsp/crates/dbsp/src/trace/ord/file/indexed_wset_batch.rs:::536536:909:22:22:
          +benchmark | 14:
          +benchmark | called `Result::unwrap()` on an `Err` value: StdIo(Os { code: 28, kind: StorageFull, message: "No space left on device" }):
          +benchmark | called `Result::unwrap()` on an `Err` value: StdIo(Os { code: 28, kind: StorageFull, message: "No space left on device" })
          +benchmark | called `Result::unwrap()` on an `Err` value: StdIo(Os { code: 28, kind: StorageFull, message: "No space left on device" })
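
For reference, code 28 in the panics above is the OS error number ENOSPC on Linux, which matches the "No space left on device" message. The sketch below shows how that number can be decoded and how a disk-full failure can be told apart from other I/O errors. It is an illustration in Python with hypothetical helper names, not the Rust code in dbsp that panicked.

```python
import errno
import os


def describe_os_error(code):
    """Map a raw OS error number to (symbolic name, human-readable message)."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


def is_disk_full(exc):
    """Return True if exc is an OSError reporting no space left on the device."""
    return isinstance(exc, OSError) and exc.errno == errno.ENOSPC
```

A writer that checks is_disk_full in its error path can report which file and directory ran out of space instead of aborting with a bare unwrap-style failure.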

blp commented 6 days ago

Related: https://github.com/feldera/feldera/issues/1955

aehmttw commented 4 days ago

Using the min-storage-rows option in bench.bash may fix this; I am not sure whether it is as much of a problem on CI as on my machine.

blp commented 1 day ago

> Using the min-storage-rows option in bench.bash may fix this; I am not sure whether it is as much of a problem on CI as on my machine.

Did it make a difference?

aehmttw commented 1 day ago

On my machine, the benchmark works with min-storage-rows set to 100,000 but fails with 25,000.
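
A toy model of why a higher threshold would help, under an assumption that is not confirmed in this thread: that min-storage-rows is the minimum number of rows a batch must have before it is written to storage, with smaller batches staying in memory. The function name and the batch sizes are hypothetical and only illustrate the direction of the effect.

```python
def rows_written_to_storage(batch_sizes, min_storage_rows):
    """Count rows in batches at or above the threshold.

    Assumption: batches with at least min_storage_rows rows go to disk,
    and smaller batches are kept in memory.
    """
    return sum(n for n in batch_sizes if n >= min_storage_rows)


# Made-up batch sizes, for illustration only.
batches = [10_000, 30_000, 50_000, 200_000]
low = rows_written_to_storage(batches, 25_000)
high = rows_written_to_storage(batches, 100_000)
```

Under that assumption, raising the threshold can only reduce the number of rows that reach disk, which is consistent with 100,000 working where 25,000 does not.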