crate / cratedb-toolkit

CrateDB Toolkit, an SDK for CrateDB and CrateDB Cloud.
https://cratedb-toolkit.readthedocs.io/
GNU Affero General Public License v3.0

[I/O] Investigate single-record inserts on bulk interface #310

Closed amotl closed 3 weeks ago

amotl commented 3 weeks ago

Introduction

Elsewhere, CTK's test suite uncovered a regression in CrateDB that made the following test cases fail.

FAILED tests/adapter/test_rockset.py::test_rockset_add_documents - crate.client.exceptions.ProgrammingError: IndexOutOfBoundsException[Index: 1, Size: 1]
FAILED tests/adapter/test_rockset.py::test_rockset_query - crate.client.exceptions.ProgrammingError: IndexOutOfBoundsException[Index: 1, Size: 1]
FAILED tests/io/test_import.py::test_import_csv_dask - crate.client.exceptions.ProgrammingError: IndexOutOfBoundsException[Index: 1, Size: 1]
FAILED tests/io/test_import.py::test_import_csv_dask_with_progressbar - crate.client.exceptions.ProgrammingError: IndexOutOfBoundsException[Index: 1, Size: 1]

Thoughts

Depending on the number of ingress records and the chunk size, single records can always end up being submitted to the bulk interface. However, we should take the chance to validate that those are actually just stray records, and that this does not happen across the board, because performance would then be poor.
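
To illustrate how a stray record comes about, here is a minimal sketch in plain Python. It is not CTK's actual chunking code: the helper names `chunked`, `batch_sizes`, and `only_stray_singles` are hypothetical and only model splitting N records into batches of a given chunk size. A single-record batch is treated as "stray" when it appears only as the trailing remainder.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(records: Iterable, chunk_size: int) -> Iterator[List]:
    """Yield successive lists of at most `chunk_size` records (hypothetical helper)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, chunk_size))
        if not batch:
            return
        yield batch


def batch_sizes(record_count: int, chunk_size: int) -> List[int]:
    """Return the size of each batch that would be sent to the bulk interface."""
    return [len(batch) for batch in chunked(range(record_count), chunk_size)]


def only_stray_singles(sizes: List[int], chunk_size: int) -> bool:
    """True if single-record batches occur at most as the trailing remainder."""
    if chunk_size == 1:
        # Every batch is a single record: the "across the board" case.
        return False
    # All batches except the last one must carry more than one record.
    return all(size > 1 for size in sizes[:-1])


if __name__ == "__main__":
    sizes = batch_sizes(record_count=5, chunk_size=2)
    print(sizes, only_stray_singles(sizes, chunk_size=2))
```

With 5 records and a chunk size of 2, the last batch holds a single record, which is the unavoidable case. A check along these lines in a test could flag configurations where every batch degenerates to one record.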

seut commented 3 weeks ago

Should be fixed by https://github.com/crate/crate/pull/16921.