DataStax Bulk Loader (DSBulk) is an open-source, Apache-licensed, unified tool for loading into and unloading from Apache Cassandra(R), DataStax Astra and DataStax Enterprise (DSE)
Apache License 2.0
dsbulk unload stuck when config -maxConcurrentFiles (write concurrency) greater than 1 #463
I'm unloading 10,000,000 rows from a C* table using a LIMIT query:
dsbulk unload -query "SELECT col1, col2 FROM keyspace.table LIMIT 10000000" -maxRecords 1000000 -header false -verbosity high --connector.csv.compression gzip -url table.csv.gz
The command runs with a read concurrency of 1 and a write concurrency of 4. Checking the logs, I didn't find the usual Operation UNLOAD_20230216-042948-286777 closed. line, and the dsbulk process is still visible when checking with ps aux.
dsbulk version: 1.10.0
This bug was not present in dsbulk version 1.9.1, and it does not occur when setting -maxConcurrentFiles 1.
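Based on that observation, a possible workaround until the hang is fixed is to re-run the same unload with the write concurrency pinned to a single file. This sketch reuses the exact flags from the command above (keyspace.table and the column names are placeholders from the report, not real identifiers):

```shell
# Same unload as reported, but with -maxConcurrentFiles 1 so only one
# output file is written concurrently, which avoids the reported hang.
dsbulk unload \
  -query "SELECT col1, col2 FROM keyspace.table LIMIT 10000000" \
  -maxRecords 1000000 \
  -header false \
  --connector.csv.compression gzip \
  -url table.csv.gz \
  -maxConcurrentFiles 1
```

Note that forcing a single writer serializes the output and may slow the unload compared to the default concurrency.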