datastax / dsbulk

DataStax Bulk Loader (DSBulk) is an open-source, Apache-licensed, unified tool for loading into and unloading from Apache Cassandra(R), DataStax Astra and DataStax Enterprise (DSE)

DSBulk unload fails to parse map[value] as provided in query #488

Open msmygit opened 10 months ago

msmygit commented 10 months ago

Table

CREATE TABLE baselines.map_text_dsbulk (
    k int PRIMARY KEY,
    v frozen<map<text, text>>
);

Add data to the table and validate

token@cqlsh:baselines> create table if not exists map_text_dsbulk(k int primary key, v frozen<map<text,text>>);
token@cqlsh:baselines> insert into map_text_dsbulk (k,v) VALUES ( 1,{'1':'sam'});
token@cqlsh:baselines> insert into map_text_dsbulk (k,v) VALUES ( 2,{'2':'maddy'});
token@cqlsh:baselines> select k, v['2'] as there from map_text_dsbulk;

 k | there
---+-------
 1 |  null
 2 | maddy

(2 rows)

DSBulk version used

% ./dsbulk --version
DataStax Bulk Loader v1.11.0

Output when the same query is run in cqlsh:

token@cqlsh:baselines> select k,v['1'] from map_text_dsbulk ;

 k | v['1']
---+--------
 1 |    sam
 2 |   null

(2 rows)

Unload operation executed and error observed

% ./dsbulk unload -b /path/to/secure-connect-<db_name>.zip -u token -p "AstraCS:REDACTED" -query "SELECT k,v['1'] FROM baselines.map_text_dsbulk" -url /path/to/Downloads/map_text_dsbulk
Username and password provided but auth provider not specified, inferring PlainTextAuthProvider
A cloud secure connect bundle was provided: ignoring all explicit contact points.
Operation directory: /path/to/Tools/DSBulk/dsbulk-1.11.0/bin/logs/UNLOAD_20240111-134634-869533
Operation UNLOAD_20240111-134634-869533 failed: Invalid query: 'SELECT k,v['1'] FROM baselines.map_text_dsbulk' could not be parsed at line 1:10: mismatched input '[' expecting {',', '.', K_FROM, K_AS}.
   Caused by: InputMismatchException (no message).

Detailed exception output with --log.verbosity 2

...
2024-01-11 13:47:29 ERROR Operation UNLOAD_20240111-134725-299804 failed: Invalid query: 'SELECT k,v['1'] FROM baselines.map_text_dsbulk' could not be parsed at line 1:10: mismatched input '[' expecting {',', '.', K_FROM, K_AS}.
   Caused by: InputMismatchException (no message).
java.lang.IllegalArgumentException: Invalid query: 'SELECT k,v['1'] FROM baselines.map_text_dsbulk' could not be parsed at line 1:10: mismatched input '[' expecting {',', '.', K_FROM, K_AS}
    at com.datastax.oss.dsbulk.workflow.commons.schema.QueryInspector$1.syntaxError(QueryInspector.java:135)
    at org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:41)
    at org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:544)
    at org.antlr.v4.runtime.DefaultErrorStrategy.reportInputMismatch(DefaultErrorStrategy.java:327)
    at org.antlr.v4.runtime.DefaultErrorStrategy.reportError(DefaultErrorStrategy.java:139)
    at com.datastax.oss.dsbulk.generated.cql3.CqlParser.selectStatement(CqlParser.java:468)
    at com.datastax.oss.dsbulk.generated.cql3.CqlParser.cqlStatement(CqlParser.java:217)
    at com.datastax.oss.dsbulk.workflow.commons.schema.QueryInspector.<init>(QueryInspector.java:145)
    at com.datastax.oss.dsbulk.workflow.commons.settings.SchemaSettings.init(SchemaSettings.java:281)
    at com.datastax.oss.dsbulk.workflow.unload.UnloadWorkflow.init(UnloadWorkflow.java:135)
Caused by: org.antlr.v4.runtime.InputMismatchException: null
    at org.antlr.v4.runtime.DefaultErrorStrategy.recoverInline(DefaultErrorStrategy.java:485)
    at org.antlr.v4.runtime.Parser.match(Parser.java:206)
    at com.datastax.oss.dsbulk.generated.cql3.CqlParser.selectStatement(CqlParser.java:357)
    ... 5 common frames omitted
2024-01-11 13:47:29 DEBUG Operation UNLOAD_20240111-134725-299804 closing.
2024-01-11 13:47:31 DEBUG Operation UNLOAD_20240111-134725-299804 closed.

msmygit commented 10 months ago

OK, here is a workaround that achieves this: unload the whole map column and extract the value for the key on the client side.

./dsbulk unload -b /path/to/secure-connect-awesome-astra.zip -u token -p "AstraCS:REDACTED" -query "SELECT k,v FROM baselines.map_text_dsbulk" -header false 2> /dev/null | grep '{\\"2\\":.*}' |  awk -F, '{ gsub(/[{}]/, "", $2); split($2, a, ":"); gsub(/"/, "", a[1]); gsub(/"/, "", a[2]); print $1 "," a[2] }' | sed 's/\\//g'
2,maddy

msmygit commented 10 months ago

Here is another variant, with a looser grep pattern:

./dsbulk unload -b /path/to/secure-connect-awesome-astra.zip -u token -p "AstraCS:REDACTED" -query "SELECT k,v FROM baselines.map_text_dsbulk" -header false 2> /dev/null | grep "{.*2.*}" |  awk -F, '{ gsub(/[{}]/, "", $2); split($2, a, ":"); gsub(/"/, "", a[1]); gsub(/"/, "", a[2]); print $1 "," a[2] }' | sed 's/\\//g' 
2,maddy
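
Both pipelines above depend on text matching: `awk -F,` splits on every comma, so a map with more than one entry would be cut apart, and `grep "{.*2.*}"` also matches rows where `2` appears only in a value. They also drop rows that lack the key, whereas cqlsh returns `null` for them. A rough alternative is to parse the unloaded rows instead of pattern-matching them. The sketch below is not part of DSBulk. It assumes the output has the shape implied by the grep pattern above (`k,"{\"2\":\"maddy\"}"`, a JSON map whose quotes are backslash-escaped inside a quoted CSV field), and `extract_map_value` is a hypothetical helper name.

```python
import csv
import json


def extract_map_value(lines, key):
    """Yield (k, v[key]) for each unloaded row of the form k,"{...}".

    Assumption: the map column is a JSON object with backslash-escaped
    quotes inside a double-quoted CSV field, as the grep pattern in the
    workaround suggests. Rows without the key yield None, which mirrors
    the null that cqlsh shows for v['key'].
    """
    reader = csv.reader(lines, quotechar='"', escapechar="\\", doublequote=False)
    for row in reader:
        if len(row) != 2:
            continue  # skip blank or unexpected lines
        k, raw_map = row
        if not raw_map:
            yield k, None  # empty field: treat as a null map
            continue
        yield k, json.loads(raw_map).get(key)


# Sample rows in the assumed output format (not captured from a real run).
sample = [r'1,"{\"1\":\"sam\"}"', r'2,"{\"2\":\"maddy\"}"']
for k, value in extract_map_value(sample, "2"):
    print(k, value)
```

To use it on real output, the same function could be fed `sys.stdin` with the `dsbulk unload ... -header false` command piped in, provided the actual CSV quoting matches the assumption above.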