confluentinc / kafka-connect-jdbc

Kafka Connect connector for JDBC-compatible databases
Other
18 stars 955 forks source link

Single-quotes are being stripped from query="" clauses #608

Closed markteehan closed 5 years ago

markteehan commented 5 years ago

For the Kafka Connect JDBC Source connector, when using the query= property, single quotes are stripped from the statement; so string predicates cannot be specified.

For example: "query":"SELECT * FROM (SELECT * FROM gdelt-bq.gdeltv2.events WHERE sqldate>=20190303 and actor1countrycode='USA')" This fails to parse as the connector changes it to SELECT * FROM (SELECT * FROM gdelt-bq.gdeltv2.events WHERE sqldate>=20190303 and actor1countrycode=USA)

It runs ok if it has the first predicate; since a numeric predicate doesn't require quotes. This is unrelated to the quoted.identifiers property - which controls quoting of identifier names; not of expression strings.

This is using the GenericDatabase dialect as the source is BigQuery. It is only possible to use query=; I cannot use table.whitelist= because BigQuery does not expose a queryable data dictionary. The only way to expose a rowset for streaming is using query=.

markteehan commented 5 years ago

This is a bash issue as the curl cmd is in bash script so embedding single quotes inside double quotes inside single quotes doesnt work. Instead, create a cmd.json with the json for the job submit command, and call it using curl.

For example:

$ cat source_1.json
{
             "name": "Q_RUS",
             "config": {
                     "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",                     "connection.url":"jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=XXX;OAuthType=0;OAuthServiceAcctEmail=markteehan-XXX.gserviceaccount.com;OAuthPvtKeyPath=/root/XXX.json;Timeout=60;QueryDialect=SQL",
                     "mode":"incrementing",
                     "query":"SELECT * FROM (SELECT * FROM `gdelt-bq.gdeltv2.events` WHERE (sqldate>=20190303) and (actor1countrycode='''RUS''') )",
                     "incrementing.column.name":"GLOBALEVENTID",
                     "batch.max.rows":"100",
                     "flush.size":"100",
                     "tasks":"1",
                     "topic.prefix": "gdelt_event_0858",
                     "numeric.mapping":"best_fit",
                     "poll.interval.ms":"6000"
                     }
             }

And then submit it: curl -X POST -H "Content-Type: application/json" --data @./source_1.json ${CONNECT}/connectors?HTTP=1.1

Tin-Nguyen commented 3 years ago

The workaround solution works for me. Thanks @markteehan

Munaf305 commented 1 year ago

Hi @markteehan @Tin-Nguyen : were you able to successfully connect to bigquery using JDBC. I tried all possible ways but ended up with below error. java.sql.SQLException: Simba BigQueryJDBCDriver]-100032 Error executing query job. Message: EXEC_JOB_EXECUTION_ERR. Attempting retry 4 of -1 attempts. (io.confluent.connect.jdbc.source.JdbcSourceTask:477)

Thanks