embulk / embulk-input-jdbc

MySQL, PostgreSQL, Redshift and generic JDBC input plugins for Embulk
Other
102 stars 74 forks source link

add SQL Server query timeout option #224

Closed case-k-git closed 3 years ago

case-k-git commented 3 years ago

add query timeout options . to avoid executing unintentional query

queryTimeout

The number of seconds to wait before a timeout has occurred on a query. The default value is -1, which means infinite timeout. Setting this value to 0 also implies to wait indefinitely.

https://docs.microsoft.com/en-us/sql/connect/jdbc/setting-the-connection-properties?redirectedfrom=MSDN&view=sql-server-ver15

./bin/embulk run -L ./embulk-input-jdbc/embulk-input-sqlserver/build/gemContents ./embulk-input-jdbc/tmp.yml

tmp.yml

in:
  type: sqlserver
  host: 0.0.0.0
  port: 1433
  user: SA
  password: <password>
  database: <database>
  query_timeout: 5
  query: |
    select * from <table>
out:
  type: stdout
2021-08-25 08:40:35.179 +0900 [INFO] (0015:task-0000): Connecting to jdbc:sqlserver://0.0.0.0:1433 options {user=SA, password=***, queryTimeout=5, applicationName=embulk-input-sqlserver, databaseName=<database>, loginTimeout=300, socketTimeout=1800000}
2021-08-25 08:40:45.215 +0900 [INFO] (0015:task-0000): SQL: select * from <table>

2021-08-25 08:40:45.220 +0900 [INFO] (0015:task-0000): > 0.00 seconds
1,jon,1
2,lia,0
3,roger,1
2021-08-25 08:40:45.231 +0900 [INFO] (0001:transaction): {done:  1 / 1, running: 0}
2021-08-25 08:40:45.234 +0900 [INFO] (main): Committed.
2021-08-25 08:40:45.235 +0900 [INFO] (main): Next config diff: {"in":{},"out":{}}

this option is not supported under the drivers

SQL Server jTDS properties http://jtds.sourceforge.net/faq.html

MySQL JDBC https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-connp-props-connection-authentication.html

PostgresSQL JDBC https://jdbc.postgresql.org/documentation/head/connect.html

hiroyuki-sato commented 3 years ago

Hello, @case-k-git

I'm not an SQLServer user. Have you ever tried the following configuration? It seems that this configuration works the same. Could you tell me the detail of why do you need to add this option?

in:
  type: sqlserver
  options:
    queryTimeout: 5
    # ... 
case-k-git commented 3 years ago

Hello @hiroyuki-sato

thank you for your check.

Have you ever tried the following configuration?

I see ... I did not notice that options . I gonna use it . but could you please wait until I finish to check? I have tried to use queryTimeout options. but did not timeout as I expected both options one and my fixed one.

Could you tell me the detail of why do you need to add this option?

I would like to avoid keep executing unintentional query not to press sqlserver database disk. in our company we are using snapshot separation to avoid with(nolock) query data lost and data duplicate issue. with(nolock) query can avoid blocking risk but cause data lost and duplicate issue. so we decided to use snapshot separation not to use with(nolock) query and not to cause query blocking risk. please check the detail under the tech blog about snapshot separation https://techblog.zozo.com/entry/sqlserver-transaction-isolation-level-snapshot

by using snapshot separation the commit record gonna write into the tempdb. SELECT query gonna read record from tempdb. if unintentional query is keep executed, tempdb disk gonna press and cause disk issue.

to avoid the risk of this we wanna set the query timeout option not to keep executing unintentional query

thank you

hiroyuki-sato commented 3 years ago

Hello, @case-k-git

This is just in my(as a this plugin user) opinion,

For example, I used some MySQL-specific JDBC options in the embulk-input-mysql plugin. I think you can use the same option in the embulk-input-sqlserver plugin. So, please try it. If the option doesn't work correctly, It needs to fix the problem at first.

in:
  type: mysql
  # mysql
  options: { useLegacyDatetimeCode: false, serverTimezone: "Pacific/Midway", noTimezoneConversionForDateType: false }

You can check the JDBC option in the logs like the below.

in:
  type: mysql
  options:
    hoge: HOGE
    fuga: FUGA
2021-08-25 17:41:15.154 +0900 [INFO] (0001:transaction): Connecting to jdbc:mysql://localhost:3306/embulk_test options {useCompression=true, socketTimeout=1800000, useSSL=false, user=root, useLegacyDatetimeCode=false, tcpKeepAlive=true, hoge=HOGE, fuga=FUGA, useCursorFetch=true, connectTimeout=300000, password=***, zeroDateTimeBehavior=convertToNull}

And If this plugin supports the new query_timeout option, I hope that embulk-output-sqlsever also supports the same option.(If this option is useful for an output plugin.)

hiroyuki-sato commented 3 years ago

Hello, @case-k-git

I can't test Microsoft SQLServer, but I confirmed the embulk-input-sqlserver plugin(0.12.3) supports the options parameter.

in:
  type: sqlserver
  # sqlserver options
  options:
    hoge: HOGE
    fuga: FUGA
    queryTimeout: -1

It outputs the following log. It contains queryTimeout. So, I think you can set a time query timeout.

2021-08-27 13:19:37.833 +0900 [INFO] (0001:transaction): \
Connecting to jdbc:sqlserver://localhost:1433 options {
  socketTimeout=1800000,
  user=root,
  databaseName=embulk_test,
  hoge=HOGE,
  fuga=FUGA,
  applicationName=embulk-input-sqlserver,
  password=***,
  loginTimeout=300,
  queryTimeout=-1} # <-- HERE
case-k-git commented 3 years ago

@hiroyuki-sato

Hi thank you for your check

yes I can set the options but did not timeout even if using large amout of data and set short time of query timeout. I have tested both options one and this pr fixed one and both of them not working(not time out). as long as I check the document it should be work.

anyway I gone close this pr .

thank you for helping me