dbeaver / dbeaver

Free universal database tool and SQL client
https://dbeaver.io
Apache License 2.0
40.66k stars 3.51k forks source link

Snowflake "java.lang.StringIndexOutOfBoundsException (plus other issues) when parsing result of REGEXP_SUBSTR_ALL which contains comma #36361

Open romanet opened 2 days ago

romanet commented 2 days ago

Description

SQL

select REGEXP_SUBSTR_ALL('FROM select from AAAA, LATERAL S', 'from\\s+[^\(\\s]+', 1, 1, 'im') TABLES_ARR;

Error

(occurred when comma is the last character in the result string.

!ENTRY org.jkiss.dbeaver.model 2 0 2024-11-22 16:22:33.474
!MESSAGE Can't read column 'TABLES_ARR' value
!SUBENTRY 1 org.jkiss.dbeaver.model 2 0 2024-11-22 16:22:33.474
!MESSAGE begin 1, end 0, length 1
!STACK 0
java.lang.StringIndexOutOfBoundsException: begin 1, end 0, length 1
    at java.base/java.lang.String.checkBoundsBeginEnd(Unknown Source)
    at java.base/java.lang.String.substring(Unknown Source)
    at org.jkiss.dbeaver.model.impl.jdbc.data.JDBCCollection.makeCollectionFromString(JDBCCollection.java:480)
    at org.jkiss.dbeaver.model.impl.jdbc.data.handlers.JDBCArrayValueHandler.getValueFromObject(JDBCArrayValueHandler.java:79)
    at org.jkiss.dbeaver.model.impl.jdbc.data.handlers.JDBCComplexValueHandler.fetchColumnValue(JDBCComplexValueHandler.java:50)
    at org.jkiss.dbeaver.model.impl.jdbc.data.handlers.JDBCArrayValueHandler.fetchColumnValue(JDBCArrayValueHandler.java:60)
    at org.jkiss.dbeaver.model.impl.jdbc.data.handlers.JDBCAbstractValueHandler.fetchValueObject(JDBCAbstractValueHandler.java:49)
    at org.jkiss.dbeaver.ui.controls.resultset.ResultSetDataReceiver.fetchRow(ResultSetDataReceiver.java:126)
    at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.fetchQueryData(SQLQueryJob.java:911)
    at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.executeStatement(SQLQueryJob.java:682)
    at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.lambda$2(SQLQueryJob.java:522)
    at org.jkiss.dbeaver.model.exec.DBExecUtils.tryExecuteRecover(DBExecUtils.java:192)
    at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.executeSingleQuery(SQLQueryJob.java:541)
    at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.extractData(SQLQueryJob.java:1007)
    at org.jkiss.dbeaver.ui.editors.sql.SQLEditor$QueryResultsContainer.readData(SQLEditor.java:4271)
    at org.jkiss.dbeaver.ui.controls.resultset.ResultSetJobDataRead.lambda$0(ResultSetJobDataRead.java:128)
    at org.jkiss.dbeaver.model.exec.DBExecUtils.tryExecuteRecover(DBExecUtils.java:192)
    at org.jkiss.dbeaver.ui.controls.resultset.ResultSetJobDataRead.run(ResultSetJobDataRead.java:126)
    at org.jkiss.dbeaver.ui.controls.resultset.ResultSetViewer$ResultSetDataPumpJob.run(ResultSetViewer.java:5166)
    at org.jkiss.dbeaver.model.runtime.AbstractJob.run(AbstractJob.java:119)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:63)

same SQL in snowsql

user#WH@DB.SCHEMA>select REGEXP_SUBSTR_ALL('FROM select from AAAA, LATERAL S', 'from\\s+[^\(\\s]+', 1, 1, 'im') TABLES_ARR;
+------------------+
| TABLES_ARR       |
|------------------|
| [                |
|   "FROM select", |
|   "from AAAA,"   |
| ]                |
+------------------+
1 Row(s) produced. Time Elapsed: 22.689s

See "Additional context" for other parsing issues.

DBeaver Version

Community Edition 24.2.5.202411171748

Operating System

Windows 11 Enterprise Version 23H2

Database and driver

Database: Snowflake Server 8.44.2 JDBC Driver: 3.16.0 (snowflake-jdbc-3.16.0.jar)

Steps to reproduce

  1. Connect to snowflake
  2. execute select REGEXP_SUBSTR_ALL('FROM select from AAAA, LATERAL S', 'from\\s+[^\(\\s]+', 1, 1, 'im') TABLES_ARR;

instead of '[| "FROM select", "from AAAA," ]' shown 'java.lang.StringIndexOutOfBoundsException: begin 1, end 0, length 1'

Additional context

More examples:

Exception when a comma is the last character

Other issues:

  1. comma replaced with line break when a comma is in a middle.
  2. comma is not included in the result
  3. characters after comma aren't included in the result

TABLES_ARR_AS_JSON - expected resulst

 SELECT EXAMPLE_NUMBER, REGEXP_SUBSTR_ALL(VALUE, 'from\\s+[^\(\\s]+') TABLES_ARR_DEFAULT_ARGS, 
    EXAMPLE_NUMBER, REGEXP_SUBSTR_ALL(VALUE, 'from\\s+[^\(\\s]+', 1, 1, 'im') TABLES_ARR,
       TO_JSON(REGEXP_SUBSTR_ALL(VALUE, 'from\\s+[^\(\\s]+', 1, 1, 'im')) TABLES_ARR_AS_JSON
  FROM(
        SELECT $1 EXAMPLE_NUMBER,
               $2 VALUE
          FROM (
            VALUES (1, 'FROM select from AAAA, LATERAL S'), 
                   (2, 'FROM select from AAAA,'),
                   (4, 'FROM select, from AAAA'),
                   (5, 'FROM , from AAAA'),
                   (6, 'FROM ,'),
                   (7, 'FROM select from AAAA'),
                   (8, 'FROM select from ,AAAA'),
                   (9, 'FROM selec,t from ,AAAA'),
                   (10, 'FROM select from AAAA ,')
                   )
       );
ShadelessFox commented 1 day ago

Hello @romanet,

Thank you for the bug report.

I reproduced it using the 3.16.0 version of the driver. Interestingly enough, it's not reproducible against 3.20.0 because arrays are returned as strings.