GoogleCloudPlatform / zetasql-toolkit

The ZetaSQL Toolkit is a library that helps users use ZetaSQL Java API to perform SQL analysis for multiple query engines, including BigQuery and Cloud Spanner.
Apache License 2.0
38 stars 9 forks source link

Analyzer Raises Exception on Assignment of FLOAT64 to FLOAT Type Field #25

Closed dion-ricky closed 5 months ago

dion-ricky commented 11 months ago

Hi,

We have a merge query that updates/inserts rows into BQ table, but the analyzer raises exception on the assignment of FLOAT64 type value to FLOAT type field. Due to my limited knowledge of the inner working of ZetaSQL I have no idea how to fix it. I have looked around at the LanguageOptions but haven't tried anything yet. Here's a snippet of the query for example only:

MERGE INTO `bq-project.dataset.table_name` AS target
USING (
  SELECT
      1 AS id,
      0.99 AS amount,
      TIMESTAMP('2023-11-07 01:00:00') as date
) as source
ON target.id = source.id
WHEN MATCHED AND source.date >= target.date THEN
UPDATE SET id = source.id, amount = source.amount
WHEN NOT MATCHED BY TARGET THEN
INSERT ( id, amount ) VALUES ( source.id, source.amount )

Here's the exception raised by the Analyzer:

Exception in thread "main" com.google.zetasql.toolkit.AnalysisException: Value of type FLOAT64 cannot be assigned to amount which has type FLOAT [at xx:xx]
    at com.google.zetasql.toolkit.ZetaSQLToolkitAnalyzer$StatementAnalyzer.analyzeNextStatement(ZetaSQLToolkitAnalyzer.java:232)
    at com.google.zetasql.toolkit.ZetaSQLToolkitAnalyzer$StatementAnalyzer.next(ZetaSQLToolkitAnalyzer.java:211)
    at com.google.zetasql.toolkit.ZetaSQLToolkitAnalyzer$StatementAnalyzer.next(ZetaSQLToolkitAnalyzer.java:148)
    at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
    at com.example.data.Main.main(Main.java:28)
ppaglilla commented 11 months ago

Thank you for reporting this!

This is happening because the toolkit is using the ZetaSQL FLOAT type (i.e. a 32 bit float) to represent BigQuery FLOAT64 columns, when in reality it should be using the DOUBLE type (i.e. an actual 64 bit float). I just fixed it in the version/v0.5.0 branch. I'm hoping that version releases to maven within a week.

In the meantime, I can offer this sort of hacky workaround. Before analyzing; you could find the table that has the FLOAT column, change that to a DOUBLE and replace the table in the catalog. Of course, this wouldn't be necessary once that new version releases.

// 1. Find the table that needs the column changed from FLOAT to DOUBLE
SimpleTable tableToUpdate = catalog.getZetaSQLCatalog()
    .getTable("bq-project.dataset.table_name", null);

// 2. Build the new column list, replacing FLOAT columns with DOUBLE columns
List<SimpleColumn> updatedColumns = tableToUpdate.getColumnList()
    .stream()
    .map(column -> {
      if (!column.getType().isFloat()) {
        return column;
      }

      return new SimpleColumn(
          tableToUpdate.getFullName(),
          column.getName(),
          TypeFactory.createSimpleType(TypeKind.TYPE_DOUBLE));
    })
    .collect(Collectors.toList());

// 3. Replace the table
SimpleTable updatedTable = new SimpleTable(tableToUpdate.getFullName(), updatedColumns);
catalog.register(updatedTable, CreateMode.CREATE_OR_REPLACE, CreateScope.CREATE_DEFAULT_SCOPE);
dion-ricky commented 11 months ago

Hi, thanks for you quick response and support. I've tried your code and it works. Looking forward to the release of 0.5.0.

dion-ricky commented 9 months ago

Hi @ppaglilla , I hope you're doing well. I've been monitoring for the release of v0.5.0, but I'm not seeing a lot of activities from you. Do you have updated estimate of the release of v0.5.0? Also I noticed that on branch version/v0.5.0 the output of column lineage didn't include dataset name and project name, just want to make sure that it's not broken on the next release 😄 .

dion-ricky commented 8 months ago

Hi @ppaglilla , not seeing a lot of update from you. I hope you're doing good. I just want to follow up about release v.0.5.0. It's been almost two months since last time I checked and I still don't see any new commits on branch version/v0.5.0. Can you give any update about this?

ppaglilla commented 5 months ago

I'm really sorry for this release being so delayed. Version 0.5.0 is available through maven now, see the release.