GoogleCloudPlatform / bigquery-utils

Useful scripts, udfs, views, and other utilities for migration and data warehouse operations in BigQuery.
https://cloud.google.com/bigquery/
Apache License 2.0
1.07k stars 269 forks source link

Export data as protocol buffer does not work for more than one field #396

Closed sanimesa closed 1 month ago

sanimesa commented 5 months ago

I tried the example from the documentation which has a STRUCT<STRING, BIGNUMERIC> and it does not produce anything For a single string field, the UDF does return values.

This is the proto file I used:

syntax = "proto3";

package dummypackage;

message DummyMessage {
  string word = 1;
  int64 num_word_count = 2;
}

The UDF:

CREATE OR REPLACE FUNCTION
  dev.toMyProtoMessage(input STRUCT<field_1 STRING, field_2 INT64>)
  RETURNS BYTES
    LANGUAGE js OPTIONS ( library=["gs:/xxxxxx/bigquery-udf/pbwrapper.js"]
) AS r"""
let message = pbwrapper.setup("dummypackage.DummyMessage")
return pbwrapper.parse(message, input)
  """;

Thanks.

ron-gal commented 1 month ago

Hey @sanimesa ! Sorry for the delayed response. When you have a proto with fields word and num_word_count, the STRUCT should have fields with names word and numWordCount to match it. If the names don't match, the pbwrapper.parse call won't work. To get the appropriate field names, you can use this trick:

CREATE OR REPLACE FUNCTION
  test.toMyProtoMessage(input STRUCT<dummyField STRING>)
  RETURNS STRING
  LANGUAGE js OPTIONS ( library=["gs://###/pbwrapper.js"] ) AS r"""
let message = pbwrapper.setup("dummypackage.DummyMessage")
return JSON.stringify(message)
  """;
SELECT
  test.toMyProtoMessage(STRUCT(word)) AS protoResult,
FROM
  `bigquery-public-data.samples.shakespeare`
LIMIT
  1;
afleisc commented 1 month ago

Going to close this issue as @ron-gal provided a solution, please follow up if needed!