GoogleCloudPlatform / DataflowTemplates

Cloud Dataflow Google-provided templates for solving in-Cloud data tasks
https://cloud.google.com/dataflow/docs/guides/templates/provided-templates
Apache License 2.0

[Bug]: CSV parsing bug in the CSVToBigQuery template #1764

Open OpensourceHU opened 1 month ago

OpensourceHU commented 1 month ago

Related Template(s)

CSVToBigQuery

Template Version

2024-07-16-00_rc00

What happened?

Parsing a CSV file fails with the error "Number of fields in the schema and number of Csv headers do not match." when a field value contains a comma inside quoted text. For example, take a file with two fields, where field2 holds a JSON string:

field1,field2
field1Text,"{""key1"":""value1"",""key2"":""value2""}"

The splitter breaks the data row into 3 columns. That does not match the number of CSV headers or the number of fields in the BigQuery schema, so the row fails to transform.

The problem is probably at line 199 of CSVToBigQuery.java:

Splitter.on(delimiter.get()).splitToList(context.element()).toArray(new String[0]);

Guava's Splitter splits on every occurrence of the delimiter and does not honor CSV quoting, so a comma inside a quoted field is treated as a field separator. Please consider using a CSV parsing utility package to handle this case.
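To illustrate the difference, here is a minimal sketch in plain Java that contrasts a naive split on every delimiter (similar in effect to the `Splitter.on(...)` call above) with a quote-aware split following RFC 4180 quoting rules. The class `CsvSplitDemo` and the helpers `naiveSplit` and `splitCsvLine` are hypothetical names for illustration only. They are not part of the DataflowTemplates code base, and this is not a proposed patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of DataflowTemplates.
public class CsvSplitDemo {

    // Splits on every delimiter and ignores quotes (reproduces the reported bug).
    // limit -1 keeps trailing empty fields, as Guava's Splitter does by default.
    static List<String> naiveSplit(String line, char delimiter) {
        return Arrays.asList(line.split(Pattern.quote(String.valueOf(delimiter)), -1));
    }

    // Hypothetical quote-aware splitter for a single CSV record.
    // Delimiters inside double-quoted fields are kept as literal text, and a
    // doubled quote ("") inside a quoted field becomes a single quote.
    // Assumes the record contains no embedded line breaks.
    static List<String> splitCsvLine(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote inside a quoted field
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // includes delimiters inside quotes
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == delimiter) {
                fields.add(current.toString()); // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field
        return fields;
    }

    public static void main(String[] args) {
        // The example row from this issue: field1Text,"{""key1"":""value1"",""key2"":""value2""}"
        String row = "field1Text,\"{\"\"key1\"\":\"\"value1\"\",\"\"key2\"\":\"\"value2\"\"}\"";
        System.out.println(naiveSplit(row, ',').size());   // 3 (mismatch with 2 headers)
        System.out.println(splitCsvLine(row, ',').size()); // 2
        System.out.println(splitCsvLine(row, ',').get(1)); // {"key1":"value1","key2":"value2"}
    }
}
```

This sketch only handles records that fit on a single line. A maintained CSV library would also cover cases such as quoted fields containing line breaks. Note that if input is read line by line, those records would still be split before parsing.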

Relevant log output

Number of fields in the schema and number of Csv headers do not match.