GoogleCloudPlatform / DataflowJavaSDK

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
http://cloud.google.com/dataflow

BigQueryIO.write fails when destination has partition decorator #620

Open darshanmehta10 opened 6 years ago

darshanmehta10 commented 6 years ago

Here is the code that writes to BigQuery:

BigQueryIO.writeTableRows()
    .to(destination)
    .withCreateDisposition(CREATE_IF_NEEDED)
    .withWriteDisposition(WRITE_APPEND)
    .withSchema(tableSchema)
    .expand(tableRows);
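Side note: I don't think expand() is meant to be called directly; transforms are normally attached with apply(), which is what wires them into the pipeline. A minimal sketch of the idiomatic call, assuming tableRows is a PCollection<TableRow>:

// Attach the sink with apply() rather than calling expand() directly.
tableRows.apply("WriteToBigQuery",
    BigQueryIO.writeTableRows()
        .to(destination)
        .withCreateDisposition(CREATE_IF_NEEDED)
        .withWriteDisposition(WRITE_APPEND)
        .withSchema(tableSchema));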

Here's the destination's implementation:

public TableDestination apply(ValueInSingleWindow<TableRow> input) {
    // Derive the day partition (yyyyMMdd, UTC) from the element's timestamp.
    String partition = timestampExtractor.apply(input.getValue())
        .toString(DateTimeFormat.forPattern("yyyyMMdd").withZoneUTC());
    TableReference tableReference = new TableReference();
    tableReference.setDatasetId(dataset);
    tableReference.setProjectId(projectId);
    // "$" is BigQuery's partition decorator, i.e. <table>$yyyyMMdd.
    tableReference.setTableId(String.format("%s$%s", table, partition));
    log.debug("Will write to BigQuery table: {}", tableReference);
    return new TableDestination(tableReference, null);
}
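For what it's worth, the function can be sanity-checked outside the pipeline. A small test sketch; the sample field name and timestamp are made up, and destination is assumed to be the SerializableFunction whose apply() is shown above:

// Probe the destination function with a synthetic element.
TableRow row = new TableRow().set("timestamp", "1973-05-22T00:00:00.000Z");
TableDestination dest = destination.apply(
    ValueInSingleWindow.of(row, Instant.now(), GlobalWindow.INSTANCE, PaneInfo.NO_FIRING));
// Expect something like <project_id>:<dataset>.<table>$19730522
System.out.println(dest.getTableSpec());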

When the Dataflow job tries to write to this table, I see the following error:

"errors" : [ {
 "domain" : "global",
 "message" : "Cannot read partition information from a table that is not partitioned: <project_id>:<dataset>.<table>$19730522",
 "reason" : "invalid"
 } ]

So it looks like the sink isn't creating the tables as partitioned in the first place? If CREATE_IF_NEEDED creates the base table with no partitioning spec, any later write through the $yyyyMMdd decorator would fail with exactly this error.

Apache Beam version: 2.2.0

santhh commented 5 years ago

Hello, I am also seeing the same issue with 2.7. Is there any workaround?
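For anyone hitting this: my understanding is that with CREATE_IF_NEEDED the sink creates the base table without any partitioning spec, so the subsequent write through the $yyyyMMdd decorator fails. Newer SDKs (around 2.3.0 and later, if I remember correctly) let you attach the partitioning explicitly: BigQueryIO.Write.withTimePartitioning(...) for a single static table, and TableDestination overloads that carry a TimePartitioning for dynamic destinations. A sketch of the dynamic-destinations variant, to be checked against your SDK version:

import com.google.api.services.bigquery.model.TimePartitioning;

public TableDestination apply(ValueInSingleWindow<TableRow> input) {
    String partition = timestampExtractor.apply(input.getValue())
        .toString(DateTimeFormat.forPattern("yyyyMMdd").withZoneUTC());
    // Keep the "$" decorator for routing rows, and hand the sink a
    // TimePartitioning so CREATE_IF_NEEDED creates the table day-partitioned.
    String tableSpec = String.format("%s:%s.%s$%s", projectId, dataset, table, partition);
    return new TableDestination(tableSpec, null, new TimePartitioning().setType("DAY"));
}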