GoogleCloudPlatform / DataflowJavaSDK

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
http://cloud.google.com/dataflow

Empty files in GCS/BigQuery when running dataprep template #624

Open noamackerman opened 6 years ago

noamackerman commented 6 years ago

Hi, I am using Dataprep templates to launch Dataflow jobs from a Cloud Function with a GCS trigger (fired when a new file arrives). I consistently see strange behavior: when multiple jobs run concurrently, the results are 0-byte files in GCS or empty tables in BigQuery (I tried both outputs). To reproduce, take a Dataprep template and launch multiple concurrent jobs with it, e.g. move multiple files into a GCS bucket that triggers a Cloud Function, which uses the SDK to create and run the jobs.
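For reference, the setup described above could look roughly like the following Python sketch. It is not the actual function from this report. It assumes the Google API Python client (`googleapiclient`) and the Dataflow `projects.templates.launch` method. The environment variables `DATAFLOW_PROJECT` and `TEMPLATE_PATH`, the template parameter name `input`, and all helper names are hypothetical placeholders; the parameters a given Dataprep template expects may differ.

```python
import os
import re
import uuid


def make_job_name(object_name: str) -> str:
    """Derive a unique job name from a GCS object name (hypothetical helper)."""
    # Keep only lowercase letters, digits and hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", object_name.lower()).strip("-")
    # A random suffix keeps names distinct when many files arrive at once.
    return f"dataprep-{slug[:30]}-{uuid.uuid4().hex[:8]}"


def build_launch_body(object_name: str, parameters: dict) -> dict:
    """Build the request body for one template launch."""
    return {"jobName": make_job_name(object_name), "parameters": dict(parameters)}


def launch_template(project_id: str, template_path: str, body: dict) -> dict:
    """Launch one Dataflow job from a template stored in GCS."""
    # Imported lazily so the helpers above work without the client library.
    from googleapiclient.discovery import build

    dataflow = build("dataflow", "v1b3")
    request = dataflow.projects().templates().launch(
        projectId=project_id, gcsPath=template_path, body=body
    )
    return request.execute()


def on_new_file(event, context):
    """Background Cloud Function entry point for a new GCS object."""
    bucket, name = event["bucket"], event["name"]
    # "input" is a placeholder; use the parameter names your template defines.
    body = build_launch_body(name, {"input": f"gs://{bucket}/{name}"})
    return launch_template(
        os.environ["DATAFLOW_PROJECT"],  # hypothetical variable name
        os.environ["TEMPLATE_PATH"],  # hypothetical variable name
        body,
    )
```

Moving several files into the bucket at once invokes `on_new_file` once per file, so several launches of the same template run concurrently, which is the situation in which the empty outputs appear.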

I would highly appreciate your help, Noam