GoogleCloudPlatform / DataflowTemplates

Cloud Dataflow Google-provided templates for solving in-Cloud data tasks
https://cloud.google.com/dataflow/docs/guides/templates/provided-templates
Apache License 2.0

[Bug]: Dataflow - MongoDB-to-BigQuery batch mode failing with NullPointer for userOption FLATTEN #516

Open pushkarva opened 1 year ago

pushkarva commented 1 year ago

Related Template(s)

MongoDB-to-BigQuery

What happened?

[Bug]: Dataflow - MongoDB-to-BigQuery batch mode failing with NullPointer for userOption FLATTEN

Command:

gcloud dataflow flex-template run mongodbtobq01 \
  --template-file-gcs-location gs://dataflow-templates-us-central1/latest/flex/MongoDB_to_BigQuery \
  --region us-central1 \
  --parameters mongoDbUri=mongodb+srv://xxxxxxxxxxxxx,database=xxxxxx-dev,collection=usermasters,outputTableSpec=skilled-duality-xxxxxx.xxxxxx.xxxxxxxxx,userOption=FLATTEN

The job fails with a NullPointerException. (It does load the data as JSON into BigQuery when I set userOption to NONE.)

{"severity":"INFO","time":"2022/11/20 20:44:09.252706","line":"exec.go:66","message":"java.lang.NullPointerException"}
{"severity":"INFO","time":"2022/11/20 20:44:09.252951","line":"exec.go:66","message":"\tat com.google.cloud.teleport.v2.mongodb.templates.MongoDbUtils.lambda$getTableFieldSchema$0(MongoDbUtils.java:58)"}

Beam Version

Newer than 2.35.0

Relevant log output

{"severity":"INFO","time":"2022/11/20 20:44:09.183742","line":"exec.go:66","message":"[main] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:4, serverValue:95763}] to xxxxxxxxxxxxx.mongodb.net:27017"}
{"severity":"INFO","time":"2022/11/20 20:44:09.251695","line":"exec.go:66","message":"Exception in thread \"main\" "}
{"severity":"INFO","time":"2022/11/20 20:44:09.252706","line":"exec.go:66","message":"java.lang.NullPointerException"}
{"severity":"INFO","time":"2022/11/20 20:44:09.252951","line":"exec.go:66","message":"\tat com.google.cloud.teleport.v2.mongodb.templates.MongoDbUtils.lambda$getTableFieldSchema$0(MongoDbUtils.java:58)"}
{"severity":"INFO","time":"2022/11/20 20:44:09.253110","line":"exec.go:66","message":"\tat java.base/java.util.Map.forEach(Map.java:661)"}
{"severity":"INFO","time":"2022/11/20 20:44:09.253229","line":"exec.go:66","message":"\tat com.google.cloud.teleport.v2.mongodb.templates.MongoDbUtils.getTableFieldSchema(MongoDbUtils.java:53)"}
{"severity":"INFO","time":"2022/11/20 20:44:09.253576","line":"exec.go:66","message":"\tat com.google.cloud.teleport.v2.mongodb.templates.MongoDbToBigQuery.run(MongoDbToBigQuery.java:61)"}
{"severity":"INFO","time":"2022/11/20 20:44:09.253710","line":"exec.go:66","message":"\tat com.google.cloud.teleport.v2.mongodb.templates.MongoDbToBigQuery.main(MongoDbToBigQuery.java:55)"}
{"severity":"INFO","time":"2022/11/20 20:44:09.286952","line":"exec.go:52","message":"java failed with exit status 1"}
{"severity":"INFO","time":"2022/11/20 20:44:09.287029","line":"launch.go:80","message":"Template launch failed: exit status 1"}
{"severity":"INFO","time":"2022/11/20 20:44:09.287049","line":"launch.go:102","message":"Uploading console logs to gcs location: gs://dataflow-staging-us-central1-517875895576/staging/template_launches/2022-11-20_12_42_55-15353145343331911066/console_logs"}

theshanbhag commented 1 year ago

Could you please let me know if the schema for the collection is constant for all the documents?

Upperfoot commented 1 year ago

> Could you please let me know if the schema for the collection is constant for all the documents?

Why would this be an issue? MongoDB is specifically built for variable data schemas.
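To make the schema question concrete, here is a minimal sketch, assuming a flattening step that has to turn documents into a fixed set of BigQuery columns. How the template actually samples documents is not confirmed in this thread. The class `FieldUnion` and its method are hypothetical, and documents are modeled as plain `Map<String, Object>` values rather than driver types. It shows that documents in one collection can contribute different fields, some of them null.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not code from the MongoDB-to-BigQuery template.
final class FieldUnion {

    /** Collects every top-level field name seen in any of the given documents. */
    static Set<String> unionOfFields(List<Map<String, Object>> documents) {
        Set<String> fields = new LinkedHashSet<>();
        for (Map<String, Object> doc : documents) {
            fields.addAll(doc.keySet());
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("_id", 1);
        first.put("name", "Ada");

        // Second document has a different shape and a null value.
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("_id", 2);
        second.put("email", null);

        System.out.println(unionOfFields(Arrays.asList(first, second)));
    }
}
```

A column list derived from only the first document would miss `email`, which is why a flattening step may care whether the schema is constant.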

jenbeattie commented 1 year ago

I'm also running into this issue, although I couldn't figure out the cause, as java.lang.NullPointerException is decidedly unhelpful. I tried setting a specific schema in BigQuery for the destination table, but to no avail.

bvolpato commented 1 year ago

We have made improvements to the template: it now avoids throwing NullPointerException and infers null values as STRING. There are still improvements to make, such as allowing or reusing a given BigQuery schema.
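The null-to-STRING fallback described above can be sketched roughly as follows. This is not the template's actual `MongoDbUtils` code. The class `NullSafeTypeInference`, its methods, and the type mapping are assumptions made for illustration, and documents are modeled as plain `Map<String, Object>` values. The only behavior taken from this thread is that a null value is inferred as STRING instead of causing a NullPointerException.

```java
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; the real template code may differ.
final class NullSafeTypeInference {

    /** Maps one document value to a BigQuery type name, treating null as STRING. */
    static String inferType(Object value) {
        if (value == null) {
            return "STRING"; // fallback instead of dereferencing a null value
        }
        if (value instanceof Integer || value instanceof Long) {
            return "INT64";
        }
        if (value instanceof Float || value instanceof Double) {
            return "FLOAT64";
        }
        if (value instanceof Boolean) {
            return "BOOL";
        }
        if (value instanceof Date) {
            return "TIMESTAMP";
        }
        // Assumption: everything else (strings, nested documents, arrays) is kept as text.
        return "STRING";
    }

    /** Builds a field-name to type-name mapping for one document, preserving field order. */
    static Map<String, String> inferSchema(Map<String, Object> document) {
        Map<String, String> schema = new LinkedHashMap<>();
        document.forEach((name, value) -> schema.put(name, inferType(value)));
        return schema;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "Ada");
        doc.put("age", 36);
        doc.put("nickname", null); // would break a naive value.getClass() lookup
        System.out.println(inferSchema(doc));
    }
}
```

The trade-off in this sketch is that a field that is null in the inspected document becomes a STRING column even if other documents hold a number there, which is one reason reusing a given BigQuery schema would be a useful follow-up.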

ivan-sukhomlyn commented 4 months ago

I got the same error while trying to import from an Atlas cluster running MongoDB 7.2.

com.google.cloud.teleport.v2.common.UncaughtExceptionLogger - The template launch failed.
java.lang.NullPointerException: null
    at com.google.cloud.teleport.v2.mongodb.templates.MongoDbUtils.getTableFieldSchema(MongoDbUtils.java:77)
    at com.google.cloud.teleport.v2.mongodb.templates.MongoDbToBigQuery.run(MongoDbToBigQuery.java:121)
    at com.google.cloud.teleport.v2.mongodb.templates.MongoDbToBigQuery.main(MongoDbToBigQuery.java:96)