dataform-co / dataform

Dataform is a framework for managing SQL based data operations in BigQuery
https://cloud.google.com/dataform/docs
Apache License 2.0

Dataform --dry-run fails with large number of actions #1541

Open elanor-sparx opened 1 year ago

elanor-sparx commented 1 year ago

I have a medium to large setup: 875 actions.

I'd like to add --dry-run to our CI/CD to check executions, but it fails when I run dataform run --dry-run --timeout 2m:

Dataform encountered an error: request to https://oauth2.googleapis.com/token failed, reason: socket hang up
FetchError: request to https://oauth2.googleapis.com/token failed, reason: socket hang up
    at ClientRequest.<anonymous> (/usr/local/lib/node_modules/@dataform/cli/node_modules/node-fetch/lib/index.js:1505:11)
    at ClientRequest.emit (events.js:315:20)
    at ClientRequest.EventEmitter.emit (domain.js:467:12)
    at TLSSocket.socketOnEnd (_http_client.js:493:9)
    at TLSSocket.emit (events.js:327:22)
    at TLSSocket.EventEmitter.emit (domain.js:467:12)
    at endReadableNT (internal/streams/readable.js:1327:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)

When I try a smaller individual action with lots of dependents it works, but a bigger one fails in the same way as above, which makes me think the failure is related to the number of actions and/or a token timeout.

[Image: dry-run-med-action]

[Image: large-action]

Any thoughts on what's going on here? The above were run using my own GCP credentials; we are going to try a service account next.
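Since smaller selections of actions dry-run successfully, one possible stopgap for CI is to split the action list into batches and dry-run each batch separately. The sketch below only plans the invocations. It is not taken from the Dataform codebase: the helper names (chunk, dryRunArgs, planDryRuns), the batch size of 100, and the placeholder action names are all hypothetical. It also assumes that the CLI's --actions option accepts several action names in one invocation, which is worth confirming with dataform run --help for your version. It does not address the root cause of the socket hang up.

```typescript
// Hypothetical helper: split a list into fixed-size batches.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical helper: argument list for one dry-run invocation.
// Assumption: --actions accepts multiple action names.
function dryRunArgs(actions: string[], timeout: string = "2m"): string[] {
  return ["run", "--dry-run", "--timeout", timeout, "--actions", ...actions];
}

// Plan one CLI invocation per batch of actions.
function planDryRuns(actionNames: string[], batchSize: number): string[][] {
  return chunk(actionNames, batchSize).map((batch) => dryRunArgs(batch));
}

// Placeholder names standing in for the 875 real actions.
const exampleActions: string[] = Array.from(
  { length: 875 },
  (_, i) => `action_${i}`
);
const plan: string[][] = planDryRuns(exampleActions, 100);

// Print how many invocations are planned and the start of the first one.
console.log(`${plan.length} invocations planned`);
console.log(["dataform", ...plan[0].slice(0, 6)].join(" "), "...");
```

Each planned argument list could then be passed to the dataform binary by the CI runner, one batch at a time. If batches below some size pass consistently while larger ones fail, that would also help narrow down whether the failure scales with the number of actions.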

elanor-sparx commented 1 year ago

Additional info: my Dataform version is 2.6.7

Ekrekr commented 5 months ago

Thanks for the report!

A token timeout seems like a reasonable explanation. However, it's hard for me to reproduce the failure without a project that replicates it.

It doesn't seem to be caused by our timeout options: