Open ChenyuLInx opened 1 year ago
More color on the test failures caused by Dataproc not scaling well enough. There are two ways to submit a Dataproc job: Cluster and Serverless. With Cluster, we run an always-on cluster; with Serverless, GCP spins up a short-lived server to run a single job.
Currently, when we run our Python model tests in GHA, multiple tests run at the same time. With Dataproc (Cluster or Serverless), tests fail because the underlying infrastructure is overloaded. See this as an example.
We should skip Python model tests in the normal workflows, but create a scheduled run that runs them every day so that we can still catch regressions.
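One way the skip-by-default idea could look on the test side is sketched below. It is only an illustration, not how the repo is wired today: the environment variable name `RUN_PYTHON_MODEL_TESTS`, the helper `python_model_tests_enabled`, and the placeholder test class are all hypothetical. The assumption is that the scheduled workflow exports the variable and the normal workflows do not.

```python
import os
import unittest

# Hypothetical flag name: the scheduled workflow would export it,
# while the normal per-PR workflows would leave it unset.
ENV_FLAG = "RUN_PYTHON_MODEL_TESTS"


def python_model_tests_enabled(environ=None):
    """Return True only when the opt-in flag is set to a truthy value."""
    env = os.environ if environ is None else environ
    return env.get(ENV_FLAG, "").strip().lower() in {"1", "true", "yes"}


# Decorator that skips a test (or a whole test class) unless the flag is set.
# It is evaluated once, at import time, against the real process environment.
requires_python_models = unittest.skipUnless(
    python_model_tests_enabled(),
    f"python model tests only run when {ENV_FLAG} is set",
)


@requires_python_models
class TestPythonModelPlaceholder(unittest.TestCase):
    def test_placeholder(self):
        # Stand-in for a real Dataproc-backed python model test.
        self.assertTrue(True)
```

With something like this in place, the scheduled job would set `RUN_PYTHON_MODEL_TESTS=1` before invoking the test runner, and every other run would report these tests as skipped rather than failing on overloaded infra.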
When turning the tests on, we also need to include the PySpark DataFrame test that was added later (https://github.com/dbt-labs/dbt-core/pull/5906).