delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0
7.22k stars 1.62k forks source link

[Tests] Some tests in `DeletionVectorsSuite` are flakey (timing out) #3257

Closed vkorukanti closed 1 week ago

vkorukanti commented 2 weeks ago

https://github.com/delta-io/delta/pull/3246#issuecomment-2161286020 (seen in one other PR too)

Looking at the tests, we have a test table with 50mil rows that is used multiple times. The high number was used to create a Parquet file with multiple row groups. We can still get the Parquet files with multiple row groups and lower table row count.

spark = SparkSession.builder \
    .appName("ParquetRowGroupSize") \
    .config("spark.sql.parquet.block.size", 512 * 1024 * 1024)  # Set to 512 MB
    .getOrCreate()