Open felipepessoto opened 1 year ago
Since I opened the issue, the test run has gone from ~230 minutes to more than 360 minutes (the runs are timing out).
@edmondop, @tdas, @scottsand-db, @allisonport-db (from #1249)
We could add tags (Group1, Group2, ..., Group10) to the unit tests and change run-tests.py to add a testOnly argument and a runPythonTests argument.
It would work independently of the infrastructure being used: you could start several VMs/agents, each one calling run-tests.py with different tags. What do you think? I can send a PR, but I would like to confirm that somebody will be able to review it.
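A minimal sketch of what the two proposed arguments could look like, in Python since run-tests.py is a Python script. The flag spellings (`--test-only`, `--run-python-tests`) and the helper names are assumptions made for illustration, not the script's actual interface, and the fallback to a plain sbt `test` task is likewise assumed.

```python
import argparse


def parse_args(argv=None):
    """Parse the two proposed options (hypothetical flag names)."""
    parser = argparse.ArgumentParser(
        description="Run one shard of the test suite (illustrative sketch)."
    )
    # Everything after sbt's `testOnly`, e.g. "* -- -n some.package.SomeTag".
    parser.add_argument("--test-only", default=None,
                        help="arguments forwarded to sbt testOnly")
    # Lets a pipeline run the Python tests on only one of its agents.
    parser.add_argument("--run-python-tests", action="store_true",
                        help="also run the Python tests on this agent")
    return parser.parse_args(argv)


def sbt_task(test_only):
    """Build the sbt task string for this agent.

    Assumption: with no filter the script runs the full `test` task.
    """
    if test_only is None:
        return "test"
    return f"testOnly {test_only}"
```

Each VM/agent would then call the script with a different `--test-only` value, and only one of them would pass `--run-python-tests`.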
Hi @felipepessoto
> We could add tags (Group1, Group2....Group 10) to unit tests
How would this work? Is this a manual process? Would we have to enforce this on all existing code and all new PRs?
It is manual. My suggestion is to add dedicated groups for the big test suites, like Merge and CDC, and split the remaining tests into Group1, Group2, and so on. How to run them is up to the pipeline. Usually, we would start a new VM for each group, plus one for tests without groups and one for Java (for some reason the filter for tests without groups doesn't work):
* -- -n org.apache.spark.sql.delta.testtags.DeltaTestsMergeTag
* -- -n org.apache.spark.sql.delta.testtags.DeltaTestsCDCTag
* -- -l org.apache.spark.sql.delta.testtags.DeltaTestsMergeTag -l org.apache.spark.sql.delta.testtags.DeltaTestsCDCTag
io.delta.sql.JavaDeltaSparkSessionExtensionSuite io.delta.tables.JavaDeltaTableBuilderSuite...
New tests would land in the "other" category if they are not tagged.
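The filter strings above follow a regular pattern, so a pipeline could generate them from the list of tags instead of maintaining them by hand. The sketch below assumes the strings are passed to sbt's `testOnly`, where ScalaTest's `-n` includes a tag and `-l` excludes one. The function name and shard labels are hypothetical, and the Java suite list contains only the two suites named above (the original list is elided).

```python
MERGE_TAG = "org.apache.spark.sql.delta.testtags.DeltaTestsMergeTag"
CDC_TAG = "org.apache.spark.sql.delta.testtags.DeltaTestsCDCTag"

# Only the two suites named in this thread; the full list is not known here.
JAVA_SUITES = [
    "io.delta.sql.JavaDeltaSparkSessionExtensionSuite",
    "io.delta.tables.JavaDeltaTableBuilderSuite",
]


def shard_filters(tags, java_suites):
    """Map a shard label to its testOnly argument string (illustrative).

    One shard per tag (-n includes the tag), one shard for everything
    that is not tagged (-l excludes each tag), and one shard that names
    the Java suites explicitly.
    """
    shards = {}
    for tag in tags:
        label = tag.rsplit(".", 1)[-1]  # e.g. "DeltaTestsMergeTag"
        shards[label] = f"* -- -n {tag}"
    excludes = " ".join(f"-l {tag}" for tag in tags)
    shards["untagged"] = f"* -- {excludes}" if excludes else "*"
    shards["java"] = " ".join(java_suites)
    return shards


filters = shard_filters([MERGE_TAG, CDC_TAG], JAVA_SUITES)
for label, test_only in filters.items():
    print(f"{label}: testOnly {test_only}")
```

Because the "untagged" shard is built by excluding every known tag, a new test that nobody tags is picked up there automatically, which matches the behavior described above.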
Feature request
Overview
The unit tests take longer with every new version. For reference, the build in this PR from a year ago took 61-77 minutes: https://github.com/delta-io/delta/pull/887
Motivation
I think we need to improve this before it gets out of control.
Further details
Parallel tests are disabled:
Do we have any alternatives?
Willingness to contribute
The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?