Open JensonChoi opened 2 months ago
@andygrove can I get a review please? Thank you in advance.
@andygrove can I get a review please? Thank you in advance.
Thanks for the PR @JensonChoi. The other change that will need to be made is to update the golden files for the tests that check for expected plans. You can find more information in the contributor guide: https://datafusion.apache.org/comet/contributor-guide/development.html#plan-stability-testing
@andygrove Following our discussions on Discord, I ran the following commands to update the golden files.
make clean; make release PROFILES="-Pspark-3.4"
SPARK_GENERATE_GOLDEN_FILES=1 ./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV1_4_PlanStabilitySuite" -Pspark-3.4 -nsu test
SPARK_GENERATE_GOLDEN_FILES=1 ./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV2_7_PlanStabilitySuite" -Pspark-3.4 -nsu test
make clean; make release PROFILES="-Pspark-3.5"
SPARK_GENERATE_GOLDEN_FILES=1 ./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV1_4_PlanStabilitySuite" -Pspark-3.5 -nsu test
SPARK_GENERATE_GOLDEN_FILES=1 ./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV2_7_PlanStabilitySuite" -Pspark-3.5 -nsu test
make clean; make release PROFILES="-Pspark-4.0"
SPARK_GENERATE_GOLDEN_FILES=1 ./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV1_4_PlanStabilitySuite" -Pspark-4.0 -nsu test
SPARK_GENERATE_GOLDEN_FILES=1 ./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV2_7_PlanStabilitySuite" -Pspark-4.0 -nsu test
The odd thing is that no file change was detected by git even though the commands ran successfully. Is it possible that updating the node name is not something that would be picked up by stability testing? Thank you in advance.
@andygrove Following our discussions on Discord, I ran the following commands to update the golden files.
make clean; make release PROFILES="-Pspark-3.4" SPARK_GENERATE_GOLDEN_FILES=1 ./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV1_4_PlanStabilitySuite" -Pspark-3.4 -nsu test SPARK_GENERATE_GOLDEN_FILES=1 ./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV2_7_PlanStabilitySuite" -Pspark-3.4 -nsu test make clean; make release PROFILES="-Pspark-3.5" SPARK_GENERATE_GOLDEN_FILES=1 ./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV1_4_PlanStabilitySuite" -Pspark-3.5 -nsu test SPARK_GENERATE_GOLDEN_FILES=1 ./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV2_7_PlanStabilitySuite" -Pspark-3.5 -nsu test make clean; make release PROFILES="-Pspark-4.0" SPARK_GENERATE_GOLDEN_FILES=1 ./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV1_4_PlanStabilitySuite" -Pspark-4.0 -nsu test SPARK_GENERATE_GOLDEN_FILES=1 ./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV2_7_PlanStabilitySuite" -Pspark-4.0 -nsu test
The odd thing is that no file change was detected by git even though the commands ran successfully. Is it possible that updating the node name is not something that would be picked up by stability testing? Thank you in advance.
It is possible that the golden files got written to a different location is you have SPARK_HOME
set. Could you try unsettling that env var if you have it set? If you don't have that set, perhaps add logging to see where the files are being written?
SPARK_HOME
is currently set to /home/jenson/datafusion-comet
, which is my local clone of the repo. Does that sound about right? I will try the logging route that you suggested to see where the files are being written to. Thanks.
@andygrove I reran the following commands without the SPARK_GENERATE_GOLDEN_FILES=1
flag:
make clean; make release PROFILES="-Pspark-3.4"
./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV1_4_PlanStabilitySuite" -Pspark-3.4 -nsu test
./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV2_7_PlanStabilitySuite" -Pspark-3.4 -nsu test
make clean; make release PROFILES="-Pspark-3.5"
./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV1_4_PlanStabilitySuite" -Pspark-3.5 -nsu test
./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV2_7_PlanStabilitySuite" -Pspark-3.5 -nsu test
make clean; make release PROFILES="-Pspark-4.0"
./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV1_4_PlanStabilitySuite" -Pspark-4.0 -nsu test
./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV2_7_PlanStabilitySuite" -Pspark-4.0 -nsu test
And they all passed. In addition, I briefly looked at the stability test suites here, and it appears none of them contains the node CometSparkToColumnar
. What should be the next steps here?
And they all passed. In addition, I briefly looked at the stability test suites here, and it appears none of them contains the node
CometSparkToColumnar
. What should be the next steps here?
Thanks for investigating this @JensonChoi. The golden files do not contain any RowToColumnar
transitions, so this explains why they did not need updating.
Which issue does this PR close?
Closes #936
Rationale for this change
What changes are included in this PR?
How are these changes tested?
Added unit test for row input. Still finding a Spark query that would force the use of
CometSparkColumnarToColumnar
node name.