finos / morphir-elm

Tools to work with the Morphir IR in Elm.
https://package.elm-lang.org/packages/finos/morphir-elm/latest
Apache License 2.0

Check whether the Spark transpiler preserves ordering in Records. #879

Closed: jonathanmaw closed this issue 2 years ago

jonathanmaw commented 2 years ago

While working on aggregation filters, I had the following example:

testAggregateFilterOneCount : List Antique -> List { product : Product, vintage : Float, all : Float }
testAggregateFilterOneCount antiques =
    antiques
        |> groupBy .product
        |> aggregate
            (\key inputs ->
                { product = key
                , vintage = inputs (count |> withFilter (\a -> a.ageOfItem >= 20.0))
                , all = inputs count
                }
            )

which was transpiled into:

  def testAggregateFilterOneCount(
    antiques: org.apache.spark.sql.DataFrame
  ): org.apache.spark.sql.DataFrame =
    antiques.groupBy("Product").agg(
      org.apache.spark.sql.functions.count(org.apache.spark.sql.functions.lit(1)).alias("all"),
      org.apache.spark.sql.functions.count(org.apache.spark.sql.functions.when(
        (org.apache.spark.sql.functions.col("ageOfItem")) >= (20),
        org.apache.spark.sql.functions.lit(1)
      )).alias("vintage")
    )

That is, the order of the columns "all" and "vintage" was swapped. This matters because the Spark tests in Scala treat a DataFrame with columns product, all, vintage as different from one with columns product, vintage, all.

The workaround that got the tests passing was to swap the order of the fields in the record, which suggests the fields may be sorted alphabetically somewhere in the pipeline.
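As a minimal sketch of that hypothesis, assume (not confirmed here) that the record's fields pass through a key-sorted structure, as Elm's `Dict` is, before the Spark code is generated. The Scala below mimics this with a `TreeMap`: the fields go in as product, vintage, all and come out alphabetically, and once "product" is taken as the `groupBy` key the remaining aggregation columns are all, vintage, matching the transpiled output above. `FieldOrderDemo` is a hypothetical name.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical model of the suspected cause: record fields stored in a
// key-sorted map lose their declaration order.
object FieldOrderDemo {
  // Field names in the order they appear in the Elm record literal.
  val declared: List[String] = List("product", "vintage", "all")

  // A TreeMap keeps its keys in sorted order, discarding insertion order.
  val viaSortedMap: List[String] =
    TreeMap(declared.map(name => name -> ()): _*).keys.toList

  // "product" is the groupBy key, so only the other fields become
  // arguments to agg(...).
  val aggColumns: List[String] = viaSortedMap.filterNot(_ == "product")

  def main(args: Array[String]): Unit = {
    println(s"declared:  $declared")
    println(s"sorted:    $viaSortedMap")
    println(s"agg order: $aggColumns")
  }
}
```

If the reordering does come from a sorted map, preserving declaration order would mean carrying the field order separately (for example as a list of names) alongside the map.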

jonathanmaw commented 2 years ago

Following discussion, we agreed that preserving column order is not important. If it becomes a problem for tests, the tests should be made insensitive to column order.
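One way a test could become insensitive to column order is to compare by column name instead of by position. The sketch below shows the idea without a Spark dependency: `Table`, `OrderInsensitive` and `OrderInsensitiveDemo` are hypothetical stand-ins, not part of morphir-elm or Spark, and the data is made up to mirror the shape of the example above.

```scala
// Hypothetical stand-in for a DataFrame: named columns plus rows of values.
final case class Table(columns: Vector[String], rows: Vector[Vector[Any]]) {
  // Project onto `order`, moving each row's values to follow the new column order.
  def select(order: Seq[String]): Table = {
    val idx = order.toVector.map(c => columns.indexOf(c))
    require(idx.forall(_ >= 0), s"unknown column in $order")
    Table(order.toVector, rows.map(row => idx.map(i => row(i))))
  }
}

object OrderInsensitive {
  // Compare by column name rather than by position: both tables are
  // projected onto the same (sorted) column order before comparing rows.
  def sameContent(a: Table, b: Table): Boolean =
    a.columns.sorted == b.columns.sorted && {
      val order = a.columns.sorted
      a.select(order).rows == b.select(order).rows
    }
}

object OrderInsensitiveDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical data with the same columns in two different orders.
    val expected = Table(Vector("product", "vintage", "all"), Vector(Vector("p1", 2L, 5L)))
    val actual   = Table(Vector("product", "all", "vintage"), Vector(Vector("p1", 5L, 2L)))
    println(expected == actual)                             // false: positional comparison
    println(OrderInsensitive.sameContent(expected, actual)) // true: compared by name
  }
}
```

With Spark itself, a similar effect could likely be had by projecting the actual DataFrame onto the expected column order before comparing, e.g. `actual.select(expected.columns.map(col): _*)` with `col` from `org.apache.spark.sql.functions`.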