GoEddie / spark-connect-dotnet

MIT License

Column index operator of DataFrame throws if column is not available #12

Closed: mrericrichter closed this issue 4 months ago

mrericrichter commented 4 months ago

The public Column this[string name] indexer of the DataFrame class should throw an exception if the given column name is not part of the DataFrame. The exception message should include the name of the missing column.

This lets errors in the user's Spark code be found right where they occur. Otherwise, an error about a missing column may only surface at a much later stage, far from the line that caused it, and can be very difficult to track down.
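A minimal, self-contained sketch of the requested behaviour might look like the following. The SketchColumn and SketchDataFrame types are hypothetical stand-ins written for illustration; they are not the spark-connect-dotnet classes, and the case-insensitive name comparison and the choice of ArgumentException are assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a column reference; not the library's Column type.
public sealed class SketchColumn
{
    public string Name { get; }
    public SketchColumn(string name) => Name = name;
}

// Hypothetical stand-in for a DataFrame that knows its column names up front.
public sealed class SketchDataFrame
{
    private readonly HashSet<string> _columns;

    public SketchDataFrame(IEnumerable<string> columns) =>
        // Assumption: names are compared case-insensitively.
        _columns = new HashSet<string>(columns, StringComparer.OrdinalIgnoreCase);

    // Indexer that fails immediately, naming the missing column in the message.
    public SketchColumn this[string name]
    {
        get
        {
            if (!_columns.Contains(name))
            {
                throw new ArgumentException(
                    $"Column '{name}' does not exist in this DataFrame.", nameof(name));
            }
            return new SketchColumn(name);
        }
    }
}
```

With this shape, an expression such as df["missing"] fails on the line where it is written, and the exception message carries the offending name.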

GoEddie commented 4 months ago

I checked PySpark and it behaves differently, so I will add this but put it behind an option.

GoEddie commented 4 months ago

Fixed in build 16, but you will need to set this option for it to work:

spark.Conf.Set("spark.connect.dotnet.validatethiscallcolumnname", "true");
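To illustrate what "behind an option" means here, the sketch below gates the check on that setting. Only the option name is taken from this thread; the OptInFrame type, the dictionary used as configuration, and the way the value is parsed are hypothetical and say nothing about how the library implements it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of opt-in validation; not the library's code.
public sealed class OptInFrame
{
    // Option name as given in this thread; how the library reads it is an assumption.
    public const string ValidateOption = "spark.connect.dotnet.validatethiscallcolumnname";

    private readonly IReadOnlyDictionary<string, string> _conf;
    private readonly HashSet<string> _columns;

    public OptInFrame(IReadOnlyDictionary<string, string> conf, IEnumerable<string> columns)
    {
        _conf = conf;
        _columns = new HashSet<string>(columns, StringComparer.OrdinalIgnoreCase);
    }

    // Returns the requested name unchanged; throws only when validation is switched on.
    public string this[string name]
    {
        get
        {
            bool validate = _conf.TryGetValue(ValidateOption, out var value)
                            && bool.TryParse(value, out var enabled)
                            && enabled;
            if (validate && !_columns.Contains(name))
            {
                throw new ArgumentException(
                    $"Column '{name}' does not exist in this DataFrame.", nameof(name));
            }
            return name;
        }
    }
}
```

Keeping the check off by default preserves the existing behaviour for current users, while anyone who sets the option to "true" gets the early failure.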

See: https://github.com/GoEddie/spark-connect-dotnet/blob/main/docs/options.md