Closed rileyschack closed 3 years ago
Obviously a workaround is to create the column first. That doesn't work as nicely, though, if the user wants to use a list comprehension to build an array of n columns (with some sort of transformation applied first).
I didn't anticipate users referencing a key from a Map column through a Column object. Valid use case.
PySpark comes with all sorts of built-in ways to parse user parameters, and we're probably better off falling back to their solutions. The downside, of course, is that many of the functions they use are not user-facing, i.e. prefixed with an underscore (`_function_name`).
The existing solution is written for pre-3.1.0 PySpark and requires falling back to Spark SQL built-ins, as the PySpark DataFrame API did not contain a reference to the `filter` function. PySpark >= 3.1.0 now exposes `filter` for arrays directly in the DataFrame API.
I'm not sure there's any interest in maintaining support for older versions of PySpark at this time given our user base of 6.
`cols_to_array` only accepts columns passed as a `str` and not as a `Column`. Use cases are when a user needs to pass a map with a specified key/value