Closed cristian-marisescu closed 1 month ago
Thanks for the feedback!
SQLFrame does have transform support although it doesn't work on many engines since they don't support that function: https://github.com/eakmanrq/sqlframe/blob/main/sqlframe/base/functions.py#L1590-L1598
What engine are you running against?
Looks like withColumns is currently missing. Will add that tonight!
What's the plan in keeping up with updates/functions from either spark or other related engines?
Similar approach to what other projects like SQLGlot do: add initial support for the most common operations and then add additional functions as requested. The PySpark API is very big, so 100% coverage isn't realistic at first, but I will add features as requested and quickly cover the most common operations. Once it is close to 100%, I can run tests to identify gaps automatically as new versions are released, but it is a bit early in the product's development to achieve that today. The same thinking applies to other engines.
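As a rough illustration of the automated gap check mentioned above, the sketch below compares the public names on two classes. It is only a sketch under stated assumptions: `missing_public_api`, `ReferenceDataFrame`, and `CandidateDataFrame` are hypothetical stand-ins, not part of SQLFrame or PySpark.

```python
def missing_public_api(reference, candidate):
    """Return public attribute names on `reference` that `candidate` lacks."""
    reference_names = {n for n in dir(reference) if not n.startswith("_")}
    candidate_names = {n for n in dir(candidate) if not n.startswith("_")}
    return sorted(reference_names - candidate_names)


# Hypothetical stand-in for the API being tracked (for example, a PySpark class).
class ReferenceDataFrame:
    def select(self): ...
    def transform(self): ...
    def withColumns(self): ...


# Hypothetical stand-in for the implementation being checked.
class CandidateDataFrame:
    def select(self): ...


gaps = missing_public_api(ReferenceDataFrame, CandidateDataFrame)
print(gaps)
```

In a real check, the two toy classes would be replaced by the actual classes being compared. The comparison uses dir() on the classes rather than hasattr() on instances, since an instance that falls back to column lookup for unknown attributes could make every name look present.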
Thank you for the fast and clear response.
I was running it with DuckDB, with something along these lines:
from sqlframe.duckdb import DuckDBDataFrame
from sqlframe.duckdb import DuckDBSession
from sqlframe.duckdb import functions as F

def generic_transformer(df):
    # some actions
    return transformed_df

my_initial_df.transform(generic_transformer)
and getting:
TypeError: 'Column' object is not callable
I get the same TypeError when calling .withColumns
Oh it looks like you are using the DataFrame transform method: https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.DataFrame.transform.html
(not the transform function itself)
I will look at adding that tonight too!
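For context, the DataFrame transform method in PySpark simply calls the given function with the DataFrame and returns the result, so df.transform(f) gives the same result as f(df). The sketch below shows that equivalence with a toy class; `ToyDataFrame` and `add_one` are hypothetical names for illustration, not SQLFrame or PySpark code.

```python
class ToyDataFrame:
    """Toy stand-in for a DataFrame (hypothetical, for illustration only)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def transform(self, func):
        # Call func with this DataFrame and return whatever it returns,
        # which lets custom steps be chained fluently.
        return func(self)


def add_one(df):
    # Hypothetical transformer: takes a DataFrame and returns a new one.
    return ToyDataFrame(row + 1 for row in df.rows)


df = ToyDataFrame([1, 2, 3])

chained = df.transform(add_one)  # method-chaining style
direct = add_one(df)             # plain function call, same result

print(chained.rows, direct.rows)
```

Because the two forms are equivalent, calling the transformer function directly on the DataFrame should work as a stopgap wherever the method is not available.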
In terms of the error, SQLFrame assumes if you do df.<whatever> and <whatever> is not found, then you must be referencing a column. So since df.transform and df.withColumns are not currently supported, it gives you that strange error. Will think about how to improve that since it could be a common issue.
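The behavior described above can be reproduced with a small toy example. This is a minimal sketch assuming the fallback is implemented with Python's `__getattr__` hook; `Column` and `ToyDataFrame` here are hypothetical classes, not SQLFrame's actual implementation.

```python
class Column:
    """Toy stand-in for a column reference (hypothetical, not SQLFrame's class)."""

    def __init__(self, name):
        self.name = name


class ToyDataFrame:
    """Toy DataFrame that treats any unknown attribute as a column lookup."""

    def __init__(self, columns):
        self.columns = list(columns)

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal lookup fails, so any
        # method that is not implemented lands here and becomes a Column.
        if name.startswith("__"):
            raise AttributeError(name)
        return Column(name)


df = ToyDataFrame(["id", "value"])

# A column reference resolves as intended.
assert isinstance(df.value, Column)

# An unimplemented method also resolves to a Column, so calling it fails.
try:
    df.withColumns({"doubled": "value * 2"})
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

One possible improvement, again only as an assumption about how this could be handled, is to check the requested name against the known columns inside `__getattr__` and raise an AttributeError that names the missing method when there is no match.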
You're right, I just checked now and saw I pasted the wrong thing.
Thank you for all the help and indeed, +1 to the Error Handling.
Your feedback has been addressed with 1.6.0: https://github.com/eakmanrq/sqlframe/releases/tag/v1.6.0
Please open an issue for any other issues you may have!
Hi, first of all, nice project. I'm really rooting for it as I'm facing the same issues you mentioned.
I started testing it on my codebase, but I quickly ran into missing functions.
I use a lot of .transform: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.transform.html
and .withColumns: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.withColumns.html
And this brings me to my next question: What's the plan in keeping up with updates/functions from either spark or other related engines?
Thanks in advance and again, great work!