Open jstammers opened 3 months ago
Pip installing pyspark is waaaaay simpler so I think we should probably start with that and see if we run into problems, but just keep the docker idea in the back of our minds
When you pip install a pinned version of pyspark, do you know if you are only getting a pinned python layer version, or you also are getting a pinned version of the underlying java engine?
pyspark versions mirror the underlying spark version, so pyspark 3.5.2 is a python layer on top of spark 3.5.2 etc.
I'm not sure how backwards-compatible we would want to be with this, as in my experience there can be a few changes to the API between minor versions. Wherever possible, we would expect those to be handled within the ibis backend. I know that supports back to 3.3, depending on the function (e.g. pyarrow UDFs were implemented in pyspark 3.5), but I would expect most if not all of the built-in ibis functions to be supported in pyspark 3.3.
@NickCrews thanks for the review on this. I've implemented most of the changes you've suggested. I'd like to make use of the notyet, never, etc. marks from ibis
, but struggling to think of a way to do this that doesn't directly copy the conftest.py
module from there.
Will let you know when this is ready for a second review
sorry @jstammers, I just switched the project manager to be uv instead of PDM, you're gonna get a few merge conflicts when you rebase :(
Oof, just took a look at the conftest.py's of the various backends from ibis. Open to what you come up with, but at first glance I don't think we need to be as thorough as they are. I think we want something equivalent (but probably lighter weight) to their BackendTest class, which would
Then, we create the BackendTest instance per pytest session using pytest fixtures. Let's look at how ibis does this, I bet they have this fairly worked out.
This adds an option
--backend
for pytest that can be used to configure theBackend
for ibis. Currently, this only supports the use ofpyspark
And many of the tests appear to be failing.To make the CI tests clearer, we may want to define a separate workflow to run the tests with a pyspark backend, similar to how this is done for ibis