databricks / databricks-vscode

VS Code extension for Databricks

Running pytest with local spark session #1152

Open rotemb-cye opened 8 months ago

rotemb-cye commented 8 months ago

Hey,

I am trying to run pytest on my local PC with the Databricks extension installed. I am trying to create a local Spark session:


import pytest
from pyspark.sql import SparkSession


def get_spark_session():
    # Build a purely local session bound to localhost.
    spark = (
        SparkSession.builder.master("local[*]")
        .appName("local-tests")
        .config("spark.driver.bindAddress", "127.0.0.1")
        .getOrCreate()
    )
    return spark


@pytest.mark.etl
@pytest.fixture(scope="session")
def spark_session():
    spark = get_spark_session()
    yield spark
    spark.stop()

and I get the following error:

RuntimeError: Only remote Spark sessions using Databricks Connect are supported. Could not find connection parameters to start a Spark remote session.

How can I solve this? I want to be able to run my pytest suite while offline.

Thanks!

htuomola commented 7 months ago

Hi, this is exactly the same issue we have been struggling with. It seems that installing databricks-connect modifies the installed pyspark package and adds the code that throws this error. I'm also interested in finding a workaround, because in its current state this basically blocks using Databricks Connect.
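
For anyone checking whether their environment is affected, here is a quick diagnostic sketch. It assumes databricks-connect >= 13 ships a databricks.connect module alongside its pyspark replacement, which is worth verifying for your version:

import pyspark

# Shows which installation the pyspark import resolves to; under a
# databricks-connect install this points at its replacement package.
print(pyspark.__file__)

try:
    import databricks.connect  # shipped by databricks-connect >= 13 (assumption)
    print("databricks-connect replacement is present")
except ImportError:
    print("stock pyspark")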

benoitLebreton-perso commented 6 months ago

Hello, I managed to get my local Spark session working via the following VS Code command palette command:

[screenshot of the VS Code command palette; the image did not survive extraction]

For context, even uninstalling the extension did not fix it.

odimko commented 3 months ago

Thanks for your solution @benoitLebreton-perso! Do you know how to fix the issue when running pytest from the command line?

bestekov commented 1 month ago

@benoitLebreton-perso, did you manage to have two versions of pyspark installed? Or did you go the route of uninstalling databricks-connect?

The big issue seems to be that installing databricks-connect uninstalls the full pyspark package, which is extremely annoying. It would be much better to sideload and patch commands only when they are invoked in a databricks-connect context.
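
Until something like that exists, one application-level approximation is a session factory that prefers the remote session when databricks-connect is importable and falls back to stock pyspark otherwise. This is a sketch, not an official pattern, and it only helps if you keep two separate virtual environments, since databricks-connect removes stock pyspark:

def get_spark():
    try:
        # databricks-connect >= 13 exposes DatabricksSession; this branch
        # needs configured connection parameters (e.g. via ~/.databrickscfg).
        from databricks.connect import DatabricksSession
        return DatabricksSession.builder.getOrCreate()
    except ImportError:
        # Stock pyspark: build a plain local session for offline tests.
        from pyspark.sql import SparkSession
        return (
            SparkSession.builder.master("local[*]")
            .config("spark.driver.bindAddress", "127.0.0.1")
            .getOrCreate()
        )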

benoitLebreton-perso commented 1 month ago

> @benoitLebreton-perso, did you manage to have two versions of pyspark installed? Or did you go the route of uninstalling databricks-connect?
>
> The big issue seems to be that installing databricks-connect uninstalls the full pyspark package, which is extremely annoying. It would be much better to sideload and patch commands only when they are invoked in a databricks-connect context.

I uninstalled databricks-connect. I work in a local environment with a local pyspark session. I now use the Databricks Spark session only in notebooks, and I sync my local code with my repos.
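
For anyone following the same route, here is a minimal conftest.py sketch of that setup. It assumes stock pyspark is installed; the databricks.connect guard and the skip message are illustrative additions, not part of the original report:

import pytest
from pyspark.sql import SparkSession


def _databricks_connect_installed():
    try:
        import databricks.connect  # noqa: F401  (ships with databricks-connect)
        return True
    except ImportError:
        return False


@pytest.fixture(scope="session")
def spark_session():
    # Local sessions fail under the databricks-connect pyspark replacement,
    # so skip rather than hit the RuntimeError from the original report.
    if _databricks_connect_installed():
        pytest.skip("databricks-connect replaced pyspark; local sessions unavailable")
    spark = (
        SparkSession.builder.master("local[*]")
        .appName("local-tests")
        .config("spark.driver.bindAddress", "127.0.0.1")
        .getOrCreate()
    )
    yield spark
    spark.stop()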