dotnet / spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
https://dot.net/spark
MIT License

[BUG]: udf problem after switching to v0.7.0 #369

Closed bolcman closed 4 years ago

bolcman commented 4 years ago

This started happening after I switched to 0.7.0. Here is my code:

        var spark = SparkSession.Builder()
                                .AppName("Hello Spark!")
                                .Config("spark.memsql.host", "somehost")
                                .Config("spark.memsql.user", "user")
                                .Config("spark.memsql.password", "password")
                                .GetOrCreate();

        var df = spark.Read().Format("com.memsql.spark.connector")
                      .Option("query", "select code, date, holidayType, 'test' as name from spark_calendar limit 1000").Load();

        df.Show(10);

        var udf = Udf<string, string>((src) => src + "_test");
        df = df.WithColumn("output", udf(df["code"]));

        df.PrintSchema();
        df.Show();

When I switched back to 0.6.0 it worked perfectly; when I tried 0.7.0 again, I got the error below.

[Screenshot: v0.6.0]

[Screenshot: v0.7.0]

imback82 commented 4 years ago

@bolcman I ran the following program:

static void Main(string[] args)
{
    var spark = SparkSession.Builder().GetOrCreate();
    var df = spark.Range(0, 5);
    var udf = Functions.Udf<int, string>(id => id.ToString());
    df.Select(udf(df["id"])).Show();
    df.Show();
}

I tried this with the following combinations:

nuget: 0.6.0, worker: 0.6.0
nuget: 0.6.0, worker: 0.7.0
nuget: 0.7.0, worker: 0.6.0
nuget: 0.7.0, worker: 0.7.0

but I couldn't repro the behavior you are facing.

Could you share isolated repro code that doesn't depend on your MemSQL source (JSON, etc. will be fine)? That will help me debug this issue.

bolcman commented 4 years ago

OK, this is strange. I tried your example and got the same error:

nuget: 0.6.0, worker: 0.6.0 - works
nuget: 0.7.0, worker: 0.6.0 - works
nuget: 0.7.0, worker: 0.7.0 - error
nuget: 0.6.0, worker: 0.7.0 - error

imback82 commented 4 years ago

@suhsteve @elvaliuliuliu Can you try as well? I used Spark 2.4.4 with the .NET Core 2.1 worker.

suhsteve commented 4 years ago

@bolcman Can you share the version of Spark you are using?

elvaliuliuliu commented 4 years ago

I tried the following combinations, and both work fine for me. I used Spark 2.4.1 with the .NET Core 2.1 worker.

nuget: 0.7.0, worker: 0.7.0
nuget: 0.6.0, worker: 0.7.0

@bolcman Can you please share the version details and the command you use? Thanks!

bolcman commented 4 years ago

It works! Please ignore this. I downloaded the worker 0.7.0 files again and replaced the version I had.
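
Since the fix here was replacing the worker files, a quick check of which worker binaries are installed can help rule out a stale or damaged copy before debugging the UDF itself. The following is only a minimal sketch, not part of the original thread. It assumes the worker is located through the `DOTNET_WORKER_DIR` environment variable and that the directory contains a file named `Microsoft.Spark.Worker.dll`; the `WorkerCheck` class and its `Describe` method are hypothetical names used for illustration.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class WorkerCheck
{
    // Returns a one-line description of the worker found in workerDir.
    // Hypothetical helper: it only inspects files on disk and does not talk to Spark.
    public static string Describe(string workerDir)
    {
        if (string.IsNullOrEmpty(workerDir))
            return "DOTNET_WORKER_DIR is not set";

        // Assumed file name; adjust if your worker package is laid out differently.
        string workerDll = Path.Combine(workerDir, "Microsoft.Spark.Worker.dll");
        if (!File.Exists(workerDll))
            return $"No Microsoft.Spark.Worker.dll found in {workerDir}";

        // Read the version resource embedded in the worker assembly.
        FileVersionInfo info = FileVersionInfo.GetVersionInfo(workerDll);
        return $"Worker file version: {info.FileVersion ?? "unknown"}";
    }

    static void Main()
    {
        // Assumption: the worker location is configured through this variable.
        string workerDir = Environment.GetEnvironmentVariable("DOTNET_WORKER_DIR");
        Console.WriteLine(Describe(workerDir));
    }
}
```

If the reported version does not match the worker release you meant to install, or the file is missing, re-downloading the worker as described above is a reasonable first step.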