RedisLabs / spark-redis

A connector for Spark that allows reading and writing to/from a Redis cluster
BSD 3-Clause "New" or "Revised" License

Seeing an issue with Spark-Redis not calling the shutdown hook after the job completes. #201

Open agrimn opened 4 years ago

agrimn commented 4 years ago

I am using the example provided in the Java docs and running it on a local Spark cluster.

import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public void run() throws Exception {
    // Local Spark session configured to point at the Redis instance.
    SparkSession sparkSession = SparkSession.builder()
        .master("local")
        .config("spark.redis.host", redisHost)
        .config("spark.redis.port", redisPort)
        .config("spark.redis.db", dbName)
        .getOrCreate();

    Dataset<Row> df = sparkSession.createDataFrame(Arrays.asList(
        new Person("John", 35),
        new Person("Peter", 40)), Person.class);

    // Write the DataFrame to Redis hashes, keyed by the "name" column.
    df.write()
        .format("org.apache.spark.sql.redis")
        .option("table", "person")
        .option("key.column", "name")
        .mode(SaveMode.Overwrite)
        .save();

    sparkSession.stop();
}

The write to Redis succeeds, but the Spark job never actually completes and the shutdown hooks are not called. The logs stop at:

19/09/19 17:21:02 INFO o.a.s.SparkContext: Successfully stopped SparkContext

Ideally, I expect this job to complete so that I can proceed with my DAG, ending with:

19/09/19 17:24:57 INFO o.a.s.u.ShutdownHookManager: Shutdown hook called
19/09/19 17:24:57 INFO o.a.s.u.ShutdownHookManager: Deleting directory /private/var/folders/kk/gtt0h1mx46s78vy_wsfdv6r00000gp/T/spark-e4fe17c4-8342-4788-a78a-4fdf73ed2da3

Using spark-redis version 2.4.
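
For reference, one way to see what keeps the JVM alive is to list the non-daemon threads that are still running after sparkSession.stop(); the JVM only exits, and shutdown hooks only fire, once every non-daemon thread has terminated. A minimal diagnostic sketch, assuming it is placed right after the stop() call above:

// Diagnostic sketch: print any non-daemon threads that survive
// sparkSession.stop(). Whatever shows up here is what blocks the
// JVM from exiting and therefore keeps the shutdown hooks from running.
for (Thread t : Thread.getAllStackTraces().keySet()) {
    if (t.isAlive() && !t.isDaemon() && t != Thread.currentThread()) {
        System.out.println("Non-daemon thread still alive: " + t.getName());
    }
}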

fe2s commented 4 years ago

Hi @agrimn ,

Do you submit the jar or run it from your IDE? I tried running your code from the IDE and got the following output, which looks fine:

19/09/20 23:09:55 INFO DAGScheduler: Job 1 finished: save at JavaDataFrameTest.java:72, took 0.105209 s
19/09/20 23:09:55 INFO SparkUI: Stopped Spark web UI at http://192.168.0.100:4040
19/09/20 23:09:55 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/09/20 23:09:55 INFO MemoryStore: MemoryStore cleared
19/09/20 23:09:55 INFO BlockManager: BlockManager stopped
19/09/20 23:09:55 INFO BlockManagerMaster: BlockManagerMaster stopped
19/09/20 23:09:55 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/09/20 23:09:55 INFO SparkContext: Successfully stopped SparkContext
19/09/20 23:09:55 INFO ShutdownHookManager: Shutdown hook called
19/09/20 23:09:55 INFO ShutdownHookManager: Deleting directory /private/var/folders/5p/j__gnr6s28n_1pd8_v_xzsww0000gn/T/spark-d3ffc8bf-afbd-479c-a80a-70113f5b48d4

jovv commented 4 years ago

I'm running into the same problem; it happens with spark-submit, not when running from the IDE. Spark version is 2.4.3 (with Scala), spark-redis version 2.4.0.
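
As a stopgap, the JVM can be terminated explicitly once the job is done; System.exit() still runs the registered shutdown hooks, so Spark's ShutdownHookManager cleanup is executed. A minimal sketch, a generic workaround rather than a spark-redis fix:

// Workaround sketch: force JVM termination after stopping Spark.
// System.exit() runs registered shutdown hooks before halting, so the
// "Shutdown hook called" and temp-directory cleanup log lines still
// appear even if a stray non-daemon thread would otherwise keep the
// JVM alive.
sparkSession.stop();
System.exit(0);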

joacosnchz commented 2 years ago

I'm having the same issue with the spark-mongodb connector. Did you find any workaround?

jgournet commented 3 months ago

I'm seeing this as a rare, intermittent occurrence too. I'd be really interested to know if there is a fix or a workaround.