In the indexing job, documents are being pushed to the index a second time. The Spark UI first shows the indexing stage as finished. Then the stage's status and details disappear and it is shown as 'Pending', after which Spark launches the indexing stage again, resulting in duplicate data in the index.
In the image, Stage 4 had actually completed and its data was indexed. Stage 5 then started pushing the same data to the index. It looks like the driver does not receive the correct stage status.
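To illustrate why a second launch of the same stage ends up as duplicate index data, here is a minimal Python sketch. It does not use Spark or any real index client: the "index" is a plain dict, and `push_partition` and `run_stage` are hypothetical helpers. It assumes the indexing sink lets the index assign document IDs. This report does not say whether the real job does that. For contrast, it also shows that if each document carried a deterministic ID, a re-run would overwrite the existing entries instead of adding new ones.

```python
import uuid

# Hypothetical in-memory stand-in for a search index: doc_id -> document.
def push_partition(index, docs, id_fn):
    """Write every document in one partition as an upsert keyed by id_fn(doc)."""
    for doc in docs:
        index[id_fn(doc)] = doc

def run_stage(index, partition, id_fn, attempts):
    """Simulate the driver launching the same indexing stage `attempts` times."""
    for _ in range(attempts):
        push_partition(index, partition, id_fn)

partition = [{"key": "a", "body": "first"}, {"key": "b", "body": "second"}]

# Auto-generated IDs (assumption): every re-run creates new entries, so data is duplicated.
auto_index = {}
run_stage(auto_index, partition, lambda d: str(uuid.uuid4()), attempts=2)

# Deterministic IDs derived from the document: a re-run overwrites the same entries.
det_index = {}
run_stage(det_index, partition, lambda d: d["key"], attempts=2)

print(len(auto_index), len(det_index))  # 4 2
```

With two launches of the stage, the auto-ID index holds four entries for two source documents, while the deterministic-ID index still holds two.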