gbif / pipelines

Pipelines for data processing (GBIF and LivingAtlases)
Apache License 2.0

K8s: Spark job submit finished stage twice #1042

Closed: muttcg closed this issue 3 months ago

muttcg commented 3 months ago

In the indexing job, documents are pushed to the index twice. The Spark UI first shows the indexing stage as finished. Then the stage status and information disappear and its tasks are displayed as 'Pending'. Spark launches the indexing stage again, resulting in duplicate index data.

In the screenshot, Stage 4 had actually completed and the data was indexed. Stage 5 then started pushing the same data to the index. It looks like the driver doesn't get the correct stage status.
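
To make the symptom concrete, here is a minimal, self-contained sketch of why a repeated indexing stage can double the data. It does not use Spark or the pipelines code. The class `DuplicateIndexSketch`, both `push...` methods, and the sample records are hypothetical, and it is only an assumption that the index creates a new document on every push; the issue does not say how document IDs are assigned.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateIndexSketch {

    // Hypothetical append-only index: every push creates a new document,
    // so pushing the same records twice stores them twice.
    static int pushWithGeneratedIds(List<String> index, List<String> records) {
        index.addAll(records);
        return index.size();
    }

    // Hypothetical keyed index: pushing a record with an ID that already
    // exists overwrites the stored document instead of adding a new one.
    static int pushWithDeterministicIds(Map<String, String> index, List<String> records) {
        for (String record : records) {
            index.put(record, record); // the record itself stands in for a stable ID
        }
        return index.size();
    }

    public static void main(String[] args) {
        List<String> records = List.of("occ-1", "occ-2", "occ-3");

        // Simulate the indexing stage running, then being launched again.
        List<String> appendOnly = new ArrayList<>();
        pushWithGeneratedIds(appendOnly, records);
        int appendOnlyAfterRerun = pushWithGeneratedIds(appendOnly, records);

        Map<String, String> keyed = new LinkedHashMap<>();
        pushWithDeterministicIds(keyed, records);
        int keyedAfterRerun = pushWithDeterministicIds(keyed, records);

        System.out.println("append-only documents after rerun: " + appendOnlyAfterRerun);
        System.out.println("keyed documents after rerun: " + keyedAfterRerun);
    }
}
```

With three input records, the append-only index holds six documents after the second run, while the keyed index still holds three. This only illustrates the duplication; it does not explain why the driver relaunches the stage.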

Screenshot from 2024-03-07 15-43-55