gbif / pipelines

Pipelines for data processing (GBIF and LivingAtlases)
Apache License 2.0

K8s: Incorrect pipelines metrics #1027

Closed: muttcg closed this issue 3 months ago

muttcg commented 5 months ago

https://registry.gbif-dev2.org/dataset/06a00852-f764-4fb8-80d4-ca51f0918459/ingestion-history

DWCA_TO_VERBATIM: 45,119
INTERPRETED_TO_INDEX: 67,641

Actual index size: 45,119

muttcg commented 5 months ago

Beam only partially supports metrics on Spark, so the attempted indexing count it reports is wrong (67,641 reported for INTERPRETED_TO_INDEX against an actual index size of 45,119). I have replaced the Beam count with the count returned by Elasticsearch.
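To illustrate the idea, here is a minimal sketch of taking the indexed-record metric from an Elasticsearch `_count` response instead of a runner-side counter. It is not the change made in gbif/pipelines: the function names `parse_es_count` and `indexed_metric` are hypothetical, and the sample response body is hand-written in the shape of a `_count` response using the numbers reported in this issue.

```python
import json


def parse_es_count(response_body: str) -> int:
    """Extract the document count from an Elasticsearch `_count` response body."""
    return int(json.loads(response_body)["count"])


def indexed_metric(beam_count: int, es_response_body: str) -> dict:
    """Report the Elasticsearch count as the metric and keep the Beam gap for diagnostics.

    Hypothetical helper: the real pipeline code may be structured differently.
    """
    es_count = parse_es_count(es_response_body)
    return {
        "INTERPRETED_TO_INDEX": es_count,            # trusted value, from the index itself
        "beam_discrepancy": beam_count - es_count,   # how far the Beam counter was off
    }


# Hand-written sample in the shape of a `_count` response (assumption, not a real log).
sample_response = (
    '{"count": 45119, '
    '"_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0}}'
)

# 67641 is the Beam-reported value from the ingestion history above.
result = indexed_metric(67641, sample_response)
print(result)
```

With the values from this issue, the reported metric becomes 45,119 and the Beam counter is shown to be 22,522 too high. Reading the count from the index itself avoids depending on how the runner aggregates counters.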

muttcg commented 4 months ago

This may be related to #1042.