NASA-PDS / registry-sweepers

Scripts that run regularly on the registry database, to clean and consolidate information
Apache License 2.0

Tweak _bulk flush threshold #46

Closed: alexdunnjpl closed this issue 11 months ago

alexdunnjpl commented 11 months ago

💡 Description

Currently, _bulk updates are flushed every 5000 products.

Per Elastic, the metric that actually matters is the size of the bulk request, not the number of documents in it:

"Start with a bulk size around 5–15 MB and slowly increase it until you do not see performance gains anymore. Then start increasing the concurrency of your bulk ingestion (multiple threads, and so forth)."

Fine-tuning will be tedious and take time, but some coarse tuning should yield a meaningful improvement with minimal effort.
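
As a rough illustration of flushing by payload size instead of product count, here is a minimal sketch. It is not the sweepers' actual code: the class name `SizeBoundedBulkBuffer`, the `send` callback, and the 10 MB default (picked from the middle of the 5–15 MB range quoted above) are all assumptions made for the example. The only thing taken from Elasticsearch itself is the newline-delimited action/document layout of a _bulk update.

```python
import json
from typing import Callable, Dict, List


class SizeBoundedBulkBuffer:
    """Hypothetical buffer that flushes _bulk updates once a byte threshold is reached."""

    def __init__(self, send: Callable[[str], None], max_bytes: int = 10 * 1024 * 1024):
        # `send` is an assumed callback that would POST the payload to /_bulk.
        self._send = send
        self._max_bytes = max_bytes
        self._lines: List[str] = []
        self._size = 0

    def add_update(self, doc_id: str, partial_doc: Dict) -> None:
        # Each update is two NDJSON lines: the action line and the partial document.
        action = json.dumps({"update": {"_id": doc_id}})
        body = json.dumps({"doc": partial_doc})
        # +2 accounts for the newline that terminates each of the two lines.
        entry_size = len(action.encode("utf-8")) + len(body.encode("utf-8")) + 2

        # Flush first if this entry would push the pending payload over the threshold.
        # A single entry larger than max_bytes is still sent, on its own.
        if self._lines and self._size + entry_size > self._max_bytes:
            self.flush()

        self._lines.extend([action, body])
        self._size += entry_size

    def flush(self) -> None:
        if not self._lines:
            return
        # _bulk bodies must end with a trailing newline.
        payload = "\n".join(self._lines) + "\n"
        self._send(payload)
        self._lines = []
        self._size = 0
```

Callers would add updates as they are generated and call `flush()` once at the end to send whatever remains. Coarse tuning then amounts to changing `max_bytes` and measuring throughput.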