colinsurprenant opened this issue 4 years ago
We could introduce 2 new metrics: a per-page counter of how many events the page holds, and the configured `batch.size`. For example, if `batch.size` is 1000, this counter for a page might be 5 × `batch.size`; if the ratio between the first and the second goes down to less than 2 or 3, we should be able to discover such a problem.

The optimal solution is to pull as much data as the filter section requires, loading/mapping all the pages we need to accomplish this; however, this could put pressure on the paging part and on memory, loading a lot of data from disk.
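A minimal sketch of that heuristic, assuming we can obtain the per-page element count and the configured batch size (the class, method, and parameter names here are made up for illustration; neither value exists as a Logstash metric today):

```java
// Hypothetical ratio check: how many batches fit in one PQ page.
// This only illustrates the heuristic described above.
public class PageBatchRatioCheck {

    // eventsInPage: events stored in a fully written page (e.g. the
    //               elementCount reported by pqcheck)
    // batchSize:    the configured pipeline.batch.size
    // minRatio:     how many batches a page should at least hold
    static boolean pageTooSmallForBatch(long eventsInPage, int batchSize, double minRatio) {
        double ratio = (double) eventsInPage / batchSize;
        return ratio < minRatio;
    }

    public static void main(String[] args) {
        // Page holds 5000 events, batch.size is 1000 -> ratio 5, fine.
        System.out.println(pageTooSmallForBatch(5000, 1000, 3.0)); // false
        // Page holds 1500 events, batch.size is 1000 -> ratio 1.5, flag it.
        System.out.println(pageTooSmallForBatch(1500, 1000, 3.0)); // true
    }
}
```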
Good suggestions @andsel. Supporting multi-page reads will not be simple to do, and I am not sure about the real benefit of supporting this versus tuning the batch size and the page size according to the data shape (given better insights). In fact, I have not seen very large batch sizes offer a performance improvement.
If this will take a while to fix properly, we could add info in the Troubleshooting section or Best Practices section of the docs. Let me know if that makes sense.
For almost-empty, we should differentiate between an almost-empty batch resulting from a time-out waiting for enough events to fill a batch (low volume) and an almost-empty batch resulting from the tail end of a page.
Good point @yaauie. In the case we want to warn about, the final pointer of the data is always at the 64mb edge limit, while for a slow writer the final pointer of the retrieved data is seldom near the end of the page. So our "almost-empty" counter definition could be: a pull from the queue that retrieves fewer events than `batch.size` and consumes the page.
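A rough sketch of what that counter could look like, assuming the read result exposes how many events were returned and whether the read reached the end of the page (all names here are hypothetical, not Logstash classes):

```java
// Sketch only: count "almost-empty" batches caused by a page boundary.
// ReadResult is a hypothetical value object, not a Logstash class.
class ReadResult {
    final int eventsRead;
    final boolean pageFullyConsumed; // the read reached the end of the page

    ReadResult(int eventsRead, boolean pageFullyConsumed) {
        this.eventsRead = eventsRead;
        this.pageFullyConsumed = pageFullyConsumed;
    }
}

class AlmostEmptyCounter {
    private long almostEmptyAtPageBoundary = 0;

    void record(ReadResult result, int batchSize) {
        // Count only reads that returned fewer events than batch.size AND
        // exhausted the page. A timed-out read on a slow writer will rarely
        // end exactly at the page boundary, so low-volume batches are mostly
        // excluded by the pageFullyConsumed condition.
        if (result.eventsRead < batchSize && result.pageFullyConsumed) {
            almostEmptyAtPageBoundary++;
        }
    }

    long value() {
        return almostEmptyAtPageBoundary;
    }
}
```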
We can count, per pipeline:

- `batches.total`
- `batches.timed_out`
- `batches.full`

From `batches.total - batches.timed_out - batches.full` we can know how many batches were cut off due to a page boundary.
This is actually a solid starting point if we ever decide to do adaptive batch sizes, either by event count or event size. With the memory constraints in mind, batch sizes can usually be increased up until `batches.timed_out` starts increasing.
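For illustration, the page-boundary count could be derived from those three counters like this (the names follow the comment above; they are not existing Logstash metrics):

```java
// Sketch: derive batches cut short by a page boundary from the three
// per-pipeline counters proposed above.
class BatchCounters {
    long total;    // every batch handed to the workers
    long timedOut; // batches returned because the read timeout expired
    long full;     // batches that reached pipeline.batch.size

    long cutOffByPageBoundary() {
        return total - timedOut - full;
    }
}
```

If that difference grows roughly as fast as `batches.total`, most batches are being truncated by the page boundary rather than being filled or timing out.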
@yaauie yes agreed, good point, I did not have the almost-empty use case in mind.
@jsvd I like that! Seems like these metrics would provide all the visibility needed to understand the dynamics of batch size vs page size, etc. Adaptive batch sizing might be too much of a step IMO because there are other constraints on an optimal batch size than just its PQ-related behaviour, but we could definitely either log some hints or derive some performance hints in the metrics UI, for example.
The Problem
By default the PQ uses `queue.page_capacity: 64mb` which, in our experience, should not be changed and has proved to be a good balance in terms of performance and vm/mmap IO pressure. That said, a 64mb page limits the number of events a single PQ data page can hold, and workers reading from the PQ will try to maximise batch sizes up to the configured `pipeline.batch.size`, BUT a batch will never contain events across pages. For example, if a page holds 1000 events and the `batch.size` is set to 5000, only 1000 events will be returned from the PQ read operation and the batch will have a size of 1000.

This should not generally be a concern when using the default small `pipeline.batch.size: 125`; in most contexts there will be orders of magnitude more events in a single page. But for configurations using a very large `batch.size`, which we typically see when users want to use a large(r) bulk size when indexing with the elasticsearch output plugin (which uses the configured `batch.size` as the indexing bulk size), we can end up in a situation where the `batch.size` is similar to or even bigger than the number of events a single PQ page can hold. This can lead to 2 potential problems (a sketch of the read behaviour follows the list):

- If a page holds more events than `batch.size` but only a small multiple of it, a first PQ read operation will return a full batch but the next read operation will return a small batch with only the remaining unread events in that page.
- If a page holds fewer events than `batch.size`, each read operation will return fewer events, and increasing the `batch.size` will not have any effect and will not translate into a larger bulk size, for example when using the elasticsearch output.
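A rough sketch of the read behaviour described above; this is not the actual PQ implementation, only an illustration of the page-boundary cap:

```java
// Illustration only (not the actual PQ implementation): a read never crosses
// a page boundary, so the returned batch is capped by whatever is left unread
// in the current page.
class PageRead {
    static int eventsReturned(int batchSize, int unreadEventsInCurrentPage) {
        return Math.min(batchSize, unreadEventsInCurrentPage);
    }

    public static void main(String[] args) {
        // Page has 1000 unread events, batch.size is 5000:
        System.out.println(eventsReturned(5000, 1000)); // 1000
        // Page has 1200 unread events, batch.size is 1000:
        System.out.println(eventsReturned(1000, 1200)); // 1000, next read returns only 200
    }
}
```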
How to Diagnose
Currently one way to see how many events are in each page is by running `bin/pqcheck` when logstash is not running and looking at the `elementCount=XXX` of each page; unless the data shape changes a lot, this number should be similar across pages. Ideally a single page should hold a good multiple of `batch.size` events, probably a minimum of around 5X.

Example Sizes
Some examples:
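As a rough illustration, assuming some hypothetical average serialized event sizes (these are not measurements from real data):

```java
// Back-of-the-envelope sizing: how many events fit in a 64mb page for a given
// average serialized event size, and how that compares to a large batch.size.
// The event sizes below are assumptions for illustration only.
public class PageSizing {
    public static void main(String[] args) {
        long pageCapacityBytes = 64L * 1024 * 1024; // queue.page_capacity: 64mb
        int batchSize = 5000;                       // a large pipeline.batch.size

        int[] avgEventBytes = {500, 5_000, 50_000};
        for (int size : avgEventBytes) {
            long eventsPerPage = pageCapacityBytes / size;
            System.out.printf("avg event %6d bytes -> ~%7d events/page -> ~%.1f batches of %d per page%n",
                    size, eventsPerPage, (double) eventsPerPage / batchSize, batchSize);
        }
        // ~500 byte events -> ~134k events per page, dozens of full batches.
        // ~50kb events     -> ~1342 events per page, well below batch.size,
        //                     so every read returns an undersized batch.
    }
}
```

With the larger event sizes the page holds less than one full batch, which is exactly the second problem described above.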
Suggestion
I think we should find a way to report the PQ read operations in relation to the `batch.size` when pages hold a very small multiple of `batch.size` events, or fewer than `batch.size`. I am not sure that a systematic WARN log is a good idea since it risks flooding the logs. Maybe read sizes could be reported in monitoring?