Open robbavey opened 1 year ago
As I'm still learning these concepts, can I extend your example to check my understanding of this proposed improvement, so you can review it before I move forward with an implementation?
In your example we configure two pipelines with persistent queues. The relevant config settings are:
PQ1
queue.max_bytes: 300gb
queue.path: /Users/robbavey/logstash-8.5.0/data/queue/test1
PQ2
queue.max_bytes: 300gb
queue.path: /Users/robbavey/logstash-8.5.0/data/queue/test2
The LogStash::PersistedQueueConfigValidator#check method (https://github.com/elastic/logstash/blob/046ea1f5a8a5e26f82d1495bb2ed9a7a3fe96332/logstash-core/lib/logstash/persisted_queue_config_validator.rb#L36-L66) is used for computing the resource requirements relevant to this warning.
Currently, for each of these configs we read in the max_bytes from the config and use the path to determine which filesystem that queue will occupy. In our case both /Users/robbavey/logstash-8.5.0/data/queue/test1 and /Users/robbavey/logstash-8.5.0/data/queue/test2 will occupy /dev/disk1s2. Given we have set the max_bytes greater than the capacity of /dev/disk1s2, we add to the total for that filesystem (https://github.com/elastic/logstash/blob/046ea1f5a8a5e26f82d1495bb2ed9a7a3fe96332/logstash-core/lib/logstash/persisted_queue_config_validator.rb#L63-L65).
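To check my reading of that accumulation step, here is a minimal sketch rather than the validator's actual code. It assumes the per-filesystem total is the sum of each queue's max_bytes minus the bytes it already uses, as the issue description suggests. `QueueConfig`, `required_bytes_by_filesystem` and the `filesystem_of` lambda are hypothetical names for illustration, and "gb" is assumed to mean 2**30 bytes.

```ruby
# Hypothetical stand-in for one pipeline's PQ settings; not a Logstash class.
QueueConfig = Struct.new(:path, :max_bytes, :used_bytes)

GIB = 1 << 30 # assumption: "gb" in the settings means 2**30 bytes

# Sum, per filesystem, the bytes each queue could still grow by.
# `filesystem_of` is a caller-supplied lambda (path -> filesystem name),
# standing in for however the validator resolves a path to its filesystem.
def required_bytes_by_filesystem(configs, filesystem_of)
  configs.each_with_object(Hash.new(0)) do |config, totals|
    # A queue that is already at or over its limit needs no further space.
    next unless config.used_bytes < config.max_bytes

    totals[filesystem_of.call(config.path)] += config.max_bytes - config.used_bytes
  end
end

configs = [
  QueueConfig.new("/Users/robbavey/logstash-8.5.0/data/queue/test1", 300 * GIB, 0),
  QueueConfig.new("/Users/robbavey/logstash-8.5.0/data/queue/test2", 300 * GIB, 0),
]
same_disk = ->(_path) { "/dev/disk1s2" } # both queues share one filesystem here

required = required_bytes_by_filesystem(configs, same_disk)
required.each { |fs, bytes| puts "#{fs}: #{bytes / GIB}gb still required" }
```

With both queues empty, the single filesystem ends up with a 600gb requirement, which is the figure that then gets compared against the free space.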
There are several shortcomings for this approach:
Proposed improvement: Here is an example of the proposed warning:
The `max_bytes` allocated for persistent queues for filesystem '/dev/disk1s2' exceed available space.
'/dev/disk1s2' filesystem status:
- Total space required: 600gb
- Currently free space: 312gb
- Current PQ usage: 50gb
- Additional space needed: 288gb
Individual queue requirements on '/dev/disk1s2' filesystem:
/Users/robbavey/logstash-8.5.0/data/queue/test1:
Current size: 30gb
Maximum size: 300gb
/Users/robbavey/logstash-8.5.0/data/queue/test2:
Current size: 20gb
Maximum size: 300gb
Please either:
1. Free up disk space
2. Reduce queue.max_bytes in your pipeline configurations
3. Move PQ storage to a filesystem with more available space
Note: Logstash may fail to start if this is not resolved.
What this would involve is passing through all the configs to build up all the required max_bytes and find all the paths. Once we have this information we can partition by filesystem and build a warning (in our example it is just a single filesystem, but there could be several if paths are configured on multiple filesystems).
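The partition-and-report step could be sketched roughly as follows. This is an illustration under assumptions, not a proposed patch: `QueueUsage`, `gb` and `build_warnings` are hypothetical names, the free-space figures are passed in rather than looked up, and "Additional space needed" is computed as total required minus free space, since that is what reproduces the 288gb in the example message. Whether current PQ usage should also be credited is left open.

```ruby
# Hypothetical record for one persistent queue; not part of Logstash.
QueueUsage = Struct.new(:path, :filesystem, :current_bytes, :max_bytes)

GIB = 1 << 30 # assumption: "gb" means 2**30 bytes

# Whole-gigabyte formatting, enough for this sketch.
def gb(bytes)
  "#{bytes / GIB}gb"
end

# Build one warning string per filesystem whose queues need more than is free.
# `free_bytes` maps filesystem name => free bytes (supplied by the caller).
def build_warnings(queues, free_bytes)
  warnings = queues.group_by(&:filesystem).map do |fs, fs_queues|
    required = fs_queues.sum(&:max_bytes)
    free = free_bytes.fetch(fs)
    next if required <= free # this filesystem is fine, no warning

    lines = [
      "The `max_bytes` allocated for persistent queues for filesystem '#{fs}' exceed available space.",
      "'#{fs}' filesystem status:",
      "- Total space required: #{gb(required)}",
      "- Currently free space: #{gb(free)}",
      "- Current PQ usage: #{gb(fs_queues.sum(&:current_bytes))}",
      "- Additional space needed: #{gb(required - free)}",
      "Individual queue requirements on '#{fs}' filesystem:",
    ]
    fs_queues.each do |q|
      lines << "#{q.path}:" << "Current size: #{gb(q.current_bytes)}" << "Maximum size: #{gb(q.max_bytes)}"
    end
    lines.join("\n")
  end
  warnings.compact
end

queues = [
  QueueUsage.new("/Users/robbavey/logstash-8.5.0/data/queue/test1", "/dev/disk1s2", 30 * GIB, 300 * GIB),
  QueueUsage.new("/Users/robbavey/logstash-8.5.0/data/queue/test2", "/dev/disk1s2", 20 * GIB, 300 * GIB),
]
puts build_warnings(queues, { "/dev/disk1s2" => 312 * GIB })
```

Grouping first means paths spread over several filesystems would each get their own block, and filesystems with enough room produce no output at all.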
Implementation notes: From reviewing the existing methods in that class, I think we could compute the space available on a filesystem as well as the space used under a given queue path. This should allow us to distinguish between what we are explicitly using for Logstash queue storage and what may be used by other entities on the system. I also see this util module (https://github.com/elastic/logstash/blob/046ea1f5a8a5e26f82d1495bb2ed9a7a3fe96332/logstash-core/lib/logstash/util/byte_value.rb#L57-L73), which I think would be nicer for human-readable bytes.
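As a rough illustration of the two pieces mentioned here, the sketch below measures the bytes used under a queue path and formats a byte count for humans. Both helpers are hypothetical stand-ins written with plain Ruby stdlib calls; they are not the validator's existing methods or the linked util module, and the free-space lookup for a filesystem is deliberately left out.

```ruby
require "tmpdir"

# Sum the sizes of all regular files under a queue directory.
# Hypothetical helper; a real implementation might only count page files.
def bytes_used_under(queue_path)
  Dir.glob(File.join(queue_path, "**", "*"))
     .select { |f| File.file?(f) }
     .sum { |f| File.size(f) }
end

UNITS = %w[b kb mb gb tb pb].freeze

# Simplified human-readable formatter (assumes 1024-based units).
def human_readable(bytes)
  value = bytes.to_f
  unit = 0
  while value >= 1024 && unit < UNITS.length - 1
    value /= 1024
    unit += 1
  end
  format("%.1f%s", value, UNITS[unit])
end

# Usage: write a fake 2048-byte page file into a temp dir and report its size.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "page.0"), "x" * 2048)
  puts human_readable(bytes_used_under(dir))
end
```

Measuring usage per queue path is what would let the warning separate "Current PQ usage" from space consumed by anything else on the same disk.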
Logstash version:
Logstash 7.x >= 7.17.5, Logstash 8.x >= 8.3.0
Steps to reproduce:
When starting up, Logstash checks the total amount of space required for PQs on a given filesystem against the amount of disk space left on that filesystem, logging a warning when the required space exceeds what is available.
However, the warning message emitted is difficult to follow and does not make the correct remediating action clear:
I set up a config on my laptop where I have 312Gi free on my local drive, configured two pipelines, each with a persistent queue configured, and started up Logstash.
I received the following warning message:
The number in "Please free or allocate 643171352576 more bytes." feels a little confusing, as I actually need fewer bytes than that to allow the PQs to operate successfully. The number appears to be:
(Total size of required disk across all PQs) - (disk used across all PQs)
But the disk may not be dedicated to PQ and the number may be misleading.
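A quick arithmetic check of that reading, as a sketch under assumptions: it takes the two-queue, 300gb-each setup from the comment above and assumes "gb" means 2**30 bytes. Under those assumptions the reported number is what you get if about 1 GiB is already used across the two queues, and free disk space does not enter into it at all.

```ruby
GIB = 1 << 30 # assumption: "300gb" is parsed as 300 * 2**30 bytes

max_bytes_per_queue = 300 * GIB
total_max = 2 * max_bytes_per_queue # two pipelines, one PQ each
reported = 643_171_352_576          # number from the warning message

# If reported = total_max - used, then used = total_max - reported.
implied_used = total_max - reported

puts "total max_bytes: #{total_max}"
puts "implied PQ usage: #{implied_used} bytes (#{implied_used / GIB.to_f} GiB)"
```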
It may be more useful to report
It may also be worth strengthening the warning to state that Logstash may fail to start if this is not resolved.