elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

Optimizing Filebeat for Reading Only New Text Files #39970

Open Micheal-Madhan opened 1 week ago

Micheal-Madhan commented 1 week ago

After Filebeat is installed through the MSI, it takes too long and uses too much CPU scanning all text files, including older ones. We never delete the logs folder after uninstalling the Filebeat MSI, so I wonder why it re-reads existing text files that Filebeat had already scanned. Could you please advise whether there is an option to skip the existing text files and read only the new ones?

### Tasks
- [ ] https://github.com/elastic/beats/pull/39744

darwinSK commented 1 week ago

Filebeat, by default, attempts to monitor all files within the specified paths in its configuration, including older files, which can lead to high CPU usage and longer scanning times. To configure Filebeat to read only new text files and avoid re-reading existing files that were already scanned, you can use a combination of configuration options.

Here are some steps and configurations you can apply to optimize Filebeat's performance:

1. Registry File

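Filebeat maintains a registry that records the state, including the read offset, of every file it has already read. If the registry is lost during an uninstall or reinstall, Filebeat has no record of the existing files and reads them again from the beginning, so make sure it is preserved. The registry is typically located at C:\ProgramData\filebeat\registry\filebeat on Windows; the exact location depends on the data path (path.data) of your installation.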

2. Configure the Inputs

Adjust the input settings in filebeat.yml carefully (inputs were called prospectors in older Filebeat versions). Here's an example configuration that keeps Filebeat focused on new files:

Example filebeat.yml Configuration:

```yaml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:\path\to\your\logs\*.log
  # Skip files whose last modification is older than this
  ignore_older: 24h  # Adjust the time period according to your needs

  # How often to check the paths for new files
  scan_frequency: 10s

  # Remove registry state for files inactive longer than this;
  # must be greater than ignore_older + scan_frequency
  clean_inactive: 48h  # Adjust the time period according to your needs

# Keep the registry in a fixed location that survives reinstalls
filebeat.registry.path: "C:/ProgramData/filebeat/registry"
```

3. Ignore Older Files

The ignore_older setting makes Filebeat skip files whose last modification time is older than the specified duration. Old log files are still found when the paths are scanned, but they are not opened or read, which reduces the load.
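
For the case in this issue (old files already on disk after a reinstall), a minimal sketch could combine ignore_older with the log input's tail_files option, which starts reading files that have no registry state at the end instead of the beginning. The path is the same placeholder as in the example configuration, and the 24h value is only an assumption to adjust. Note that with log rotation, tail_files may cause the first entries of a newly created file to be skipped, so it is often used only for the first run and then removed.

```yaml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:\path\to\your\logs\*.log   # placeholder path, replace with your own
  # Skip files not modified within the last 24h (assumed value)
  ignore_older: 24h
  # For files without registry state, start at the end of the file
  # so existing content is not shipped again
  tail_files: true
```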

4. Scan Frequency

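The scan_frequency setting controls how often Filebeat checks the configured paths for new files (the default is 10s). Raising the value, so that scans run less often, can also decrease CPU usage, at the cost of new files being picked up later.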
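
As an illustration, a less frequent scan might look like the fragment below. The 30s value is an assumption chosen only to show the direction of the change; the right value depends on how quickly new files need to be picked up.

```yaml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:\path\to\your\logs\*.log   # placeholder path, replace with your own
  # Scan the paths every 30s instead of the default 10s (assumed value);
  # fewer scans mean less CPU, but new files are detected later
  scan_frequency: 30s
```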

5. Clean Inactive Files

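The clean_inactive setting removes a file's state from the registry once the file has been inactive for longer than the specified duration. This keeps the registry from growing indefinitely. The value must be greater than ignore_older + scan_frequency; otherwise the state of a file that is still being picked up could be removed, and the file would be re-read from the beginning.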
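
A small sketch of how the three durations relate, with the arithmetic spelled out in the comments. The concrete values are assumptions; only the ordering between them matters.

```yaml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:\path\to\your\logs\*.log   # placeholder path, replace with your own
  scan_frequency: 10s
  ignore_older: 24h
  # ignore_older + scan_frequency = 24h + 10s, so clean_inactive
  # has to be larger than that; 25h leaves a margin (assumed value)
  clean_inactive: 25h
```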

6. Delete Older Log Files

Regularly clean up old log files from the directory if they are no longer needed. This helps in reducing the number of files Filebeat needs to scan.

7. Filebeat Modules

If applicable, consider using Filebeat modules designed for specific log types. Modules come preconfigured with paths and parsing for specific log formats, which can simplify the configuration for those types of logs.

Implementation Steps:

  1. Edit the Configuration File: Open the filebeat.yml configuration file located in the Filebeat installation directory.
  2. Modify Settings: Apply the settings as shown in the example above.
  3. Restart Filebeat: After modifying the configuration, restart the Filebeat service to apply the changes.

```powershell
# Restart Filebeat on Windows
Restart-Service filebeat
```

Troubleshooting:

By applying these configurations and practices, you can optimize Filebeat to read only new text files and reduce CPU usage, thereby improving overall performance.

elasticmachine commented 1 week ago

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)