elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats
Other
103 stars 4.92k forks source link

Report (log) errors in case globbing logs fails #12186

Closed Mekk closed 1 year ago

Mekk commented 5 years ago

Appetizer

I installed filebeat on MoreImportantMachine. Using the same configuration, which was earlier used on devel. Rather trivial at this time, two specific files, not even wildcard-ed.

Filebeat started and nothing happened. No entries shipped.

I chaotically tried all input settings I could imagine (various ignore* flags). I removed registry state. I restarted filebeat numerous times. Nothing changed.

I quintuple-checked that paths are correct. Yes, they were, I could list and view those files from interactive session.

I tried artificially globbing them. No change.

I enabled all possible log levels. Filebeat logged nothing of interest, generally it looked as if the file did not exist. Leaving apart initial reports about config loading, this is what I got about my file:

2019-05-15T12:21:40.339+0200    INFO    log/input.go:138        Configured paths: [/chroot/uwsgi/log/uwsgi/app-benchmark.log]
2019-05-15T12:21:40.339+0200    DEBUG   [reload]        cfgfile/list.go:101     Starting runner: input [type=log, ID=14892249218949132520]
2019-05-15T12:21:40.339+0200    INFO    input/input.go:114      Starting input of type: log; ID: 14892249218949132520 
2019-05-15T12:21:40.339+0200    DEBUG   [processors]    processors/processor.go:66      Processors: 
2019-05-15T12:21:40.339+0200    DEBUG   [input] log/input.go:174        Start next scan
2019-05-15T12:21:40.339+0200    DEBUG   [input] log/config.go:201       recursive glob enabled
2019-05-15T12:21:40.339+0200    DEBUG   [input] log/input.go:195        input states cleaned up. Before: 0, After: 0, Pending: 0
2019-05-15T12:21:40.339+0200    DEBUG   [input] log/input.go:147        exclude_files: []. Number of stats: 0
2019-05-15T12:21:40.339+0200    DEBUG   [input] log/input.go:168        input with previous states loaded: 0

(and then the same about the other one, and then repeated info about scans not finding anything).

I went down to reading filebeat sources, without noticeable clue, but I started suspect that I get some error after I noticed Glob usage and checked this description: https://golang.org/pkg/path/filepath/#Glob (the second sentence is crucial in particular)

I tried using various related apis (stat, lstat, listdir…) from interactive python. All seemed to work.

What happened

It took someone's illumination oh, there is SELinux on this machine, maybe it interferes. Indeed, after a while we found that some selinux audit.log mentions rejecting filebeat system call (I don't grok those logs too much but it seems it was stat on some parent directory).

As SELinux is configured there to not touch interactive sessions, I could not reproduce any problems interactively.

Suggestion improvements

Fileabeat really should log some information in case globbing/listing files doesn't work.

This can be hard as the true culprit is filepath.Glob from the standard library, which is - IMHO - broken by design (it should return errors if they happen, together with succesfully found items, and leave up to the user decision what to do with them). If I understand correctly I mostly faced this: https://github.com/golang/go/blob/master/src/path/filepath/match.go#L236 which reflects documented decision Glob ignores file system errors such as I/O errors reading directories.

Mayhaps it could be possible to copy this function and fix it so errors are reported (and then logged). Or mayhaps there is some alternative.

At the very least, maybe filebeat could, in case it finds no logs matching the input:

ph commented 5 years ago

I agree this is not a clear message in that scenario, I marked it as a bug.

botelastic[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Mekk commented 3 years ago

So?

The problem is reasonably serious and very confusing: filebeat can ignore existing logs while being completely silent about errors it faces. SELinux by itself is frequently used, and that's not the only scenario possible (plain inaccessible directory would also do, alth. that would be easier to spot).

dplavcic commented 2 years ago

Hi,

this is a really interesting problem. According to the number of opened issues (it seems) it doesn't happen quite often, but when it does it's a lot of fun :sweat_smile:

@Mekk, tnx for the great explanation/description of the issue (and how to fix it). It helped me a lot. In my case, filebeat didn't have enough permissions to read a folder that contained log files to be harvested.

I also agree with proposed solutions. It would be great to have some additional log line to show us that filebeat couldn't read (glob) a file/folder for whatever reason.

filepath/match.go source code show the areas in which any error produced by os.??? call is ignored. I suppose the Go team members had a good reason to implement it like that. Unfortunately, that approach is not that useful in our case 😞

// Glob returns the names of all files matching pattern or nil
// if there is no matching file. The syntax of patterns is the same
// as in Match. The pattern may describe hierarchical names such as
// /usr/*/bin/ed (assuming the Separator is '/').
//
// Glob ignores file system errors such as I/O errors reading directories.
// The only possible returned error is ErrBadPattern, when pattern
// is malformed.
func Glob(pattern string) (matches []string, err error) {
    // Check pattern is well-formed.
    if _, err := Match(pattern, ""); err != nil {
        return nil, err
    }
    if !hasMeta(pattern) {
        if _, err = os.Lstat(pattern); err != nil {
            return nil, nil
        }
        return []string{pattern}, nil
    }
...
// glob searches for files matching pattern in the directory dir
// and appends them to matches. If the directory cannot be
// opened, it returns the existing matches. New matches are
// added in lexicographical order.
func glob(dir, pattern string, matches []string) (m []string, e error) {
    m = matches
    fi, err := os.Stat(dir)
    if err != nil {
        return // ignore I/O error
    }
    if !fi.IsDir() {
        return // ignore I/O error
    }
    d, err := os.Open(dir)
    if err != nil {
        return // ignore I/O error
    }
        ...

The idea of c/p the filepath.Glob function, adding some additional logging to it and placing it under elastic/beats was appealing to me so I tried to implement it.

I didn't want to break any existing functionality, so I search for filepath.Glob function usages in elastic/beats project. There were 51 of them:

Targets
    Occurrences of 'filepath.Glob' in Project
Found Occurrences in Project  (51 usages found)
    Unclassified  (51 usages found)
        beats  (51 usages found)
            auditbeat/module/auditd  (2 usages found)
                config_linux.go  (1 usage found)
                    124 files, err := filepath.Glob(absPattern)
                golden_files_test.go  (1 usage found)
                    182 sourceFiles, err := filepath.Glob(filepath.Join(testDir, testPattern))
            dev-tools/cmd/module_include_list  (1 usage found)
                module_include_list.go  (1 usage found)
                    109 goFiles, err := filepath.Glob(filepath.Join(dir, "*.go"))
            dev-tools/mage  (4 usages found)
                common.go  (1 usage found)
                    617 files, err := filepath.Glob(glob)
                kibana.go  (1 usage found)
                    50 kibanaDirs, err := filepath.Glob(filepath.Join(dir, "*/_meta/kibana"))
                modules.go  (2 usages found)
                    46 shortConfigs, err := filepath.Glob("module/*/_meta/config.yml")
                    126 files, err := filepath.Glob("modules.d/*.disabled")
            dev-tools/packaging  (1 usage found)
                package_test.go  (1 usage found)
                    507 matches, err := filepath.Glob(*files)
            filebeat/config  (1 usage found)
                config.go  (1 usage found)
                    88 files, err := filepath.Glob(path + "/*.yml")
            filebeat/input/file  (1 usage found)
                glob.go  (1 usage found)
                    81 match, err := filepath.Glob(p)
            filebeat/input/filestream  (1 usage found)
                fswatch.go  (1 usage found)
                    313 matches, err := filepath.Glob(path)
            filebeat/input/log  (1 usage found)
                input.go  (1 usage found)
                    311 matches, err := filepath.Glob(path)
            libbeat/cfgfile  (2 usages found)
                glob_manager_test.go  (1 usage found)
                    103 files, err := filepath.Glob(dir + "/*")
                glob_watcher.go  (1 usage found)
                    51 globList, err := filepath.Glob(gw.glob)
            libbeat/common/file  (3 usages found)
                rotator.go  (3 usages found)
                    416 files, err := filepath.Glob(d.filenamePrefix + "*" + d.extension)
                    446 files, err := filepath.Glob(newFileNamePrefix + "*" + d.extension)
                    465 files, err := filepath.Glob(d.filenamePrefix + "*")
            libbeat/common/flowhash  (1 usage found)
                communityid_test.go  (1 usage found)
                    49 pcaps, err := filepath.Glob(filepath.Join(pcapDir, "*.pcap"))
            libbeat/dashboards  (1 usage found)
                importer.go  (1 usage found)
                    119 files, err := filepath.Glob(path.Join(dir, dirType, "*.json"))
            libbeat/processors/script/javascript  (1 usage found)
                javascript.go  (1 usage found)
                    131 matches, err := filepath.Glob(filePath)
            libbeat/publisher/pipeline/stress  (1 usage found)
                stress_test.go  (1 usage found)
                    124 files, err := filepath.Glob(pattern)
            libbeat/statestore/backend/memlog  (1 usage found)
                diskstore.go  (1 usage found)
                    471 files, err := filepath.Glob(filepath.Join(home, "*.json"))
            metricbeat/mb/testing  (1 usage found)
                testdata.go  (1 usage found)
                    155 ff, err := filepath.Glob(filepath.Join(config.Path, "*."+config.Suffix))
            metricbeat/mb/testing/data  (1 usage found)
                data_test.go  (1 usage found)
                    36 configFiles, _ := filepath.Glob(getModulesPath() + "/*/*/_meta/testdata/config.yml")
            metricbeat/module/apache/status  (1 usage found)
                status_test.go  (1 usage found)
                    263 files, err := filepath.Glob("./_meta/test/status_*")
            metricbeat/module/beat/state  (1 usage found)
                data_test.go  (1 usage found)
                    37 files, err := filepath.Glob("./_meta/test/state.*.json")
            metricbeat/module/beat/stats  (1 usage found)
                data_test.go  (1 usage found)
                    37 files, err := filepath.Glob("./_meta/test/stats.*.json")
            metricbeat/module/elasticsearch  (4 usages found)
                testing.go  (4 usages found)
                    38 files, err := filepath.Glob(glob)
                    59 files, err := filepath.Glob(glob)
                    85 files, err := filepath.Glob(glob)
                    112 files, err := filepath.Glob(glob)
            metricbeat/module/elasticsearch/node  (1 usage found)
                node_test.go  (1 usage found)
                    38 files, err := filepath.Glob("./_meta/test/node.*.json")
            metricbeat/module/elasticsearch/shard  (1 usage found)
                data_test.go  (1 usage found)
                    35 files, err := filepath.Glob("./_meta/test/routing_table.*.json")
            metricbeat/module/kibana/stats  (1 usage found)
                data_test.go  (1 usage found)
                    35 files, err := filepath.Glob("./_meta/test/stats.*.json")
            metricbeat/module/logstash/node  (1 usage found)
                data_test.go  (1 usage found)
                    38 files, err := filepath.Glob("./_meta/test/node.*.json")
            metricbeat/module/logstash/node_stats  (1 usage found)
                data_test.go  (1 usage found)
                    38 files, err := filepath.Glob("./_meta/test/node_stats.*.json")
            metricbeat/module/system/raid/blockinfo  (1 usage found)
                parser.go  (1 usage found)
                    152 devices, err := filepath.Glob(filepath.Join(path, "md", "dev-*"))
            metricbeat/scripts/mage  (6 usages found)
                docs_collector.go  (3 usages found)
                    244 metricsetList, err := filepath.Glob(filepath.Join(modulePath, "/*"))
                    444 modules, err := filepath.Glob(beatsModuleGlob)
                    451 xpackModules, err := filepath.Glob(xpackModuleGlob)
                package.go  (3 usages found)
                    162 shortConfigs, err := filepath.Glob("module/*/_meta/config.yml")
                    166 flavorConfigs, err := filepath.Glob("module/*/_meta/config-*.yml")
                    249 matches, err := filepath.Glob(filepath.Join(path, pattern))
            x-pack/auditbeat/module/system/login  (1 usage found)
                utmp.go  (1 usage found)
                    128 paths, err := filepath.Glob(filePattern)
            x-pack/elastic-agent/pkg/agent/program  (2 usages found)
                program_test.go  (1 usage found)
                    498 useCases, err := filepath.Glob(filepath.Join(useCasesPath, "*.yml"))
                spec.go  (1 usage found)
                    51 files, err := filepath.Glob(path)
            x-pack/elastic-agent/pkg/dir  (1 usage found)
                discover.go  (1 usage found)
                    18 f, err := filepath.Glob(pattern)
            x-pack/elastic-agent/pkg/packer  (1 usage found)
                packer.go  (1 usage found)
                    31 files, err := filepath.Glob(p)
            x-pack/filebeat/input/awss3  (1 usage found)
                script.go  (1 usage found)
                    104 matches, err := filepath.Glob(filePath)
            x-pack/filebeat/input/netflow  (1 usage found)
                netflow_test.go  (1 usage found)
                    64 pcaps, err := filepath.Glob(filepath.Join(pcapDir, "*.pcap"))
            x-pack/winlogbeat/module  (1 usage found)
                testing_windows.go  (1 usage found)
                    54 files, err := filepath.Glob(evtx)

Considering how filepath.Glob is used, I concluded that adding some logging to filepath.Glob function should do it. I've copy-pasted the filepath.match file that contains the Glob method to a new file under the elastic/beats domain. After that I've added a couple of log lines to show us errors that can happen while doing the filepath.Glob call.

The process was straightforward for almost everything:

NOTE:

After some time, the idea of maintaining our own copy of the filepath.match file was not that appealing anymore 😆

I wasn't able to find anything about this issue in official documentation (am I missing something?).

Since this doesn't happen often, I think the good enough solution is to add some additional info/warning to the filebeat inputs docs. A note/warning that a filebeat need to have a right permission level (file/folder/syscalls...) on the files it wants to harvest.

I will update the docs and create the PR during the day (or tomorrow).

@Mekk, what do you think about that?

Edit:

There is an idea to implement "prevalidation" in terms of issuing os.??? calls to see if file/folder is accessible. But if wildcard pattern is used, I have to first call the Glob() function to get the files that match the wildcard patterns. After that I would have to do the validation on the returned files, which makes no sense since "unaccessible" files are already filtered out by Glob() function call.

The only thing, regarding the prevalidation that I can think of is in case when no wildcard pattern is used, but file names are explicitly specified. In that case I could try to os.??? the file to see if I have the right permissions to access it. But this seems like a "half baked" solution since it doesn't cover the wildcard usages 🤔

Mekk commented 2 years ago

Let me remind what happens: existing files matching filebeat rule are completely ignored without any, even smallest, hint that something wrong happens. This is not inconvenience, this is possibly important problem (you don't get logs and don't know about this). In my case it was single machine and I knew I have the problem as I did not see them. With numerous machines, one can completely miss such problem for a long time.

Docs remark is not sufficient. Error should result in some reported error (preferably up to „failed state” visible on stack monitoring etc), but at least clear log entry is necessary.

Regarding practical approach – if maintaining patched glob/match (which need not be used everywhere) is too hard, at the very least I would simply check whether parent directory of glob can be listdir-ed (if glob doesn't have wildcards or have them only in filename) or whether directory part resolves to anything (if dirs are globbed).

botelastic[bot] commented 1 year ago

Hi! We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1. Thank you for your contribution!