Logging changes to support Beats Elasticsearch module (including logging to JSON)

jasontedor commented 6 years ago

This relates to the effort in Beats (filebeat) to build a module for Elasticsearch. To support this effort, we want to make some changes to the logs in Elasticsearch including removing some existing pain points. The main highlight of these changes will be shipping with JSON logging.

[ ] Change the name of the main log file to something like server.log. Today the main log file is <clustername>.log and this creates complexity for Filebeat; it would be simpler for them if they could rely on an include pattern based on a predictable log file name (another possibility here is server-<clustername>.log). Note that this will be a breaking change.
[ ] Each log line should contain the node ID and cluster ID (machine friendly, and serve as unique identifiers), and node name and cluster name (human friendly). Filebeat wants to see these for instances where logs are being collected to a single sink from multiple nodes and multiple clusters.
[ ] Improve logging in the official Docker images. Logging in Docker images traditionally goes to standard output and standard error. We do this with our official Docker images today, but then there is no way to distinguish the server logs from the audit, deprecation, and slow logs. With a single output stream, we need to add a "type" field to the log outputs from the Docker container so the different sources can be parsed from the single standard output stream.
[ ] Enable JSON layout. Today we do not support layout, mainly due to security manager issues. We need to address these. With this effort, we want the logs to remain human friendly, so we will also want to define an ordering of the first few fields in the JSON object (something like timestamp, log_level, component, node_name, index_name, shard_id, message).

elasticmachine commented 6 years ago

Pinging @elastic/es-core-infra

jasontedor commented 6 years ago

Pinging @ruflin; let me know if I missed anything!

sshling commented 6 years ago

In some scenarios, an exception occurs, a large amount of logs will be printed, tens of MB per second, if it can be optimized, the same type exception is suppressed ,as print info :

"2016-07-09 11:01:31 +0430 [warn]: suppressed same stacktrace"

A large number of exception log writes, seriously affecting disk write performance.

ycombinator commented 5 years ago

Now that we have a Logging UI in Kibana, we were hoping to show ES deprecation (and potentially other types of) logs in the UI. The user would discover said logs via the Monitoring UI, filtered down to the context they are in: a cluster, a node within a cluster, etc. In order for this to work correctly, item 2 in the checklist above needs to be implemented:

Each log line should contain the node ID and cluster ID (machine friendly, and serve as unique identifiers), and node name and cluster name (human friendly). Filebeat wants to see these for instances where logs are being collected to a single sink from multiple nodes and multiple clusters.

Any chance this specific item could be prioritized to enable the Monitoring UI use case described above?

jasontedor commented 5 years ago

Any chance this specific item could be prioritized to enable the Monitoring UI use case described above?

We have been meaning to follow-up on this requirement, because as written we can’t meet it. We need to understand more how important and how strong this requirement is. It is simply not possible include the node ID and cluster ID on every log line. We emit log lines before these data are available.

ycombinator commented 5 years ago

We emit log lines before these data are available.

Is this true for all types of logs? For example, would it be possible to emit the node ID and cluster ID on every log line for deprecation logs?

Are there any guarantees about when the data becomes available or, conversely, what types of log messages might not have this data in them since it's not available yet?

jasontedor commented 5 years ago

Yes, it could be true for the deprecation log too. Also, it would be that we are considering combining all the log files into a single log file for all instances, not only for the Docker images.

Are there any guarantees about when the data becomes available or, conversely, what types of log messages might not have this data in them since it's not available yet?

There aren't guarantees per se. They can be categorized as all startup log messages up to the point that we recover node state would not have the node ID, and all startup log messages up to the point that we recover cluster state would not have the cluster ID (superset of those without the node ID).

ycombinator commented 5 years ago

Thanks Jason. Roughly how many log entries are we talking about without cluster UUID or node ID? The reason I ask is: thinking a bit out of the box, would it be silly to re-emit those messages with the IDs in them, once the IDs become available? Just want to consider all possibilities before we say it's okay to "drop" certain messages that don't have IDs.

jasontedor commented 5 years ago

Roughly how many log entries are we talking about without cluster UUID or node ID?

I can not bound it because we can always add more in the future, and if debug or trace logging is enabled (or even only for certain components) it can be quite a lot of log messages.

ycombinator commented 5 years ago

In that case, I'm in favor of doing a best effort here. That is, I'm in favor of ES including the cluster UUID and node ID on every log line as and when that information becomes available.

It would be an improvement over what we can do today in the UI with the logs in terms of correlating them with other information about the node or cluster. And we'd still have the un-correlatable logs (but fewer in number now) around in filebeat indices, in case we later think of some way to display these meaningfully to the user.

pgomulka commented 5 years ago

I opened a draft PR to discuss the technical details of the implementation. Feel free to review/comment.

In regards to the format itself. The idea would be to change the log line from:

[2018-12-19T11:10:33,609][INFO ][o.e.l.LicenseService     ] [node-0] license [1d313428-f9d3-4085-81e0-c77f032359a1] mode [basic] - valid

To:

{"type": "rolling", "timestamp": "2018-12-19T12:18:58,402", "level": "INFO ", "class": "o.e.l.LicenseService", "cluster_name": distribution_run", "node_name": "node-0", "cluster_uuid": "dlQqhmTYQ3OfoF0eLIdsCw", "node_id": "uy-oB2QFS1m73g8Gg9yAXw", "message": "license [3cbed08b-c198-4b3d-b7d6-62efb94c1c2e] mode [basic] - valid"}

Before the first ClusterStateUpdate is received, the cluster_id and node_id would not be present in log lines:

{"type": "rolling", "timestamp": "2018-12-19T12:18:57,525", "level": "INFO ", "class": "o.e.t.TransportService", "cluster_name": distribution_run", "node_name": "node-0",  "message": "publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}"}

Or can they be null? like: "cluster_uuid": null, "node_id": null

In order to support multiple output streams from docker we will have a type field that indicate what appender is the log from. For instance from deprecation log:

{"type": "deprecation_rolling","timestamp": "2018-12-19T11:24:24,949", "level": "WARN ", "class": "o.e.d.l.RestPostStartTrialLicense", "node_name": "node-0", "node_id": "MGKz9QIFTyS-IR85E5J2Fg", "nodeid2": "unknown_node_id", "message": "[POST /_xpack/license/start_trial] is deprecated! Use [POST /_license/start_trial] instead."}

I am stil not sure how to format stacktraces in the log lines. Should they also contain the fields mentioned in 1st post by Jason?

pgomulka commented 5 years ago

also, should we allow users to change the pattern we create now in a JSON form? I could imagine some users would be tempted to do this and we don't have a way to control how they change it and if they follow the json structure.

ycombinator commented 5 years ago

Hi @pgomulka this is looking great!

I am stil not sure how to format stacktraces in the log lines. Should they also contain the fields mentioned in 1st post by Jason?

You mean because they are multiline? If so, what about escaping the newlines in the message field in the JSON?

ycombinator commented 5 years ago

I have a request for one more logging change to support the Beats Elasticsearch module. Please let me know if I should add it as a checklist item in this issue's description OR if you'd prefer that I make a new issue OR if this comment is sufficient.

Currently we see lines like these in the Elasticsearch logs:

[2018-07-03T11:45:45,604][WARN ][o.e.m.j.JvmGcMonitorService] [srvmulpvlsk252_md] [gc][3449992] overhead, spent [1.6s] collecting in the last [1.8s]

Notice the durations in these lines, 1.6s and 1.8s. Would it be possible to normalize the unit here, perhaps to ms? That would make parsing these durations a lot easier.

pgomulka commented 5 years ago

@ycombinator in regards to this duration units inconsistency, that might be a bug I think. Can you please a raise a separate issue?

ycombinator commented 5 years ago

@pgomulka Done! https://github.com/elastic/elasticsearch/issues/36896

pgomulka commented 5 years ago

@ycombinator In regards to exception formatting. I am thinking of a format like this:

{"type": "console", "timestamp": "2018-12-27T14:45:10,386", "level": "WARN ", "class": "r.suppressed", "cluster_name": "distribution_run", "node_name": "node-0", "cluster_uuid": "XSKjGHufRR25tdwlkpCTig", "node_id": "iP2awnO9QX6ajH2z6w1byQ",  "message": "", "path": "/myindex", "params": "{index=myindex}", "exception": ["java.lang.NullPointerException: asdf",
"at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService.validateIndexName(MetaDataCreateIndexService.java:149) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService.validate(MetaDataCreateIndexService.java:596) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$IndexCreationTask.execute(MetaDataCreateIndexService.java:287) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:47) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:685) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:310) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:211) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:143) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:660) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:244) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:207) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]",
"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]",
"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]",
"at java.lang.Thread.run(Thread.java:834) [?:?]"] }

To make a stacktrace human readable in JSON format I have enclosed each line of stacktrace in a separate array element. I hope this should still be easy for beats to parse. Let me know what you think

pgomulka commented 5 years ago

Also I am wondering if we should add a new schema to ECS. We are after all integrating beats with ES. We have log.yaml schema at the moment, I was thinking of having a schema with all fields we will use:

type;
timestamp;
level;
component;
cluster.name;
node.name;
cluster.uuid;
node.idd;
message;
Array exception;

pgomulka commented 5 years ago

Another problem to consider is how should we handle exceptions, messages written directly to the console. That is not a problem when running elasticsearch from a binary - log files will be parsed - but might be an issue when running from docker.

For instance if elasticsearch fails bootstrap checks we will write the exception message directly to the org.elasticsearch.cli.Terminal

 terminal.println(Terminal.Verbosity.SILENT, "ERROR: " + e.getMessage());

In Boostrap.java we remove ConsoleAppender, expecting this to print out from Terminal again later.

   // disable console logging, so user does not see the exception twice (jvm will show it already)
            final Logger rootLogger = LogManager.getRootLogger();
            final Appender maybeConsoleAppender = Loggers.findAppender(rootLogger, ConsoleAppender.class);
            if (foreground && maybeConsoleAppender != null) {
                Loggers.removeAppender(rootLogger, maybeConsoleAppender);
            }

but since docker is only using ConsoleAppender, it will miss the message in json format and will later only print out the non-json message on a Terminal

Logs from docker: {"type": "console", "timestamp": "2019-01-04T16:23:22,319+0100", "level": "INFO", "component": "o.e.b.BootstrapChecks", "cluster.name": "elasticsearch", "node.name": "Przemyslaws-MacBook-Pro.local", "message": "explicitly enforcing bootstrap checks" } ERROR: [1] bootstrap checks failed [1]: the default discovery settings are unsuitable for production use; at least one of [discovery.zen.ping.unicast.hosts, discovery.zen.hosts_provider, cluster.initial_master_nodes] must be configured {"type": "console", "timestamp": "2019-01-04T16:23:22,339+0100", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "elasticsearch", "node.name": "Przemyslaws-MacBook-Pro.local", "message": "stopping ..." }

Logs from log file

{"type": "console", "timestamp": "2019-01-04T17:12:03,836+0100", "level": "ERROR", "component": "o.e.b.Bootstrap", "cluster.name": "distribution_run", "node.name": "node-0",  "message": "node validation exception\n[1] bootstrap checks failed\n[1]: the default discovery settings are unsuitable for production use; at least one of [discovery.zen.ping.unicast.hosts, discovery.zen.hosts_provider, cluster.initial_master_nodes] must be configured"  }

Should we aim to format Terminal messages as well?

chrisronline commented 5 years ago

Hey folks

I'm hoping to get some traction on this issue. We're interested in pushing forward this deprecation log in the monitoring UI initiative and we've discussed that we need cluster_uuid in the deprecation log in order to move forward, as we can't definitely know which cluster that deprecation log appeared on.

Any update here?

pgomulka commented 5 years ago

@chrisronline the PR #36833 with ES changes is on track to be merged into 7.0. This will have cluster.uuid, but as discussed earlier only once the ES started up. see comment

you can see sample messages in here

chrisronline commented 5 years ago

@pgomulka Thanks! That's helpful for sure!

We're hoping to get our side done before 7.0 so users can take advantage of the UI for the 7.0 upgrade. Is it possible to tackle this work in phases, where the final phase is the PR you have linked, but we can have an initial phase where we add the cluster_uuid to the deprecation log in the current format?

pgomulka commented 5 years ago

@chrisronline in my view it would be better to push that all together. The majority of the work is related to propagating cluster.uuid and node.id and testing. Most of the test I wrote already assume the json format, so that would be actually more work now to isolate that change Unless that would mean that the json format itself would be postponed?

@danielmitterdorfer @nik9000 any views on this?

danielmitterdorfer commented 5 years ago

I agree with @pgomulka that we should push this together considering that the PR is already in review.

pgomulka commented 5 years ago

@chrisronline @albertzaharovits Should we consider unite security audit logs and Elasticsearch logs to some degree? Seems like cluster.uuid field is not present in audit logs: https://github.com/elastic/elasticsearch/blob/master/x-pack/plugin/core/src/main/config/log4j2.properties Chris, is monitoring UI going to look into these information as well?

@ycombinator is Beats going to be happy if those logs are different?

ycombinator commented 5 years ago

Hi @pgomulka, when you say "unite", I assume you mean that they should have some common fields like cluster.uuid and node.uuid. That is definitely desirable because we intend to show all different types of logs in the Monitoring UI per cluster and per node.

Purely from Filebeat's perspective it's okay if the logs are different since each log type (which corresponds to a fileset in Filebeat) provides its own piece of the ES mapping. Of course, what we wouldn't want to do is have a field called cluster_uuid in one log and cluster.uuid in another, both meaning the same thing. But AFAICT, that's not the case — all logs that have cluster UUID in them use cluster.uuid.

pgomulka commented 5 years ago

@rjernst raised an important problem that the readability is greatly affected with this change. We can work that around with keeping the old logs and new logs together. Old logs still using previous pattern and *.log filetype suffix. New logs on the other hand would use a new pattern and *.json file type suffix. That might be concerning that there will be duplicated amount of logs (and files), but on the other side we don't log a massive amounts anyway.

Any views on this?

ycombinator commented 5 years ago

Old logs still using previous pattern and .log filetype suffix. New logs on the other hand would use a new pattern and .json file type suffix.

Just wanted to point out that the audit log is not following this convention currently: plaintext logs are being sent to {clustername}_access.log and JSON logs are being sent to {clustername}_audit.log. Starting 7.0 they plan to just emit the JSON logs IIRC. /cc @albertzaharovits

dliappis commented 5 years ago

raised an important problem that the readability is greatly affected with this change. We can work that around with keeping the old logs and new logs together. Old logs still using previous pattern and *.log filetype suffix. New logs on the other hand would use a new pattern and *.json file type suffix. That might be concerning that there will be duplicated amount of logs (and files), but on the other side we don't log a massive amounts anyway.

Any views on this?

I am a bit skeptical about this for the following reasons:

There could be customers directly scrapping/ingesting the entirety of logs/ and possibly end up with duplicate logs in their log environment.
We'll use double the amount of space needed for logs. Our own logs are modest, but what about misbehaving plugins for example?

For the record, with the introduction of json logging, at least the docker use case looks really nice and easy to read esp. piping it to jq with something as simple as:

docker logs <container_name> | jq . producing very readable output IMHO like:

One concern though is the possible amount of frustration with new users not preparing themselves for this switch and frantically looking on how to change back to normal log output.

I chatted with @pgomulka earlier and suggested whether it makes sense to ship the log4j2.properties files that #36833 bring, clearly marked with comments and then an additional section with the normal old-style logging, fully commented out, ready to be uncommented out.

For example:

Suggestion for log4j2.properties (docker)

``` ################################################################################ # JSON (DEFAULT) LOGGING DEFINITIONS START HERE ################################################################################ status = error # log action execution errors for easier debugging logger.action.name = org.elasticsearch.action logger.action.level = debug appender.rolling.type = Console appender.rolling.name = rolling appender.rolling.layout.type = ESJsonLayout appender.rolling.layout.type_name = server rootLogger.level = info rootLogger.appenderRef.rolling.ref = rolling appender.deprecation_rolling.type = Console appender.deprecation_rolling.name = deprecation_rolling appender.deprecation_rolling.layout.type = ESJsonLayout appender.deprecation_rolling.layout.type_name = deprecation logger.deprecation.name = org.elasticsearch.deprecation logger.deprecation.level = warn logger.deprecation.appenderRef.deprecation_rolling.ref = deprecation_rolling logger.deprecation.additivity = false appender.index_search_slowlog_rolling.type = Console appender.index_search_slowlog_rolling.name = index_search_slowlog_rolling appender.index_search_slowlog_rolling.layout.type = ESJsonLayout appender.index_search_slowlog_rolling.layout.type_name = index_search_slowlog logger.index_search_slowlog_rolling.name = index.search.slowlog logger.index_search_slowlog_rolling.level = trace logger.index_search_slowlog_rolling.appenderRef.index_search_slowlog_rolling.ref = index_search_slowlog_rolling logger.index_search_slowlog_rolling.additivity = false appender.index_indexing_slowlog_rolling.type = Console appender.index_indexing_slowlog_rolling.name = index_indexing_slowlog_rolling appender.index_indexing_slowlog_rolling.layout.type = ESJsonLayout appender.index_indexing_slowlog_rolling.layout.type_name = index_indexing_slowlog logger.index_indexing_slowlog.name = index.indexing.slowlog.index logger.index_indexing_slowlog.level = trace logger.index_indexing_slowlog.appenderRef.index_indexing_slowlog_rolling.ref = index_indexing_slowlog_rolling logger.index_indexing_slowlog.additivity = false ################################################################################ # JSON (DEFAULT) LOGGING DEFINITIONS END HERE ################################################################################ ################################################################################ # # UNCOMMENT THE FOLLOWING SECTION AND COMMENT OUT THE ABOVE SECTION TO REVERT # TO <7.0 DEFAULT LOGGING BEHAVIOR. # # TEXT LOGGING DEFINITIONS START HERE # ################################################################################ # # status = error # # # log action execution errors for easier debugging # logger.action.name = org.elasticsearch.action # logger.action.level = debug # # appender.rolling.type = Console # appender.rolling.name = rolling # appender.rolling.layout.type = PatternLayout # appender.rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] [%node_name]%marker %m%n # # rootLogger.level = info # rootLogger.appenderRef.rolling.ref = rolling # # appender.deprecation_rolling.type = Console # appender.deprecation_rolling.name = deprecation_rolling # appender.deprecation_rolling.layout.type = PatternLayout # appender.deprecation_rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] [%node_name]%marker %.-10000m%n # # logger.deprecation.name = org.elasticsearch.deprecation # logger.deprecation.level = warn # logger.deprecation.appenderRef.deprecation_rolling.ref = deprecation_rolling # logger.deprecation.additivity = false # # appender.index_search_slowlog_rolling.type = Console # appender.index_search_slowlog_rolling.name = index_search_slowlog_rolling # appender.index_search_slowlog_rolling.layout.type = PatternLayout # appender.index_search_slowlog_rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c] [%node_name]%marker %.-10000m%n # # logger.index_search_slowlog_rolling.name = index.search.slowlog # logger.index_search_slowlog_rolling.level = trace # logger.index_search_slowlog_rolling.appenderRef.index_search_slowlog_rolling.ref = index_search_slowlog_rolling # logger.index_search_slowlog_rolling.additivity = false # # appender.index_indexing_slowlog_rolling.type = Console # appender.index_indexing_slowlog_rolling.name = index_indexing_slowlog_rolling # appender.index_indexing_slowlog_rolling.layout.type = PatternLayout # appender.index_indexing_slowlog_rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c] [%node_name]%marker %.-10000m%n # # logger.index_indexing_slowlog.name = index.indexing.slowlog.index # logger.index_indexing_slowlog.level = trace # logger.index_indexing_slowlog.appenderRef.index_indexing_slowlog_rolling.ref = index_indexing_slowlog_rolling # logger.index_indexing_slowlog.additivity = false # ################################################################################ # TEXT LOGGING DEFINITIONS END HERE ################################################################################ ```

WDYT?

albertzaharovits commented 5 years ago

I chatted with @pgomulka earlier and suggested whether it makes sense to ship the log4j2.properties files that #36833 bring, clearly marked with comments and then an additional section with the normal old-style logging, fully commented out, ready to be uncommented out.

Un-commenting the section requires a node restart. In the case of the audit log, we went with both formats enabled out-of-the-box, with two independent loggers, and we strongly advised disabling the logger for the old format by using the cluster settings API.

Just wanted to point out that the audit log is not following this convention currently: plaintext logs are being sent to {clustername}_access.log and JSON logs are being sent to {clustername}_audit.log. Starting 7.0 they plan to just emit the JSON logs IIRC.

This is correct.

Old logs still using previous pattern and .log filetype suffix. New logs on the other hand would use a new pattern and .json file type suffix. That might be concerning that there will be duplicated amount of logs (and files), but on the other side we don't log a massive amounts anyway.

We named the new audit log {clustername}_audit.log instead of {clustername}_audit.json because the file is not really a properly formatted json document (It is not an array of comma separated json docs, it is a log of JSON docs separated by endlines).

pgomulka commented 5 years ago

The json format will be a default one in the long run. However, we will have a transition period where we will keep log files in both JSON and previous formats. This will allow us to create additional tooling and gather feedback from users in regards to readability. Since keeping two sets of files will have an impact on storage we will reduce the log accumulation size from 2GB to 1GB As a followup we will schedule a removal of the previous log format.

jasontedor commented 5 years ago

Since keeping two sets of files will have an impact on storage we will reduce the log accumulation size from 2GB to 1GB

We can consider this; I am not convinced that it's a change that we need to make.

dliappis commented 5 years ago

I think it's worth mentioning on top of @pgomulka 's notes that, going forward, logs on the console for the docker image will be in json format by default.

ycombinator commented 5 years ago

If we go down the path of keeping both flavors of logs - plaintext and json - I'd like to know what globs/extensions each flavor would use. That way Filebeat can look in the right places and know how to parse what it sees.

pgomulka commented 5 years ago

@ycombinator we will produce following log files with new JSON format

${cluster_name}_server.json
${cluster_name}_deprecation.json
${cluster_name}_index_search_slowlog.json
${cluster_name}_index_indexing_slowlog.json

and following log files with an old pattern (not containing newly added IDs like cluster.uuid)

${cluster_name}.log
${cluster_name}_deprecation.log
${cluster_name}_index_search_slowlog.log
${cluster_name}_index_indexing_slowlog.log

gc.log
${cluster_name}_audit.log - I feel *.json would be ok anyway

ycombinator commented 5 years ago

This is perfect, thanks @pgomulka!

ycombinator commented 5 years ago

@pgomulka Looking closer at the log file names, I have a couple of questions:

For the server log, it appears that today we write it to the log file ${cluster_name}.log. Just want to confirm that once this issue is resolved, we will be writing it to ${cluster_name}_server.log instead? If that's the case, then I will make sure to update the current Filebeat elasticsearch module to look for either one.
What about gc and audit log files? Could you add their names to both lists in your comments so we have the complete list in one place?

Thanks!

jasontedor commented 5 years ago

I do not think we should change the name of the existing logs; that is a breaking change with little benefit to us.

pgomulka commented 5 years ago

I have renamed the ${cluster_name}.log to ${cluster_name}_server.log for consistency, but agree with @jasontedor that it won't benefit us and it would be a unnecessary breaking change. We will stick with ${cluster_name}.log for old-format logs. updated comment

I am however wondering if audit logs should not be in ${cluster_name}_audit.json instead of ${cluster_name}_audit.log. I understand @albertzaharovits concerns about this file not being a json document. I think however that this is widely used and ok to have multiple json documents in a file named *.json. Notice that we use that in our Elastic documentation as well sample tutorial from ES

GC logs stay untouched.

ruflin commented 5 years ago

There is not direct benefit for Elasticsearch but I think we should change this for Filebeat to make collection easier. Right now on the Filebeat side, we define what a server log is by excluding all the other log types:

exclude_files: [".gz$","_slowlog.log$","_access.log$","_deprecation.log$"]

If a new log type is added, this config breaks which I don't think it should. Having *_server.log as pattern would be then enough and not exclude patterns are needed.

ruflin commented 5 years ago

I just had a sync up with @pgomulka . Here the conclusions:

The *.log names will stay the same in 6.x and 7.x
The *.json logs will get the _server postfix
In 7.x by default the json and old logs will be around. So far my assumption was it is either or. This doubles the actual log size created by Elasticsearch.

@ycombinator The last point has some effects on the Filebeat module. For Filebeat 7 I think the default should be to only have the JSON logs in the path patterns. If someone wants to ingest the old logs from 7, some manual config changes for the paths are needed by the user. We need docs for this.

@pgomulka Is there a way that the Elasticsearch user can disable the "double" logging and only have JSON logs?

pgomulka commented 5 years ago

@ruflin Users are free to opt out from either logs by deleting unwanted appenders from log4j2.properties. I also mentioned about that in migration docs.

We are aware of the additional log size, but during core-infra meeting we agreed that this is not a major concern. We can reduce the log size if needed by changing the rolling parameter accumulation size from 2GB to 1GB. comment

elastic / elasticsearch

Logging changes to support Beats Elasticsearch module (including logging to JSON) #32850