A Log4j appender implementation that will collect log events into a staging buffer up to a configured size to then publish to external stores such as:
* AWS S3
* Azure Blob Storage
* Google Cloud Storage
* Apache Solr
* Elasticsearch
All external stores above are optional (although, to be of any use, at least one should be used). If no configuration is found for S3, for instance, the appender will not attempt to publish to S3. Likewise, if there is no configuration for Apache Solr, the appender will not attempt to publish to Solr.
The packages in MVN Repo should work as long as you're on the correct Java version (see below).
Release / tag | JDK version |
---|---|
2.x and earlier | Java SDK (JDK) 8 |
5.2.1 and earlier | Java SDK (JDK) 11 |
5.2.2+ | OpenJDK 21 |
The project is broken up into several packages:
* appender-core -- common code shared by the appenders
* appender-log4j -- the Log4j 1.x appender
* appender-log4j2 -- the Log4j 2.x appender
Please substitute in the latest version in your case (so I don't have to keep updating this README.md).
<dependency>
<groupId>com.therealvan</groupId>
<artifactId>appender-log4j2</artifactId>
<version>4.0.0</version>
</dependency>
Please ignore the non-semver versions 2.0 and 0.3.0.
Please consult the log4j-s3-search-samples project for sample programs using this library for both Log4j and Log4j2.
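Once the appender is configured (see the configuration sections below), application code logs through the normal Log4j2 API; a minimal sketch (the class name and messages are made up for illustration):

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class ExampleApplication2 {
    private static final Logger logger = LogManager.getLogger(ExampleApplication2.class);

    public static void main(String[] args) {
        // Each event goes into the appender's staging buffer; the buffered batch is
        // published when stagingBufferSize (or stagingBufferAge) is reached.
        logger.info("Application started.");
        logger.warn("Something worth noting happened.");
    }
}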
In addition to the typical appender configuration (such as layout, Threshold, etc.), these common properties control the appender in general:

Property | Description |
---|---|
verbose | If "true," the appender emits additional diagnostic messages about its own operation. Defaults to "false." |
tags | Comma/semicolon-delimited list of tags to attach to log entries (applicable only to Solr and Elasticsearch), e.g. production,webserver or qa,database |
stagingBufferSize | Number of messages (lines of log) to buffer before publishing out |
stagingBufferAge | If specified, the number of minutes to wait before publishing the staged logs (use instead of stagingBufferSize) |
DO NOT specify both stagingBufferSize and stagingBufferAge; choose the one that works best for you. Because there is some overhead in preparing and uploading the logs, specifying too small a value for these parameters may not leave the logger enough time to do its work and can eventually cause your process to fail.
How small is too small? It really depends on how often your program logs. In general, I would suggest a minimum of 500 for stagingBufferSize and 60 seconds for stagingBufferAge.
A sample snippet from log4j2.xml to publish whenever 10 events are collected:
<Configuration status="INFO">
<Appenders>
<Log4j2Appender name="Log4j2Appender">
<PatternLayout pattern="%d{HH:mm:ss,SSS} [%t] %-5p %c{36} - %m%n"/>
<verbose>false</verbose>
<!-- Examples of optional tags to attach to entries (applicable only to SOLR & Elasticsearch)-->
<tags>TEST,ONE,TWO;THREE</tags>
<!-- Number of messages (lines of log) to buffer before publishing out -->
<stagingBufferSize>10</stagingBufferSize>
<s3Bucket>mybucket</s3Bucket>
<s3Path>logs/exampleApplication2/</s3Path>
<s3Region>us-west-2</s3Region>
...
or, if a time-based publishing policy is desired (e.g. publish every 15 minutes):
<Configuration status="INFO">
<Appenders>
<Log4j2Appender name="Log4j2Appender">
...
<!-- Number of minutes to wait before publishing the staged logs -->
<stagingBufferAge>15</stagingBufferAge>
...
These properties (please use your own values) control how the logs will be stored in S3:

Property | Description |
---|---|
s3Bucket | Name of the S3 bucket to publish the logs to |
s3Path | Key prefix (path) under the bucket for the uploaded log batches |
s3Compression | If "true," the log content is compressed (gzipped) before uploading |
These properties determine how to connect to S3:
Use either:
* s3Region, or
* s3ServiceEndpoint and s3SigningRegion (for S3-compatible or custom endpoints),

but not all three simultaneously. You will get an error from AWS if you use all three.
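A minimal sketch of the two alternatives (the endpoint value is illustrative; the property names are those listed above):

<Log4j2Appender name="Log4j2Appender">
...
<s3Bucket>mybucket</s3Bucket>
<s3Path>logs/exampleApplication2/</s3Path>
<!-- EITHER specify the region: -->
<s3Region>us-west-2</s3Region>
<!-- OR a service endpoint plus signing region (e.g. for an S3-compatible store), but not both: -->
<!--
<s3ServiceEndpoint>https://s3.example.com</s3ServiceEndpoint>
<s3SigningRegion>us-west-2</s3SigningRegion>
-->
...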
AWS credentials are required to interact with S3. NOTE that the recommended way of configuring
the credentials is:
1) using roles assigned to instance profiles (when working with EC2 instances) or
2) creating a credentials file on the computer running the program as %USERPROFILE%\.aws\credentials (Windows) or ~/.aws/credentials (Linux/macOS); see
https://docs.aws.amazon.com/sdk-for-java/v2/developer-guide/credentials.html#credentials-file-format
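For reference, a minimal credentials file (the values below are placeholders) looks like:

[default]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX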
If the above methods are not possible for your situation, these properties can also be overridden in the optional Log4j configuration:
When these properties are present in the configuration, they take precedence over the default sources in the credential chain as described earlier.
A sample snippet (with the optional s3AwsKey and s3AwsSecret properties set):
<Configuration status="INFO">
<Appenders>
<Log4j2Appender name="Log4j2Appender">
...
<s3Bucket>mybucket</s3Bucket>
<s3Path>logs/exampleApplication2/</s3Path>
<s3Region>us-west-2</s3Region>
<s3AwsKey>CMSADEFHASFHEUCBEOERUE</s3AwsKey>
<s3AwsSecret>ASCNEJAERKE/SDJFHESNCFSKERTFSDFJESF</s3AwsSecret>
....
The final S3 key used in the bucket follows the format:
{s3Path}/yyyyMMddHH24mmss_{hostname}_{UUID w/ "-" stripped}
e.g.
logs/myApplication/20150327081000_localhost_6187f4043f2449ccb4cbd3a7930d1130
Content configurations

The uploaded key will have a .gz suffix when s3Compression is enabled. (If s3Compression is not "true," this is ignored.)

These properties (please use your own values) control how the logs will be stored in Azure Blob Storage:
Property | Description |
---|---|
azureBlobContainer | Name of the Azure Blob Storage container to publish the logs to |
azureBlobNamePrefix | Prefix (path) for the names of the uploaded blobs |
azureBlobCompressionEnabled | If "true," the log content is compressed (gzipped) before uploading |
azureStorageConnectionString | Connection string for the storage account (optional; see the notes below) |

The uploaded blob name will have a .gz suffix when azureBlobCompressionEnabled is enabled. (If azureBlobCompressionEnabled is not "true," this is ignored.)

A sample snippet:

<Configuration status="INFO">
<Appenders>
<Log4j2Appender name="Log4j2Appender">
...
<azureBlobContainer>my-container</azureBlobContainer>
<azureBlobNamePrefix>logs/myApplication/</azureBlobNamePrefix>
<!-- optional -->
<azureBlobCompressionEnabled>false</azureBlobCompressionEnabled>
<azureStorageConnectionString>DefaultEndpointsProtocol=https;AccountName=...;EndpointSuffix=core.windows.net</azureStorageConnectionString>
Just as in the case of S3, the final blob name used in the container follows the format:
{azureBlobNamePrefix}/yyyyMMddHH24mmss_{hostname}_{UUID w/ "-" stripped}
e.g.
logs/myApplication/20150327081000_localhost_6187f4043f2449ccb4cbd3a7930d1130
Notes:

The recommended way to supply the connection string is to set the environment variable AZURE_STORAGE_CONNECTION_STRING on the hosts running your code. However, you can also set the azureStorageConnectionString property for local testing. See Azure Storage connection strings for more info on connection strings.
These properties (please use your own values) control how the logs will be stored in GCP Storage service:
Property | Description |
---|---|
gcpStorageBucket | Name of the GCP Storage bucket to publish the logs to |
gcpStorageBlobNamePrefix | Prefix (path) for the names of the uploaded blobs |
gcpStorageCompressionEnabled | If "true," the log content is compressed (gzipped) before uploading |

The uploaded blob name will have a .gz suffix when gcpStorageCompressionEnabled is enabled. (If gcpStorageCompressionEnabled is not "true," this is ignored.)

Just as in the case with AWS S3, there is an extensive authentication process and list of options. This tool assumes the running process has already done the necessary authentication setup.
While working on this, for example, I downloaded my service account's JSON key file and set the environment
variable GOOGLE_APPLICATION_CREDENTIALS
to the full path to the file. This allowed my programs using the
Storage API
to work without doing any specific authentication calls.
A sample snippet from log4j2.xml:
<Configuration status="INFO">
<Appenders>
<Log4j2Appender name="Log4j2Appender">
...
<gcpStorageBucket>my-bucket</gcpStorageBucket>
<gcpStorageBlobNamePrefix>logs/myApplication/</gcpStorageBlobNamePrefix>
<!-- optional -->
<gcpStorageCompressionEnabled>false</gcpStorageCompressionEnabled>
Just as in the other cases, the final blob name used in the bucket follows the format:
{gcpStorageBlobNamePrefix}/yyyyMMddHH24mmss_{hostname}_{UUID w/ "-" stripped}
e.g.
logs/myApplication/20150327081000_localhost_6187f4043f2449ccb4cbd3a7930d1130
Normally, static values are used for path/prefix for the cloud storage destination. An example is a file-path-like string:
logs/messages/myapp/
This will cause published logs to look like:
logs/messages/myapp/....
However, there is limited support for template expansion (currently only the datetime). So it is possible to specify a path like:
logs/messages/%d{yyyy_MM_dd_HH_mm_ss}/myapp
The above will tell the cloud storage publishers to dynamically adjust the path/prefix for the destination of the published blobs, using the same datetime pattern syntax used for PatternLayout.
An uploaded blob with the configuration above may look like:
logs/messages/2020_08_23_22_04_34/myapp/....
Note that, in the above example, the time at which the publish was done (e.g. 2020-08-23 10:04:34 PM) was dynamically injected into the path according to the pattern specified. As more logs are published, each publish will have a different path/prefix because each of these publishes will be done at different times.
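For example, in the S3 case, a configuration along these lines (a sketch reusing the pattern above; the bucket name is illustrative) would produce such paths:

<Log4j2Appender name="Log4j2Appender">
...
<s3Bucket>mybucket</s3Bucket>
<s3Path>logs/messages/%d{yyyy_MM_dd_HH_mm_ss}/myapp</s3Path>
...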
There is only one property for Solr: the REST endpoint to the core/collection:
A sample snippet:
<Configuration status="INFO">
<Appenders>
<Log4j2Appender name="Log4j2Appender">
...
<solrUrl>http://localhost:8983/solr/log-events/</solrUrl>
There are four properties for Elasticsearch; all but one are optional:

* elasticsearchHosts -- comma-delimited list of host:port values. There is no default; this property is required. The scheme/protocol is http:// by default, but you can override this by explicitly including it in the value (e.g. https://localhost:9200).
* elasticsearchIndex -- name of the index to which log events are published (optional).
* elasticsearchType -- type under which the documents are indexed (optional).
* elasticsearchCluster -- name of the Elasticsearch cluster (optional).

In addition, the optional elasticsearchPublishHelperClass property can name a class implementing IElasticsearchPublishHelper that will perform publishing to Elasticsearch.

A sample snippet:

<Configuration status="INFO">
<Appenders>
<Log4j2Appender name="Log4j2Appender">
...
<elasticsearchCluster>elasticsearch</elasticsearchCluster>
<elasticsearchIndex>logindex</elasticsearchIndex>
<elasticsearchType>log</elasticsearchType>
<elasticsearchHosts>localhost:9200</elasticsearchHosts>
A new core should be created for the log events. Setting up Apache Solr and creating a core are outside the scope of this file. However, a sample template for a schema.xml that can be used is included in this repo as /misc/solr/schema.xml.
Each log event will be indexed as a Solr document. The "id" property for each document will follow the format:
yyyyMMddHH24mmss_{host name}_{UUID w/ "-" stripped}-{host name}-{sequence}
e.g.
20150327081000_mycomputer_6187f4043f2449ccb4cbd3a7930d1130-mycomputer-0000000000000012
NOTE that this ID is formatted such that one can cross-reference a document to the S3 batch from which the corresponding log event can be found.
String id = solrDoc.getFieldValue("id").toString();
String s3Key = id.substring(0, id.indexOf("-"));
A new index should be created for the log events. Setting up Elasticsearch and the index are outside the scope of this file. However, a sample template for the index schema that can be used is included in this repo as /misc/elasticsearch/logindex.json.
This schema should be installed before any log entries are added. A typical PUT to <elasticsearch host>:9200/<index> with the JSON as the request body should be sufficient.
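For example, assuming Elasticsearch runs locally and the index name from the sample snippet above is used, a sketch with curl (adjust the host, index name, and file path to your setup):

curl -X PUT "http://localhost:9200/logindex" -H "Content-Type: application/json" -d @misc/elasticsearch/logindex.json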
Each log event will be indexed as a document. The "id" property for each document will follow the format:
yyyyMMddHH24mmss_{host name}_{UUID w/ "-" stripped}-{host name}-{sequence}
e.g.
20150327081000_mycomputer_6187f4043f2449ccb4cbd3a7930d1130-mycomputer-0000000000000012
NOTE that this ID is formatted such that one can cross-reference a document to the S3 batch from which the corresponding log event can be found.
// Given the "id" value of an Elasticsearch document (e.g. from a search hit):
String s3Key = id.substring(0, id.indexOf("-"));
The appender and components of this library also log events under the logger named "com.van.logging.VansLogger." To prevent these logs from polluting clients' logs, they are ignored by the code (LoggingEventCache) when forwarding to the various log publishers.
To see these logs (e.g. to debug), you can add a logger config to dump to console (or any other appender):
<Appenders>
<Console name="ConsoleAppender" target="SYSTEM_OUT">
<PatternLayout pattern="%d{HH:mm:ss,SSS} [%t] %-5p %c{36} - %m%n"/>
</Console>
...
</Appenders>
<Loggers>
<Logger name="com.van.logging" level="debug" additivity="false">
<AppenderRef ref="ConsoleAppender" />
</Logger>
...
</Loggers>
....
An example of this can be seen in the example repo log4j-s3-search-samples.