data2health / website

Issues related to the CD2H website (https://ctsa.ncats.nih.gov/cd2h/home)

select and install Apache log analyzer #90

Open eichmann opened 4 years ago

eichmann commented 4 years ago

We have multiple shortlinks - Apache aliases that redirect to various servers for onboarding, etc. Julie's looking to be able to report hits on the shortlinks.
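Until a log analyzer is picked, a rough sketch of the kind of shortlink-hit report Julie is after, pulled straight from an Apache combined-format access log (the log location and the shortlink paths `/onboarding` and `/help` here are made up for illustration, not our actual aliases):

```shell
# Build a small sample access log (stand-in for the real Apache log).
cat > /tmp/access_sample.log <<'EOF'
10.0.0.1 - - [25/Feb/2020:15:10:34 -0600] "GET /onboarding HTTP/1.1" 302 231
10.0.0.2 - - [25/Feb/2020:15:11:02 -0600] "GET /help HTTP/1.1" 302 219
10.0.0.3 - - [25/Feb/2020:15:12:45 -0600] "GET /onboarding HTTP/1.1" 302 231
EOF

# Field 7 of the combined format is the request path; tally hits per path.
awk '{print $7}' /tmp/access_sample.log | sort | uniq -c | sort -rn
```

This prints one line per shortlink with its hit count, most-hit first. A real tool would add time windows, dashboards, and scheduled reports, which is what the options below are about.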

amikaili commented 4 years ago

Would it be far-fetched to consider Splunk as an over-arching, enterprise-wide solution? Harvesting logs is a primary function of Splunk, and it has other capabilities as well, e.g. metrics analytics and platform health monitoring. It would certainly boost site maintenance capabilities and robustness. https://www.splunk.com/en_us/download/splunk-enterprise.html

eichmann commented 4 years ago

I've no particular inclinations in this regard. We do already have an instance of Jenkins running - primarily for container health monitoring.

amikaili commented 4 years ago
  1. Installed a 60-day trial of Splunk Enterprise 8
  2. Used my MAMP log files (MAMP is the localhost server for my Drupal sandbox) as the datasource for Splunk
  3. Started analyzing log files (see screen capture)
  4. TODO: add a remote datasource to test the Splunk Universal Forwarder
(Screen capture: Splunk analyzing the MAMP log files, 2020-02-25)
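For anyone reproducing step 2 above, pointing a local Splunk instance at MAMP's Apache log is a one-liner with the `splunk add monitor` CLI command (the MAMP log path below is from my machine and may differ on yours):

```
/Applications/Splunk/bin/splunk add monitor /Applications/MAMP/logs/apache_access.log
```

Splunk then indexes that file continuously, and it shows up as a datasource in the Search UI.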
amikaili commented 4 years ago

https://docs.splunk.com/Documentation/Splunk/7.2.4/Overview/AboutSplunkEnterprise

The index: transforms machine-generated data (e.g. log files) into a searchable index of events, which can then be analyzed by customizable fields.

The search: a powerful, concise query language. It can use regular expressions to query for things like error conditions, or for normal operational events such as users logging into the server. https://docs.splunk.com/Splexicon:SPL https://docs.splunk.com/Documentation/Splunk/8.0.2/Search/Aboutthesearchlanguage
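For a concrete feel, the shortlink-hit report could look roughly like this in SPL once the Apache logs are indexed (the `access_combined` sourcetype is Splunk's standard one for Apache combined logs; the two paths are hypothetical examples, not our actual aliases):

```
sourcetype=access_combined uri_path="/onboarding" OR uri_path="/help"
| stats count by uri_path
```

Saved as a report, a search like this could be scheduled and emailed to Julie on whatever cadence she needs.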

amikaili commented 4 years ago

Splunk for Analytics: "Splunk can read this unstructured, semi-structured or rarely structured data. After reading the data, it allows to search, tag, create reports and dashboards on these data. With the advent of big data, Splunk is now able to ingest big data from various sources, which may or may not be machine data and run analytics on big data. So, from a simple tool for log analysis, Splunk has come a long way to become a general analytical tool for unstructured machine data and various forms of big data." from: https://www.tutorialspoint.com/splunk/splunk_overview.htm

amikaili commented 4 years ago

Splunk Distributed Deployment -- Topology: "There are three main types of processing components: forwarders, indexers, and search heads. Forwarders ingest data. There are a few types of forwarders, but the universal forwarder is the right choice for most purposes... it simply inputs data, performs minimal processing on the data, and then forwards the data to an indexer. Because its resource needs are minimal, you can co-locate it on the machines that produce the data, such as web servers."

Therefore, if we choose Splunk, we would need to install a forwarder instance on the production web server (Guardian). That instance would forward the raw log data to a singleton search/index instance of Splunk.

"Indexers and search heads are built from Splunk Enterprise instances that you configure to perform the specialized function of indexing or search management, respectively. Each indexer and search head is a separate instance that usually resides on its own machine" -- e.g. on my machine, your machine, or any NIH stakeholder's machine (e.g. Julie, who originally requested this capability). This topology can handle remote indexing/searching on nodes located apart from the data forwarders, connected via SSL.

from: https://docs.splunk.com/Documentation/Splunk/7.3.0/Deploy/Distributedoverview

amikaili commented 4 years ago

So as a first iteration (a simple local deployment): after installing Splunk 8 on Guardian using the first link I commented above, follow the ten-minute tutorial below. You will end up able to Splunk the production log files through the UI that this Splunk search instance exposes; just point it at the real live datasource (i.e. the Tomcat logfile you want to analyze) instead of the tutorial datasource they have you download:

https://docs.splunk.com/Documentation/Splunk/latest/SearchTutorial/GetthetutorialdataintoSplunk

Version-wise, I don't think we would need the Splunk 8 Enterprise version for Guardian (a good thing, because it is not available for Macs)! The standard free version 8 should give us all we need (GUI, indexer, network/system monitor, admin console with analytics and visualization, search window); plus it's FREE!

Please try this, @eichmann or @alexisgraves, and let me know if it succeeds (sorry, I don't have a login to Guardian)?

After this is done, I am thinking the reporting capabilities of this single Splunk instance may suffice for the stakeholders' requirement (Julie, the original requester, and others) of requesting and receiving reports (e.g. of hits, logins, alerts...). It would still provide a helpful, thorough, yet effortless tool for analyzing the large volume of logfiles that have accumulated on Guardian for error conditions, performance issues, and metrics. Splunk dashboards would also provide invaluable one-stop checking of the health and performance of the website, and enable diagnostics.

amikaili commented 4 years ago

Phase II - configuring a mutually remote forwarder/receiver (after standalone Splunk is installed on Guardian and working). Start here to download the forwarder onto Guardian - How to forward data to Splunk Enterprise:

https://docs.splunk.com/Documentation/Forwarder/8.0.2/Forwarder/HowtoforwarddatatoSplunkEnterprise

Step One - Configure Splunk to receive: https://docs.splunk.com/Documentation/Forwarder/8.0.2/Forwarder/Enableareceiver Using the CLI, the command would be: `% /Applications/Splunk/bin/splunk enable listen <port> -auth <username>:<password>`

Is port 9997 available (`% lsof -i -P -n | grep 9997`)? My laptop will receive data on its port 9997.


The splunk enable listen command creates a [splunktcp] stanza in inputs.conf. For example, if you set the port to 9997, it creates the stanza [splunktcp://9997].

Alternatively, set up receiving with configuration files: you enable receiving on your Splunk instance by configuring inputs.conf in $SPLUNK_HOME/etc/system/local.

To enable receiving, add a [splunktcp] stanza that specifies the receiving port. In this example, the receiving port is 9997:

```
[splunktcp://9997]
disabled = 0
```

Restart the Splunk software for the changes to take effect.

Step Two - The Forwarder: Download and install the universal forwarder. Install a *nix universal forwarder for installation on *nix operating systems. https://docs.splunk.com/Documentation/Forwarder/8.0.2/Forwarder/Installanixuniversalforwarder

The universal forwarder installation packages are available for download from splunk.com.

Install the universal forwarder on Mac OS X: the universal forwarder is available for Mac OS X as a tar file or a DMG package.

Install the universal forwarder from the Finder:

1. Double-click on the DMG file. A Finder window that contains splunkforwarder.pkg opens.
2. In the Finder window, double-click on splunkforwarder.pkg. The installer opens and displays the Introduction, which lists version and copyright information.
3. Choose a location to install the universal forwarder. To install in the default directory, click on the hard drive icon. To select a different location, click Choose Folder...

Install the universal forwarder from a Terminal window: to install the universal forwarder on Mac OS X from the command line, you must use the root user or elevate privileges using the sudo command. If you use sudo, your account must be an Admin-level account.

1. Open a Terminal window.
2. Mount the DMG: `sudo hdid splunk_package_name.dmg` The Finder mounts the disk image onto the desktop. The image is available under /Volumes/SplunkForwarder (note the space).
3. Run the installer:

```
cd /Volumes/SplunkForwarder\ 
sudo installer -pkg .payload/splunk.pkg -target <volume>
```

Note: there is a space in the disk image name. Use a backslash to escape the space, or wrap the disk image name in quotes. `-target` specifies a target volume, such as another disk, where the forwarder will be installed in /Applications/splunk.

Start the universal forwarder Before the universal forwarder can accept configurations and forward data, it must be started. See Start the universal forwarder to learn how to start the universal forwarder, whether it is the first time or after you have made configuration changes.

https://docs.splunk.com/Documentation/Forwarder/8.0.2/Forwarder/Starttheuniversalforwarder

Configure the universal forwarder to send data to the Splunk Enterprise indexer Before the universal forwarder can send data to Splunk Enterprise, you must configure it with the Splunk Command Line Interface (CLI).

This procedure details a basic configuration. For additional configuration options, see Configure the universal forwarder.

```
./splunk add forward-server <host>:<port>
```

Create an inputs.conf file in this directory. Edit the file by adding stanzas to inputs.conf.

To monitor Apache log files:

```
[monitor:///apache/*.log]
disabled = 0
```
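Putting the two halves together, the forwarder side on Guardian would need roughly these two files (the indexer host is a placeholder for whatever receiver we stand up, and the `/apache/*.log` path and `sourcetype` are assumptions to be adjusted to Guardian's actual log layout):

```
# $SPLUNK_HOME/etc/system/local/outputs.conf -- where to send the data
[tcpout]
defaultGroup = default-autolb-group

[tcpout:default-autolb-group]
server = <indexer-host>:9997

# $SPLUNK_HOME/etc/system/local/inputs.conf -- which files to read
[monitor:///apache/*.log]
disabled = 0
sourcetype = access_combined
```

With these in place and the receiver listening on 9997, restarting the forwarder should start shipping Guardian's Apache logs to the indexer.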