Installing app on indexers, as instructed will result in multiple acceleration searches/excessive storage use

automine commented 6 years ago

According to this:

https://splunk.paloaltonetworks.com/installation.html

It's recommended to install both the Palo Alto Networks App and Add-on on all Search Heads, Indexers, and Heavy Forwarders.

Because the data model accelerations are on by default, if you install the App on all of your indexers, you will get accelerations (and their resulting searches and space usage) for not just the Search Head where you will use the app, but also for each individual indexer. So, if I have 1 search head and 5 indexers, I end up with 6 sets of accelerations, 5 of which will never be used (the ones created by the indexers).

The app needs to either: a) Have accelerations disabled by default or b) Not require installation on the indexers.
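For anyone who wants to try option (a) locally in the meantime, here is a minimal sketch that renders a datamodels.conf override with acceleration turned off. The stanza name `example_datamodel` is hypothetical; it is an assumption that the real stanza names would be copied from the app's own datamodels.conf and that the output would be placed in the app's local directory.

```python
import configparser
import io
from typing import Iterable


def render_acceleration_override(datamodel_names: Iterable[str]) -> str:
    """Render conf text with one stanza per data model, acceleration off.

    The names passed in are assumed to match the stanza names that the
    app ships in its own datamodels.conf.
    """
    parser = configparser.ConfigParser()
    for name in datamodel_names:
        parser[name] = {"acceleration": "false"}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # "example_datamodel" is a placeholder, not a real stanza name.
    print(render_acceleration_override(["example_datamodel"]))
```

This only generates the text of the override; where it is deployed and whether a restart is needed is left to the administrator.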

btorresgil commented 6 years ago

Hi @automine. Splunk has trained us differently on how the summary index works. This app has been built under their close guidance and they have been very clear with us about how accelerated data is handled. The App defines the datamodel on the searchhead, however, no summary index is placed on the search head. The datamodel definition is passed to the indexers. The summary indexes (aka acceleration sets) exist only on the indexers. Each indexer contains the summary index only for the data on that indexer. Therefore, it is impossible to have duplicate summarized (accelerated) data.

If you have evidence that this is not how it works and duplicate summarized data exists, then I would like to investigate this as a collaboration with you and our counterparts at Splunk. Please let me know, and feel free to email us if you prefer to work over email. splunkapp@paloaltonetworks.com

Thanks! -Brian

automine commented 6 years ago

@btorresgil, I believe you are interpreting this incorrectly. You are right that no summary is created on the search head; the search head passes the summary searches to the indexers so that the indexers can create the summaries. However, these are not summary indexes. They use the high-performance analytics store, which is stored alongside the index that holds the original events (or elsewhere, if you have configured it that way). The number of accelerations created is based on the GUID of the search head that has acceleration configured (for a Search Head Cluster, it is the GUID of the cluster). This is why, when you put a datamodels.conf on the search head to accelerate the DMs, it creates accelerations on the indexers, using the GUID of the search head in a path that looks like:

$SPLUNK_DB/<index_name>/datamodel_summary/<bucket_id>/<search_head_or_pool_id>/DM_<datamodel_app>_<datamodel_name>
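To make the layout concrete, here is a rough sketch that splits such a path into its components. The function and field names are my own for illustration, and it assumes POSIX-style paths that follow exactly the layout shown above.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class SummaryPath(NamedTuple):
    index_name: str
    bucket_id: str
    owner_guid: str    # GUID of the search head (or pool/cluster) that owns the summary
    summary_name: str  # DM_<datamodel_app>_<datamodel_name>


def parse_summary_path(path: str) -> Optional[SummaryPath]:
    """Return the components of a datamodel_summary path, or None if it does not match."""
    parts = PurePosixPath(path).parts
    if "datamodel_summary" not in parts:
        return None
    i = parts.index("datamodel_summary")
    # One component is needed before (the index) and three after
    # (bucket, owner GUID, DM_ directory).
    if i < 1 or len(parts) < i + 4:
        return None
    summary_name = parts[i + 3]
    if not summary_name.startswith("DM_"):
        return None
    return SummaryPath(parts[i - 1], parts[i + 1], parts[i + 2], summary_name)
```

The `owner_guid` field is the part that matters for this issue: it identifies which instance asked for the acceleration.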

The problem is that when you also put a datamodels.conf on the indexers, each indexer starts to think of itself as a search head (after all, it's just one download, and an individual instance of Splunk can act in a number of roles) and creates a new set of accelerations for itself, using its own GUID. The result is duplicate acceleration summary searches, as well as additional summary storage in the index store.

I will see what I can do about recreating this, as we have already corrected it at the customer.

Looking at the configurations, can you tell me a compelling reason that this app (not the add-on, the app) should be installed on the indexers? The index-time operations in the props.conf appear to be around the pan:wildfire_report and pan:newapps sourcetypes, which I believe are being collected by a heavy forwarder or search head, in which case the data has already been cooked.

automine commented 6 years ago

I set up a small test environment using docker with one search head and one indexer. The search head has been peered to the indexer, and is configured to forward events (no local indexing). I then installed the following on the search head:

SA-Eventgen
eventgen configuration from your docker image (found in issue #28)
Palo Alto Networks Add-on (6.0.2)
Palo Alto Networks (6.0.1)

On the indexer I installed the following:

Palo Alto Networks Add-on (6.0.2)
Palo Alto Networks (6.0.1)

After waiting a little, I checked the results.

GUIDs from the two instances:

Search Head: [screenshot: searchhead_guid]

Indexer: [screenshot: indexer_guid]

On the indexer I then checked the datamodel_summary path for the default index: [screenshot: indexer_datamodels]

One of these buckets has some data in it: [screenshot: indexer_bucket]

Inside of this directory, there are two directories: [screenshot: twoaccelerations]

Notice that the first directory (C2730473-769E-4E86-AE12-E28BCBA6D108) has the same GUID as the indexer (outlined in red in the picture). The second directory has the GUID of the search head (A0BE2F65-8739-4B7A-BBC6-F3B52FE2A9A9), highlighted in yellow. Also note that both are the same size.

This means that if a user follows the instructions provided, and installs both the app and add-on on their indexers, they will be using double the space that they should on each indexer.
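If you want to run the same check on your own indexers, something like the following sketch could total the summary size per owning GUID. It assumes the `<bucket_id>/<guid>/DM_...` layout described earlier, and that `summary_root` points at an index's datamodel_summary directory; the function names are mine.

```python
import os
from collections import defaultdict
from pathlib import Path
from typing import Dict


def dir_size(path: Path) -> int:
    """Sum the sizes in bytes of all files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += (Path(root) / name).stat().st_size
    return total


def summary_bytes_by_guid(summary_root: Path) -> Dict[str, int]:
    """Map each owner GUID directory name to the total bytes of its summaries.

    Expects summary_root/<bucket_id>/<guid>/DM_* as the directory layout.
    """
    totals: Dict[str, int] = defaultdict(int)
    for bucket in summary_root.iterdir():
        if not bucket.is_dir():
            continue
        for owner in bucket.iterdir():
            if owner.is_dir():
                totals[owner.name] += dir_size(owner)
    return dict(totals)
```

More than one GUID with a non-zero total under the same index suggests that more than one instance is requesting accelerations for it.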

xoff00 commented 6 years ago

I agree with @automine's entire comment thread, especially this:

The app needs to either: a) Have accelerations disabled by default or b) Not require installation on the indexers.

For the record, we do both (disable the accelerations and don't install the app on indexers) -- but of course we also don't let the indexers do the log splitting REGEX.

btorresgil commented 6 years ago

Hey guys. @automine, first I want to thank you for the thorough explanation, for taking the time to build a reproducible example, and for your continued contributions toward making the Palo Alto Networks App and Add-on better for everyone.

I investigated this with Splunk and completely agree with your assessment. I’ll update the documentation to provide guidance to not install the App on the indexers. Will update you here when the revised doc is published.

Thank you again.

automine commented 6 years ago

Just a note that you may want to provide some remediation details as well.

btorresgil commented 6 years ago

I updated the documentation here: https://splunk.paloaltonetworks.com/installation.html#where-to-install

Interested in your feedback.

Working on getting the exact paths to the files that can be deleted (i.e., how to get the GUID for the datamodel acceleration that can be deleted). Let me know if you have these details, otherwise I'll research and add them.

automine commented 6 years ago

As long as the data models are removed from the indexers, Splunk should take care of reaping the acceleration data on its own. In my testing, after removing the app and restarting, it took Splunk about 40 minutes before it removed the summary data. The directories will remain, but should get emptied.
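To confirm the cleanup has happened, a sketch along these lines could list whatever files are still left under a given GUID and optionally poll until they are gone. The helper names, the default timeout, and the poll interval are all my own assumptions; `summary_root` is assumed to be an index's datamodel_summary directory.

```python
import time
from pathlib import Path
from typing import Callable, List


def remaining_summary_files(summary_root: Path, guid: str) -> List[Path]:
    """List files still present under summary_root/<bucket_id>/<guid>/."""
    leftovers: List[Path] = []
    for owner_dir in summary_root.glob(f"*/{guid}"):
        if owner_dir.is_dir():
            leftovers.extend(p for p in owner_dir.rglob("*") if p.is_file())
    return sorted(leftovers)


def wait_until_reaped(
    summary_root: Path,
    guid: str,
    timeout_s: float = 3600.0,
    poll_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until no files remain for guid; return False if the timeout passes first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if not remaining_summary_files(summary_root, guid):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)
```

Empty directories are deliberately ignored, since the directories themselves are expected to stay behind.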

btorresgil commented 6 years ago

ok, thanks for the confirmation, I was hoping Splunk would handle the data removal itself. So it sounds like the documentation change is enough to cover this. I'll also go back and update any Splunk Answers questions where the previous guidance to install App on Indexer was given.

Anything else you can think of for this ticket or should we close it out?

automine commented 6 years ago

I think that would satisfy my concerns.

btorresgil commented 6 years ago

Thanks for your help and input. I'll close this but leave it available for further comment in case anything comes up.

PaloAltoNetworks / Splunk-Apps

Installing app on indexers, as instructed will result in multiple acceleration searches/excessive storage use #69