Whenever Drill executes a Splunk query, it must retrieve a list of indexes from Splunk. This step can add a considerable amount of time to the planning phase. This PR introduces a simple in-memory cache for the Splunk plugin which caches the list of indexes to avoid having to query Splunk repeatedly to obtain this information.
This PR also makes a few unrelated minor improvements:
Updates the test container to Splunk 9.3, the most current version at the time of writing. Some unit tests had to be updated as a result.
Adds a new config option for the maximum number of columns returned from Splunk.
Adds the actual SPL sent to Splunk to the query plan, which can be useful for debugging.
Documentation
(Added to README)
For every query that you send to Splunk from Drill, Drill will have to pull schema information from Splunk. If you have a lot of indexes, this process can cause slow planning time. To improve planning time, you can configure Drill to cache the index names so that it does not need to make additional calls to Splunk.
There are two configuration parameters for the schema caching: maxCacheSize and cacheExpiration. maxCacheSize defaults to 10k bytes and cacheExpiration defaults to 1024 minutes. To disable schema caching, set cacheExpiration to a value less than zero.
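The behavior described above can be illustrated with a minimal sketch. This is not the plugin's actual cache code: the class name IndexListCache, the helper CachedIndexes, the string cache key, and the injected clock are all hypothetical, and the sketch bounds the cache by number of entries rather than by bytes to keep it short. It only shows the expiration semantics: entries are reused until they are older than the expiration, and a negative expiration bypasses the cache entirely.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical sketch of an expiring, size-bounded cache of index names. */
final class IndexListCache {
  /** A cached index list plus the time it was loaded. */
  private static final class CachedIndexes {
    final List<String> indexes;
    final long loadedAtMillis;

    CachedIndexes(List<String> indexes, long loadedAtMillis) {
      this.indexes = indexes;
      this.loadedAtMillis = loadedAtMillis;
    }
  }

  private final long expirationMillis;
  private final LongSupplier clock;
  private final Map<String, CachedIndexes> entries;

  /**
   * Assumption: maxEntries stands in for maxCacheSize (which the README
   * expresses in bytes); expirationMinutes mirrors cacheExpiration.
   */
  IndexListCache(int maxEntries, long expirationMinutes, LongSupplier clock) {
    this.expirationMillis = expirationMinutes * 60_000L;
    this.clock = clock;
    // Insertion-ordered map that drops its oldest entry once over capacity.
    this.entries = new LinkedHashMap<String, CachedIndexes>(16, 0.75f, false) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, CachedIndexes> eldest) {
        return size() > maxEntries;
      }
    };
  }

  /** Returns the cached index list for key, calling loader only when needed. */
  synchronized List<String> get(String key, Function<String, List<String>> loader) {
    if (expirationMillis < 0) {
      // A negative expiration disables caching: always ask Splunk.
      return loader.apply(key);
    }
    long now = clock.getAsLong();
    CachedIndexes cached = entries.get(key);
    if (cached != null && now - cached.loadedAtMillis < expirationMillis) {
      return cached.indexes;
    }
    List<String> fresh = loader.apply(key);
    entries.put(key, new CachedIndexes(fresh, now));
    return fresh;
  }
}
```

In this sketch the loader stands in for the call that lists indexes from Splunk, and passing System::currentTimeMillis as the clock gives wall-clock expiration.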
DRILL-8504: Add Schema Caching to Splunk Plugin
Testing
Ran all unit tests and tested manually.