Features | Architecture | Documentation References | Compatibility | Log Ingestion Examples | Feedback | Legal Information
The Splunk Integration project is an unsupported, bidirectional connector consisting of three main components, as depicted in the architecture diagram:
We also provide extensive documentation for log collection to ingest, store, and process logs on an economical and performant Delta Lake.
- Run Databricks SQL queries directly from the Splunk search bar and see the results in the Splunk UI
- Execute actions in Databricks, such as notebook runs and jobs, from Splunk
- Use the Splunk SQL database extension to integrate Databricks data with Splunk queries and reports
- Push events, summaries, and alerts from Databricks to Splunk (a minimal sketch follows this list)
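As a rough illustration of the last item, the sketch below pushes a summary event from a Databricks notebook to Splunk over the HTTP Event Collector (HEC). The host, token, index, and event fields are placeholders, and the project's own notebooks may use a different mechanism.

```python
# Minimal sketch: send one summary event from Databricks to Splunk via HEC.
# SPLUNK_HEC_URL, SPLUNK_HEC_TOKEN, and the index/event payload are placeholders.
import json
import requests

SPLUNK_HEC_URL = "https://splunk.example.com:8088/services/collector/event"  # placeholder host
SPLUNK_HEC_TOKEN = "00000000-0000-0000-0000-000000000000"                    # placeholder token

event = {
    "event": {"alert": "suspicious_login_count", "count": 42},  # placeholder summary payload
    "sourcetype": "databricks:summary",
    "index": "main",  # placeholder index
}

resp = requests.post(
    SPLUNK_HEC_URL,
    headers={"Authorization": f"Splunk {SPLUNK_HEC_TOKEN}"},
    data=json.dumps(event),
    verify=False,  # only acceptable for test environments with self-signed certs
)
resp.raise_for_status()
```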
The Databricks Add-on for Splunk, notebooks, and documentation provided in this project are compatible with:
This project also provides documentation and notebooks that show how to use Databricks to collect various logs (a comprehensive list is provided below) via stream and batch ingest, using Databricks Auto Loader and Spark Structured Streaming, into cloud data lakes for durable storage on S3. For each log type, the documentation and notebooks cover parsing, schematizing, ETL/aggregation, and storing in Delta format to make the data available for analytics; a minimal ingest sketch follows this paragraph.
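The sketch below outlines the ingest pattern described above, assuming a Databricks notebook where `spark` is predefined. The S3 paths, schema location, and table name are placeholders, not paths used by the project's notebooks.

```python
# Minimal sketch: ingest raw JSON logs from S3 with Auto Loader and persist them
# to a Delta table for analytics. All paths and names below are placeholders.
from pyspark.sql import functions as F

raw_logs = (
    spark.readStream.format("cloudFiles")                                     # Auto Loader source
    .option("cloudFiles.format", "json")                                      # assume JSON log files
    .option("cloudFiles.schemaLocation", "s3://my-bucket/schemas/vpc_flow/")  # placeholder path
    .load("s3://my-bucket/raw/vpc_flow/")                                     # placeholder landing zone
)

# Light "schematize" step: tag each record with its ingestion time.
parsed = raw_logs.withColumn("ingest_time", F.current_timestamp())

(
    parsed.writeStream.format("delta")
    .option("checkpointLocation", "s3://my-bucket/checkpoints/vpc_flow/")  # placeholder path
    .trigger(availableNow=True)   # process all available files, then stop (batch-style incremental run)
    .toTable("security.vpc_flow_logs")                                     # placeholder Delta table
)
```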
Notebooks and documentation are included for the following data collection sources:
Issues with the application? Found a bug? Have a great idea for an addition? Feel free to file an issue or submit a pull request.
This software is provided as-is and is not officially supported by Databricks through customer technical support channels. Support, questions, and feature requests can be communicated via email to cybersecurity@databricks.com or through the Issues page of this repo.