Intelligence - GitHub Issues

fabriziosalmi / blacklists

Hourly updated domains blacklist 🚫

GNU General Public License v3.0

131 stars, 6 forks

Using a time series database (TSDB) to track domain blacklisting over time is a reasonable idea, since TSDBs are optimized for storing and querying time-stamped data. However, your use case might be better served by combining a TSDB with a relational database, or by a specialized solution such as the ELK stack (Elasticsearch, Logstash, Kibana).

Here's a general approach:

Choosing a Time Series Database:
- InfluxDB is a popular open-source time series database.
- TimescaleDB is another option. It is built on PostgreSQL, so you get a relational database and TSDB capabilities in one system.
Storing Data:
- When you receive a new blacklist file, timestamp the data and insert it into your chosen TSDB.
- For each domain, record its FQDN, its associated IP address(es), and the timestamp of the snapshot in which it appeared.
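A minimal sketch of the storing step in Python, assuming each blacklist file is plain text with one FQDN per line (blank lines and `#` comments skipped). `BlacklistRow` and `rows_from_file` are hypothetical names, not part of the repository; the IP field is left empty for you to fill in with whatever resolution you use.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class BlacklistRow:
    """One observation: a domain seen in one blacklist snapshot (hypothetical schema)."""
    observed_at: datetime
    fqdn: str
    ips: tuple[str, ...] = ()  # filled in later if you resolve the domain


def rows_from_file(text: str, observed_at: datetime | None = None) -> list[BlacklistRow]:
    """Turn one blacklist file into rows that all share the snapshot timestamp.

    Assumes one FQDN per line; blank lines and '#' comments are skipped.
    """
    ts = observed_at or datetime.now(timezone.utc)
    seen: set[str] = set()
    rows = []
    for line in text.splitlines():
        # Drop comments, normalize case, and strip a trailing root dot.
        fqdn = line.split("#", 1)[0].strip().lower().rstrip(".")
        if fqdn and fqdn not in seen:  # de-duplicate within a single snapshot
            seen.add(fqdn)
            rows.append(BlacklistRow(observed_at=ts, fqdn=fqdn))
    return rows
```

Every row from one file shares the same timestamp, so each file becomes one snapshot that later queries can count or compare.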
Analysis:
- To find domains or IP addresses that are repeatedly blacklisted, you can run periodic queries on the TSDB.
- If you use TimescaleDB, for instance, ordinary SQL (GROUP BY with HAVING, window functions) can identify domains or IPs that keep reappearing.
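To illustrate the kind of SQL meant here, the sketch below uses Python's built-in `sqlite3` as a stand-in for TimescaleDB. The GROUP BY/HAVING logic carries over, but TimescaleDB-specific features such as hypertables are not shown. The table `blacklist_ip`, its columns, and the threshold are assumptions. The query finds IP addresses that hosted more than one distinct blacklisted FQDN in a time window.

```python
import sqlite3

# In-memory stand-in for the TSDB: one row per (snapshot time, fqdn, ip).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blacklist_ip (observed_at TEXT NOT NULL, fqdn TEXT NOT NULL, ip TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO blacklist_ip VALUES (?, ?, ?)",
    [
        ("2024-01-01T00:00:00Z", "a.example", "192.0.2.10"),
        ("2024-01-01T00:00:00Z", "b.example", "192.0.2.10"),
        ("2024-01-01T01:00:00Z", "c.example", "192.0.2.10"),
        ("2024-01-01T01:00:00Z", "d.example", "198.51.100.7"),
    ],
)


def shared_ips(conn, since: str, min_domains: int = 2):
    """IPs that hosted at least `min_domains` distinct blacklisted FQDNs since `since`.

    Timestamps are ISO-8601 strings in one fixed format, so string comparison
    orders them chronologically.
    """
    return conn.execute(
        """
        SELECT ip, COUNT(DISTINCT fqdn) AS domains
        FROM blacklist_ip
        WHERE observed_at >= ?
        GROUP BY ip
        HAVING COUNT(DISTINCT fqdn) >= ?
        ORDER BY domains DESC
        """,
        (since, min_domains),
    ).fetchall()
```

Counting distinct FQDNs per IP is the sort of relational question that a pure time-series store answers less naturally.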
Using ELK Stack:
- Elasticsearch can index and search large amounts of log or event data.
- Logstash can be used to ingest and process the blacklist files, transforming and loading the data into Elasticsearch.
- Kibana can then visualize this data.
For your use case, every time you receive a new blacklist file:
- Use Logstash to process and send the data to Elasticsearch.
- In Elasticsearch, each document will have the domain, associated IP, and timestamp.
- Use Kibana to create visualizations and dashboards to identify trends, such as which IPs are frequently blacklisted.
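If you load documents into Elasticsearch yourself rather than through Logstash, the `_bulk` API expects newline-delimited JSON: an action line followed by a source line for each document, ending with a trailing newline. The sketch below only builds that payload. The index name `blacklist` and the field names `@timestamp`, `fqdn`, and `ip` are assumptions. Sending the request (a POST to `/_bulk` with `Content-Type: application/x-ndjson`) is left out.

```python
import json
from datetime import datetime, timezone


def bulk_payload(fqdns, observed_at, index="blacklist", ips=None):
    """Build an Elasticsearch _bulk body: one action line + one document per FQDN."""
    ips = ips or {}
    lines = []
    for fqdn in fqdns:
        lines.append(json.dumps({"index": {"_index": index}}))
        doc = {"@timestamp": observed_at.isoformat(), "fqdn": fqdn}
        if fqdn in ips:
            doc["ip"] = ips[fqdn]  # optional: list of resolved addresses
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a final newline
```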
Actionable Insights:
- Set up alerts or triggers. For instance, if an IP appears on the blacklist more than a certain number of times in a specified period, you can be alerted.
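A minimal sketch of such a threshold alert in plain Python, assuming observations arrive as `(timestamp, key)` pairs where the key is an IP or an FQDN. The window, the threshold, and the `over_threshold` name are illustrative. Wiring the result to email or chat notifications is out of scope.

```python
from collections import Counter
from datetime import datetime, timedelta


def over_threshold(observations, now, window=timedelta(days=7), threshold=3):
    """Return {key: count} for keys seen more than `threshold` times in the window ending at `now`."""
    start = now - window
    counts = Counter(key for ts, key in observations if start <= ts <= now)
    return {key: n for key, n in counts.items() if n > threshold}
```

Running this after each new file arrives, and notifying on any non-empty result, gives the "more than N times in a period" alert described above.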
Maintenance:
- Regularly back up your database.
- Periodically prune old data if you do not need to retain all historical data indefinitely.
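In its simplest form, pruning means deleting rows older than a retention cutoff. The sketch again uses `sqlite3` as a stand-in. The table and column names and the 90-day retention are assumptions. In TimescaleDB 2.x the usual tool is instead a retention policy on the hypertable (`add_retention_policy`) or `drop_chunks`, which drop whole chunks rather than deleting row by row.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def prune(conn, now, keep=timedelta(days=90)):
    """Delete observations older than `keep`; return how many rows were removed."""
    cutoff = (now - keep).isoformat()
    with conn:  # commits the DELETE on success
        cur = conn.execute("DELETE FROM blacklist WHERE observed_at < ?", (cutoff,))
    return cur.rowcount


# Demo data: the same domain observed 1, 30, 120 and 400 days ago.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blacklist (observed_at TEXT NOT NULL, fqdn TEXT NOT NULL)")
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
conn.executemany(
    "INSERT INTO blacklist VALUES (?, ?)",
    [((now - timedelta(days=d)).isoformat(), "a.example") for d in (1, 30, 120, 400)],
)
```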

In conclusion, while a TSDB is an excellent tool for tracking time-series data, your use case suggests you might benefit more from combining databases or from a tool like the ELK stack. That way you get both the time-based tracking and the relational analysis you're after.

Got it. If you only have FQDNs (fully qualified domain names) and not IP addresses, then the approach becomes slightly simpler. Still, the underlying concept remains mostly the same.

Here's a revised approach:

Choosing a Time Series Database:
- Use either InfluxDB or TimescaleDB (the advantage of TimescaleDB is that you can also do relational operations).
Storing Data:
- When you receive a new blacklist file, timestamp each FQDN and insert it into your chosen TSDB.
- For each domain, record its FQDN and the timestamp of the snapshot in which it appeared.
Analysis:
- To find domains that are repeatedly blacklisted, you can run periodic queries on the TSDB.
- Example query (conceptual, assuming TimescaleDB):
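```sql
-- Domains that appear in more than one snapshot during the last month.
-- Assumes a table blacklist(time TIMESTAMPTZ, fqdn TEXT) with one row per FQDN per snapshot.
SELECT fqdn, COUNT(*) AS appearances
FROM blacklist
WHERE time BETWEEN now() - INTERVAL '1 month' AND now()
GROUP BY fqdn
HAVING COUNT(*) > 1
ORDER BY appearances DESC;
```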
  This query returns the domains that appear in more than one stored snapshot during the past month.
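If the list is refreshed hourly, as the repository description suggests, a domain that stays listed for a day shows up in about 24 snapshots, so `COUNT(*) > 1` mostly measures how long a domain stayed listed. If "repeatedly blacklisted" should instead mean "removed and later re-added", one option is to count listing episodes, i.e. runs of consecutive snapshots that contain the domain. The sketch below assumes you have the chronological list of all snapshot timestamps and, per domain, the set of snapshots it appeared in; `listing_episodes` is a hypothetical helper. In SQL the same idea is usually written as a gaps-and-islands query built on the `LAG()` window function.

```python
def listing_episodes(snapshots, present):
    """Count separate listing episodes for one domain.

    snapshots: all snapshot timestamps, in chronological order.
    present:   the set of snapshot timestamps in which the domain appeared.
    An episode starts whenever the domain is present after being absent
    (or is present in the very first snapshot).
    """
    episodes = 0
    previously_listed = False
    for ts in snapshots:
        listed = ts in present
        if listed and not previously_listed:
            episodes += 1
        previously_listed = listed
    return episodes
```

A domain with two or more episodes was delisted at some point and then blacklisted again, which is closer to "repeat offender" than a raw row count.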
Using ELK Stack:
- Elasticsearch will index the blacklist data.
- Logstash can ingest the blacklist files and send them to Elasticsearch.
- Kibana provides visual analysis: you can see trends, such as which FQDNs are blacklisted repeatedly over a period.
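On the Elasticsearch side, a rough equivalent of the SQL query above is a `terms` aggregation with `min_doc_count`. The sketch builds the search request body as a Python dict. It assumes `fqdn` is mapped as a `keyword` field and that each document carries an `@timestamp` field; the body would be sent with a `_search` request against your index. The `repeated_fqdns` aggregation name is illustrative.

```python
import json


def repeated_fqdns_query(days=30, min_count=2, size=100):
    """Search body: FQDNs with at least `min_count` documents in the last `days` days."""
    return {
        "size": 0,  # only the aggregation is needed, not the individual hits
        "query": {"range": {"@timestamp": {"gte": f"now-{days}d/d"}}},
        "aggs": {
            "repeated_fqdns": {
                "terms": {
                    "field": "fqdn",
                    "min_doc_count": min_count,  # drop FQDNs seen fewer times
                    "size": size,  # at most this many buckets, highest counts first
                }
            }
        },
    }


print(json.dumps(repeated_fqdns_query(), indent=2))
```

The same aggregation can back a Kibana visualization, so the dashboard and any scripted check share one definition of "repeated".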
Actionable Insights:
- Set up alerts. For example, if a domain appears on the blacklist more than a certain number of times, you'll get notified.
Maintenance:
- Back up regularly and, if necessary, prune old data.

Given that you only have FQDNs, either the TSDB approach or the ELK stack will work. Both have their merits:

A TSDB is a good fit if you're focusing on time-based patterns, since it is optimized for storing and querying time series data.
The ELK stack offers powerful search and visualization capabilities. If you anticipate needing more advanced analysis, visualizations, or integration with other logs or data sources, ELK might be a better fit.

Both approaches allow you to identify domains that frequently appear on the blacklist, which seems to be a primary concern.


Intelligence #14