akumuli / Akumuli

Time-series database
http://akumuli.org
Apache License 2.0

Db size disk space limitations #355

Closed (goodspeed1986 closed this issue 4 years ago)

goodspeed1986 commented 4 years ago

I plan to use your database in embedded systems where the limit on free disk space is about 1 GB. Can your db, upon reaching the maximum specified size, start deleting old data?

Lazin commented 4 years ago

This is the default behavior. You need to set the number of volumes and the volume size in the configuration. Akumuli will start overwriting old volumes when it runs out of disk space.

Akumuli also needs some disk space for the WAL, but it's configurable: you can set its size or disable it completely. The configuration and metadata are stored in an sqlite3 database on disk. You can't limit its size, but it's usually quite small; its size depends on the database cardinality (the number of unique time-series). Most of the time, even if the dataset is huge, the sqlite db stays small (around 10MB or less).

It's also worth mentioning that Akumuli writes logs. You will need to delete the old logs periodically.
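For example, a minimal cleanup sketch in Python. It assumes the DailyRollingFileAppender settings from the config below, which rotate the log to files like /tmp/akumuli.log.YYYY-MM-DD; the retention period is illustrative:

# prune_logs.py -- delete rotated akumulid logs older than N days.
# Assumes the log4cxx DailyRollingFileAppender settings from the config:
# active file /tmp/akumuli.log, rotated copies /tmp/akumuli.log.YYYY-MM-DD.
# Run from cron or a systemd timer.
import glob
import os
import time

LOG_GLOB = "/tmp/akumuli.log.*"   # rotated files only, not the active log
KEEP_DAYS = 7                     # retention period (illustrative)

cutoff = time.time() - KEEP_DAYS * 86400
for path in glob.glob(LOG_GLOB):
    if os.path.getmtime(path) < cutoff:
        os.remove(path)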

Here is the config file example:

# akumulid configuration file (generated automatically).

# path to database files. Default value is ~/.akumuli.
path=~/.akumuli

# Number of volumes used to store data. Each volume is
# 4GB in size by default and allocated beforehand. To change
# the number of volumes, change the `nvolumes` value in the
# configuration and restart the daemon.
nvolumes=10

# Size of the individual volume. You can use MB or GB suffix.
# Default value is 4GB (if value is not set).
volume_size=90MB

# HTTP API endpoint configuration

[HTTP]
# port number
port=8181

# TCP ingestion server config (delete to disable)

[TCP]
# port number
port=8282
# worker pool size (0 means that the size of the pool will be chosen automatically)
pool_size=1

# UDP ingestion server config (delete to disable)

[UDP]
# port number
port=8383
# worker pool size
pool_size=1

# OpenTSDB telnet-style data connection enabled (remove this section to disable).

[OpenTSDB]
# port number
port=4242

# Logging configuration
# This is just a log4cxx configuration without any modifications

log4j.rootLogger=all, file
log4j.appender.file=org.apache.log4j.DailyRollingFileAppender
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} [%t] %c [%p] %m%n
log4j.appender.file.filename=/tmp/akumuli.log
log4j.appender.file.datePattern='.'yyyy-MM-dd

# Write-Ahead-Log section (delete to disable)
[WAL]
# WAL location
path=~/.akumuli

# Max volume size. Log records are added until the file size
# exceeds the configured value.
volume_size=5MB

# Number of log volumes to keep on disk per CPU core. E.g. with
# `volume_size` = 256MB, `nvolumes` = 4, and 4 CPUs, the WAL will
# use 4GB at most (4*4*256MB).
nvolumes=5

This config will create 10 data volumes of 90MB each (900MB, preallocated). It will also create at most 5 WAL volumes of 5MB each per CPU core, which on a single-core machine gives us 925MB total, leaving 75MB of the 1GB budget for metadata and logs.
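As a sanity check, here is the same arithmetic as a small sketch (sqlite and log overheads are not modeled; a single CPU core is assumed, since WAL volumes are kept per core):

# disk_budget.py -- rough disk footprint for the config above.
# WAL volumes are allocated per CPU core, so the total scales with cores.
DATA_NVOLUMES = 10
DATA_VOLUME_MB = 90
WAL_NVOLUMES = 5
WAL_VOLUME_MB = 5
CPU_CORES = 1          # embedded target assumed single-core here

data_mb = DATA_NVOLUMES * DATA_VOLUME_MB            # 900 MB, preallocated
wal_mb = WAL_NVOLUMES * WAL_VOLUME_MB * CPU_CORES   # 25 MB at most
print(f"data: {data_mb} MB, WAL: {wal_mb} MB, total: {data_mb + wal_mb} MB")
# leaves 1000 - 925 = 75 MB of a 1 GB budget for sqlite metadata and logs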

goodspeed1986 commented 4 years ago

Thank you, that's very good news for me. I have one more question about writing data to the db. I have several remote PLCs which collect data and send it to my server with unix timestamps. If they lose the connection to the server, they start writing data to a local SD card, and when the connection is restored they send this data to the server as historical data. But first of all they send the current state of the variables on the PLC with the current timestamp. The question is: can I write historical data with old timestamps after I have recorded data with current timestamps?

Lazin commented 4 years ago

It's not possible. You can write historical data after data with current timestamps only if you're writing them to different time-series. If you're writing into the same series, you have to write sequentially, in increasing timestamp order.
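To illustrate the constraint, here is a minimal sketch over the TCP ingestion port from the config above, assuming Akumuli's RESP-based TCP protocol (series name, then timestamp, then value; integer timestamps interpreted as nanoseconds since epoch). The series name and the backlog/current split are illustrative; the point is that within one series the PLC backlog must be flushed before any newer points of the same series:

# ingest.py -- write backlog first, then live data, per series.
# Within a single series Akumuli requires increasing timestamps, so
# the SD-card backlog is sent before the current point of the same
# series. Series name and values here are illustrative.
import socket

HOST, PORT = "127.0.0.1", 8282  # TCP ingestion port from the config above

def encode(series: str, ts_ns: int, value: float) -> bytes:
    # RESP framing: '+' for strings, ':' for integers, CRLF-terminated.
    return f"+{series}\r\n:{ts_ns}\r\n+{value}\r\n".encode()

with socket.create_connection((HOST, PORT)) as sock:
    series = "plc.voltage plc=station1"
    backlog = [(1_600_000_000_000_000_000, 230.1),
               (1_600_000_060_000_000_000, 229.8)]   # old points from SD card
    live = (1_600_000_120_000_000_000, 231.4)        # current point
    # send the backlog first, oldest to newest...
    for ts, val in sorted(backlog):
        sock.sendall(encode(series, ts, val))
    # ...and only then the current value for the same series
    sock.sendall(encode(series, *live))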