apache / incubator-hugegraph

A graph database that supports more than 100 billion data elements, with high performance and scalability (includes OLTP Engine, REST-API & Backends)
https://hugegraph.apache.org
Apache License 2.0

RocksDB disk usage issue #2689

Open · fantengfeiNana opened 5 hours ago

fantengfeiNana commented 5 hours ago

Problem Type

configs (configuration / documentation related)

Before submit

Environment

Your Question

When I use RocksDB as the storage backend for HugeGraph, I observe the following: every time I create a knowledge graph, I dynamically use the graph name as the database's storage directory. Even when the graph has no vertices or edges, the disk usage is nearly 400MB. My business scenario is similar to a consumer-facing application, so disk usage will become a bottleneck. Could you please advise on how to tune the RocksDB configuration to reduce the graph's metadata footprint and save disk space? I have already set rocksdb.compression_per_level to SNAPPY, but the effect is not significant. Thank you.

(screenshots attached)
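For context, a minimal sketch of the per-graph setup described above, assuming the usual conf/graphs layout; the graph name placeholder, paths, and level count are illustrative, and the exact key set depends on your HugeGraph version:

```properties
# conf/graphs/<graph-name>.properties (hypothetical per-graph config)
backend=rocksdb
serializer=binary
store=<graph-name>

# One RocksDB directory per graph, named after the graph
rocksdb.data_path=./data/<graph-name>
rocksdb.wal_path=./data/<graph-name>

# What was tried above: SNAPPY on every level (RocksDB uses 7 levels by default)
rocksdb.compression_per_level=snappy,snappy,snappy,snappy,snappy,snappy,snappy
```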

Vertex/Edge example

No response

Schema [VertexLabel, EdgeLabel, IndexLabel]

No response

dosubot[bot] commented 5 hours ago

I found a similar issue that was previously discussed and closed. The issue was about RocksDB's memory occupation being out of control. The suggested solution included replacing the default ptmalloc with jemalloc to manage memory usage more effectively. This involves downloading and compiling jemalloc, then modifying the server's startup script to preload the jemalloc shared object file [1].
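A rough sketch of that jemalloc setup; the version number and install paths are illustrative, and the start script name should be checked against your distribution:

```bash
# Build and install jemalloc (version is illustrative)
wget https://github.com/jemalloc/jemalloc/releases/download/5.3.0/jemalloc-5.3.0.tar.bz2
tar -xjf jemalloc-5.3.0.tar.bz2
cd jemalloc-5.3.0
./configure && make && sudo make install   # installs libjemalloc under /usr/local/lib

# Then, near the top of the server start script (e.g. bin/start-hugegraph.sh),
# preload jemalloc so RocksDB allocates through it instead of ptmalloc:
export LD_PRELOAD=/usr/local/lib/libjemalloc.so.2
```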

For optimizing the RocksDB configuration to reduce the metadata footprint and save disk space, consider the following adjustments (a hedged configuration sketch follows the list):

  1. Compression: Use a more aggressive compression algorithm like zstd for both the general and bottommost levels to reduce data size on disk.

  2. Manifest File Size: Reduce the maximum size of the manifest file to limit its disk usage.

  3. Memtable Size: Adjust the memtable size to control memory usage and potentially reduce disk writes.

  4. Number of Levels: Reduce the number of levels if the data size is small, which can help in reducing metadata.

  5. Direct Reads and Writes: Enable direct reads and writes to potentially reduce overhead.

  6. Optimize Compaction: Use optimized compaction settings to manage space more efficiently [2].
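A hedged sketch of how those six adjustments might look in a graph's properties file. The key names mirror RocksDB's option names with HugeGraph's rocksdb. prefix, but not every option is necessarily exposed by every server version, so verify each key against the RocksDBOptions class before relying on it:

```properties
# Hypothetical tuning block for conf/graphs/<graph>.properties; verify each
# key against the RocksDBOptions class of your server version before use.

# 1. Heavier compression on all levels, including the bottommost one
rocksdb.compression=zstd
rocksdb.compression_per_level=zstd,zstd,zstd,zstd,zstd,zstd,zstd
rocksdb.bottommost_compression=zstd

# 2. Cap the MANIFEST file at 8MB so it cannot grow large per graph
rocksdb.max_manifest_file_size=8388608

# 3. Smaller memtables (16MB) and fewer of them, for a smaller fixed
#    footprint per graph
rocksdb.write_buffer_size=16777216
rocksdb.max_write_buffer_number=2

# 4. Fewer LSM levels when the data set is small
rocksdb.num_levels=4

# 5. Direct I/O to reduce page-cache overhead
rocksdb.use_direct_reads=true
rocksdb.use_direct_io_for_flush_and_compaction=true

# 6. Space-oriented compaction behavior
rocksdb.optimize_filters_for_hits=true
rocksdb.level_compaction_dynamic_level_bytes=true
```

Since a near-empty graph carries mostly fixed per-database overhead (WAL, MANIFEST, and per-column-family files) rather than compressed data, the size caps in items 2 and 3 are likely to matter more than compression for the empty-graph case.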

These changes should help reduce the metadata footprint and save disk space when using RocksDB with HugeGraph. Adjust them to your specific use case and measure the impact on both performance and disk usage.

To continue talking to Dosu, mention @dosu.