apache / incubator-hugegraph

A graph database that supports 100+ billion data records, with high performance and scalability (includes OLTP Engine & REST-API & Backends)
https://hugegraph.apache.org
Apache License 2.0

[Question] Docker 1.2.0 server with built-in RocksDB does not automatically reclaim disk space after the disk is full #2474

Closed: ycycyccc closed this issue 2 months ago

ycycyccc commented 4 months ago

Problem Type

None

Before submit

Environment

Your Question

Deployed version 1.2.0 with Docker, using the built-in RocksDB. The schema sets a TTL of 86400000 on both vertices and edges (the unit is milliseconds, right?). After the disk filled up, writes kept failing with "No space left on device", and even after stopping writes for several days the disk space has still not been released; the data files are dated several days ago, and Gremlin console queries for vertices and edges return nothing (hugegraph.traversal().V().limit(1) finds no data). When will the disk space be cleaned up, and does it need to be triggered manually? @dosu-bot

Vertex/Edge example

No response

Schema [VertexLabel, EdgeLabel, IndexLabel]

No response

dosubot[bot] commented 4 months ago

Hi @ycycyccc! We meet again 😄 I see you have run into a problem. I am looking into your question and will get back to you with a full answer soon. 🔍🚀

imbajin commented 4 months ago

@dosu-bot

@dosu-bot It looks like you forgot to reply. Please give a detailed response and possible solutions, taking the context and the historical issues into account.

dosubot[bot] commented 4 months ago

Based on the context provided, it seems like you're encountering an issue where the disk space is not being cleared even after setting the TTL for vertices and edges to 86400000 and stopping write operations for several days. This could be due to the nature of how RocksDB, the backend storage engine used by Apache HugeGraph, handles deletions and compactions.

In RocksDB, data is not immediately removed from disk when a delete operation occurs or when data expires. Instead, RocksDB marks the data as deleted, and the actual space is reclaimed during a compaction process. This is why even after stopping the write operations for several days, the disk space was not released.
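
To make this concrete, here is a minimal standalone sketch using the raw RocksDB Java API (org.rocksdb), independent of HugeGraph's wrapper; the path and key names are placeholders. It illustrates that a delete (or an expired TTL entry) only writes a tombstone, and the space is physically reclaimed only when a compaction rewrites the affected SST files:

import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class CompactionDemo {
    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();
        try (Options options = new Options().setCreateIfMissing(true);
             RocksDB db = RocksDB.open(options, "/tmp/rocksdb-compaction-demo")) {
            db.put("key".getBytes(), "value".getBytes());
            // Deleting only writes a tombstone; the old value still occupies
            // space in the SST files on disk.
            db.delete("key".getBytes());
            // Space is reclaimed only when compaction rewrites the affected
            // SST files and drops the tombstoned entries.
            db.compactRange();
        }
    }
}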

From the provided context, there is a registerMetaHandler method that registers a handler for "compact" operations, which suggests that Apache HugeGraph supports triggering compaction manually through its meta handler mechanism. Compaction is what actually removes deleted or expired data from disk, so triggering it is necessary to reclaim space after data has been marked as deleted or has expired due to TTL settings.

Here's the relevant part of the code that demonstrates this behavior:

// Meta handler "compact": builds RocksDBMetrics over the session's open
// databases and triggers a compaction, returning its result.
this.registerMetaHandler("compact", (session, meta, args) -> {
    RocksDBMetrics metrics = new RocksDBMetrics(dbsGet.get(), session);
    return metrics.compact();
});

Therefore, to address the issue of disk space not being cleared, you may need to manually trigger a compaction in Apache HugeGraph. This can be done by invoking the "compact" meta handler as shown in the code snippet. Triggering compaction should help in reclaiming the disk space that is still being occupied by data that has been logically deleted or expired.

As for the TTL (Time to Live) setting in Apache HugeGraph (incubating) Release 1.2.0, expired vertices and edges are deleted automatically. This is evident from the expiredElementIds method shown below: it compares each element's expiration time against the current time, collects the expired elements into a set, and then removes them from the elementIds set, effectively deleting them from the index. No manual intervention is needed; deletion is handled automatically based on the TTL settings and the current time.

Here's the relevant part of the code that demonstrates this behavior:

// Collect index entries whose TTL has expired and remove them from
// this index's elementIds set.
public Set<IdWithExpiredTime> expiredElementIds() {
    long now = this.graph.now();
    Set<IdWithExpiredTime> expired = InsertionOrderUtil.newSet();
    for (IdWithExpiredTime id : this.elementIds) {
        // Only entries with a positive expiredTime in the past count as expired
        if (0L < id.expiredTime && id.expiredTime < now) {
            expired.add(id);
        }
    }
    this.elementIds.removeAll(expired);
    return expired;
}
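
For reference, TTL is declared per label in the schema and is specified in milliseconds (so 86400000 is 24 hours). Below is a hedged sketch of how such a schema could be defined with the hugegraph-client Java API; the label and property names are illustrative, and the import package and builder methods (for example ttl()) may differ between client versions:

import org.apache.hugegraph.driver.HugeClient;
import org.apache.hugegraph.driver.SchemaManager;

public class TtlSchemaExample {
    public static void main(String[] args) {
        // Connect to a local server and the graph named "hugegraph"
        HugeClient client = HugeClient.builder("http://localhost:8080", "hugegraph").build();
        SchemaManager schema = client.schema();
        schema.propertyKey("name").asText().ifNotExist().create();
        // 86400000 ms = 24 hours; expired vertices are filtered out of queries
        // and physically removed from disk during compaction
        schema.vertexLabel("person")
              .useAutomaticId()
              .properties("name")
              .ttl(86400000L)
              .ifNotExist()
              .create();
        client.close();
    }
}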

I hope this helps explain why the disk space is not being cleared and how to manually trigger compaction in Apache HugeGraph (incubating) Release 1.2.0 deployed with Docker and the RocksDB backend. If you have any further questions, feel free to ask.

imbajin commented 4 months ago

@dosu-bot Thanks for your reply. HugeGraph does provide an API to trigger compaction manually, see #1373 and #1378. Could you tell the user how to use and call it? (doc reference: https://hugegraph.apache.org/docs/clients/restful-api/graphs/#65-compact)

Also, although manually triggering compaction can mitigate this problem, it is not the expected approach. You should also look into:

  1. Why RocksDB did not automatically trigger compaction for several days, which seems unexpected (do the related threshold parameters need adjusting?)
  2. Which options in RocksdbOption control the compaction thresholds and disk space, or the WAL log size
  3. If the user wants to manually delete expired RocksDB data from disk, how should they proceed to avoid deleting the wrong files? Please also describe the typical file layout of a RocksDB data directory (refer https://github.com/apache/incubator-hugegraph/discussions/2371)
dosubot[bot] commented 4 months ago

To manually trigger compaction in Apache HugeGraph, you can use the following API endpoint:

PUT http://localhost:8080/graphs/hugegraph/compact

This endpoint will trigger a compaction operation on the graph database. Please replace localhost:8080 with your actual server address and port, and hugegraph with your actual graph name.
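
For convenience, here is a small sketch of calling this endpoint from Java with the JDK's built-in HttpClient; the host, port and graph name are placeholders for your deployment, and the empty request body is an assumption based on the linked doc (add authentication headers if your server requires them):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TriggerCompaction {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // PUT /graphs/{graph}/compact asks the backend store to compact
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/graphs/hugegraph/compact"))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}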

Regarding your questions about RocksDB:

  1. Why RocksDB is not automatically triggering compaction for several days: Automatic compaction in RocksDB is influenced by several options such as COMPACTION_STYLE, MAX_BG_JOBS, and LEVELS_COMPRESSIONS. If compaction is not occurring as expected, it could be due to the configuration of these options. For example, a high number of background jobs (MAX_BG_JOBS) with insufficient system resources could delay compaction processes.

  2. What options are available in RocksdbOption to control compaction thresholds, disk space, or WAL log size: You can adjust COMPACTION_STYLE and NUM_LEVELS to control how and when compaction occurs. Use MAX_TOTAL_WAL_SIZE to control the total size of WAL files. Setting this to a lower value can help in managing disk space usage by triggering flushes and compactions more aggressively (see the sketch after this list).

  3. If I want to manually delete expired data stored by RocksDB from the disk, how should I proceed to avoid accidental deletion: To safely delete expired data, ensure that your compaction strategy effectively removes obsolete data. Use DELETE_OBSOLETE_FILE_PERIOD to control how frequently obsolete files are purged. Setting this option to a lower value can help in more aggressively cleaning up expired data. However, manual intervention or direct interaction with RocksDB might be necessary for specific data deletion scenarios, as the provided API options focus on general configuration and do not directly expose data deletion methods.
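
As a hedged illustration of what these knobs do (not HugeGraph code), the sketch below sets the equivalent options through the raw RocksDB Java API. In HugeGraph they are normally configured via the corresponding rocksdb.* entries defined in RocksDBOptions.java rather than by building Options directly, and the values shown are placeholders, not recommendations:

import org.rocksdb.CompactionStyle;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;

public class RocksDBTuningSketch {
    public static void main(String[] args) throws Exception {
        RocksDB.loadLibrary();
        try (Options options = new Options()
                .setCreateIfMissing(true)
                // How SST files are merged (level vs. universal compaction)
                .setCompactionStyle(CompactionStyle.LEVEL)
                .setNumLevels(7)
                // Parallelism available for flushes and compactions
                .setMaxBackgroundJobs(4)
                // Cap total WAL size; exceeding it forces flushes of the oldest memtables
                .setMaxTotalWalSize(64L * 1024 * 1024)
                // How often obsolete SST/WAL files are checked for deletion (microseconds)
                .setDeleteObsoleteFilesPeriodMicros(6L * 60 * 60 * 1_000_000L);
             RocksDB db = RocksDB.open(options, "/tmp/rocksdb-tuning-demo")) {
            // ... workload ...
        }
    }
}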

Please refer to the RocksDBOptions.java file for more details on these options.

Remember, adjusting these settings requires a good understanding of your application's workload and how RocksDB's internal mechanisms work. Improper configuration can lead to performance degradation or increased disk usage. Always test changes in a staging environment before applying them to production.

github-actions[bot] commented 3 months ago

Due to the lack of activity, this issue has been marked as stale and will be closed after 20 days; any update will remove the stale label.