ycycyccc closed this issue 2 months ago
Hi @ycycyccc! We meet again 😄 I see you've run into a problem; I'm looking into it and will get back to you with a full answer shortly. 🔍🚀
@dosu-bot
@dosu-bot It looks like you forgot to reply. Please give a detailed answer and possible solutions based on the context and the issue history.
Based on the context provided, it seems you're encountering an issue where disk space is not being cleared even after setting the TTL for vertices and edges to 86400000 ms (24 hours) and stopping write operations for several days. This could be due to how RocksDB, the backend storage engine used by Apache HugeGraph, handles deletions and compactions.
In RocksDB, data is not immediately removed from disk when a delete operation occurs or when data expires. Instead, RocksDB marks the data as deleted, and the actual space is reclaimed during a compaction process. This is why even after stopping the write operations for several days, the disk space was not released.
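As a rough illustration of this behavior (a toy sketch in plain Java, not RocksDB's actual code), a delete in an LSM-style store only writes a tombstone marker rather than freeing space; the tombstoned entries are physically dropped only when a later compaction pass rewrites the data:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of LSM-style deletion: a delete writes a tombstone (null value)
// instead of freeing space; only compaction removes tombstoned entries.
public class TombstoneSketch {
    static final Map<String, String> store = new LinkedHashMap<>();

    static void put(String key, String value) {
        store.put(key, value);
    }

    static void delete(String key) {
        store.put(key, null); // tombstone: space is NOT reclaimed yet
    }

    static void compact() {
        // Only now are tombstoned entries actually removed.
        store.entrySet().removeIf(e -> e.getValue() == null);
    }

    public static void main(String[] args) {
        put("v1", "vertex-data");
        delete("v1");
        System.out.println("before compaction: " + store.size());
        compact();
        System.out.println("after compaction: " + store.size());
    }
}
```

The key point: between `delete()` and `compact()`, the entry still occupies space on "disk", which mirrors why stopping writes alone did not release your disk space.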
From the provided context, there is a `registerMetaHandler` method that registers a handler for "compact" operations. This suggests that Apache HugeGraph supports triggering compaction manually through its meta handler mechanism. You can manually trigger the compaction process, which is necessary for reclaiming disk space after data has been marked as deleted or has expired due to TTL settings. Compaction is the process that actually removes the deleted or expired data from disk.
Here's the relevant part of the code that demonstrates this behavior:
```java
this.registerMetaHandler("compact", (session, meta, args) -> {
    RocksDBMetrics metrics = new RocksDBMetrics(dbsGet.get(), session);
    return metrics.compact();
});
```
Therefore, to address the issue of disk space not being cleared, you may need to manually trigger a compaction in Apache HugeGraph. This can be done by invoking the "compact" meta handler as shown in the code snippet. Triggering compaction should help in reclaiming the disk space that is still being occupied by data that has been logically deleted or expired.
As for the TTL (Time to Live) setting in Apache HugeGraph (incubating) Release 1.2.0, it works by automatically triggering the deletion of expired vertices and edges. This is evident from the `expiredElementIds` method in the provided code snippet. This method compares the current time against the expiration time of each element (vertex or edge) and collects the expired ones into a set. It then removes these expired elements from the `elementIds` set, effectively deleting them from the index. No manual intervention is required to delete expired vertices and edges; deletion happens automatically based on the TTL settings and the current time.
Here's the relevant part of the code that demonstrates this behavior:
```java
public Set<IdWithExpiredTime> expiredElementIds() {
    long now = this.graph.now();
    Set<IdWithExpiredTime> expired = InsertionOrderUtil.newSet();
    for (IdWithExpiredTime id : this.elementIds) {
        if (0L < id.expiredTime && id.expiredTime < now) {
            expired.add(id);
        }
    }
    this.elementIds.removeAll(expired);
    return expired;
}
```
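The expiry check above can be exercised in isolation. This standalone sketch uses a hypothetical stand-in for HugeGraph's `IdWithExpiredTime` (not the real class) but applies the same rule: an entry expires when its `expiredTime` is set (greater than 0) and earlier than "now", while 0 means "no TTL":

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Minimal stand-in demonstrating the TTL expiry filter logic.
public class ExpirySketch {
    static final class IdWithExpiredTime {
        final String id;
        final long expiredTime; // 0 means "no TTL set"
        IdWithExpiredTime(String id, long expiredTime) {
            this.id = id;
            this.expiredTime = expiredTime;
        }
    }

    // Same rule as HugeGraph's expiredElementIds(): positive and in the past.
    static Set<IdWithExpiredTime> expired(Set<IdWithExpiredTime> ids, long now) {
        Set<IdWithExpiredTime> out = new LinkedHashSet<>();
        for (IdWithExpiredTime id : ids) {
            if (0L < id.expiredTime && id.expiredTime < now) {
                out.add(id);
            }
        }
        ids.removeAll(out); // drop expired entries from the live set
        return out;
    }

    public static void main(String[] args) {
        Set<IdWithExpiredTime> ids = new LinkedHashSet<>();
        ids.add(new IdWithExpiredTime("v1", 100L)); // expired at now=500
        ids.add(new IdWithExpiredTime("v2", 0L));   // no TTL, never expires
        ids.add(new IdWithExpiredTime("v3", 900L)); // still alive at now=500
        Set<IdWithExpiredTime> gone = expired(ids, 500L);
        System.out.println("expired=" + gone.size() + " remaining=" + ids.size());
    }
}
```

Note that this logical deletion only removes the index entries; as discussed above, the underlying disk space still waits on a RocksDB compaction.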
I hope this information helps you understand why the disk space is not being cleared and how to manually trigger the clearing of disk space in the Apache HugeGraph(incubating) Release 1.2.0 deployed using Docker with RocksDB backend. If you have any further questions, feel free to ask.
You can also refer to the similar issue here for more context.
@dosu-bot Thanks for the reply. HG does indeed provide an API to manually trigger compaction; see #1373 and #1378. Can you tell the user how to use and call it? (doc reference: https://hugegraph.apache.org/docs/clients/restful-api/graphs/#65-compact)
Also, although manually triggering compaction can mitigate the problem, it is not the expected behavior. You should look into two things:

1. RocksDB has not automatically triggered compaction for several days, which does not seem right (do the related threshold parameters need tuning?).
2. Which parameters in `RocksdbOption` control compaction thresholds, disk space, or WAL log size.

To manually trigger compaction in Apache HugeGraph, you can use the following API endpoint:
```
PUT http://localhost:8080/graphs/hugegraph/compact
```

This endpoint triggers a compaction operation on the graph database. Please replace `localhost:8080` with your actual server address and port, and `hugegraph` with your actual graph name.
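From the command line, the equivalent call is `curl -X PUT http://localhost:8080/graphs/hugegraph/compact`. Programmatically, the request can be issued with Java's built-in HTTP client; the sketch below only assembles and prints the request URL (host and graph name are placeholders), with the actual call left commented out since it needs a running server:

```java
import java.net.URI;

// Sketch only: build the compact endpoint URL. Host and graph name are
// placeholders -- substitute your own deployment's values.
public class CompactCall {
    public static void main(String[] args) {
        String host = "localhost:8080"; // your server address:port
        String graph = "hugegraph";     // your graph name
        URI uri = URI.create("http://" + host + "/graphs/" + graph + "/compact");
        System.out.println("PUT " + uri);
        // To actually send the request (requires a running server, Java 11+):
        // java.net.http.HttpClient.newHttpClient().send(
        //         java.net.http.HttpRequest.newBuilder(uri)
        //                 .PUT(java.net.http.HttpRequest.BodyPublishers.noBody())
        //                 .build(),
        //         java.net.http.HttpResponse.BodyHandlers.ofString());
    }
}
```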
Regarding your questions about RocksDB:

1. Why RocksDB is not automatically triggering compaction for several days: automatic compaction in RocksDB is influenced by several options such as `COMPACTION_STYLE`, `MAX_BG_JOBS`, and `LEVELS_COMPRESSIONS`. If compaction is not occurring as expected, it could be due to the configuration of these options. For example, a high number of background jobs (`MAX_BG_JOBS`) with insufficient system resources could delay compaction processes.
2. What options are available in `RocksdbOption` to control compaction thresholds, disk space, or WAL log size: you can adjust `COMPACTION_STYLE` and `NUM_LEVELS` to control how and when compaction occurs, and use `MAX_TOTAL_WAL_SIZE` to cap the total size of WAL files. Setting it to a lower value can help manage disk space usage by triggering flushes and compactions more aggressively.
3. If I want to manually delete expired data stored by RocksDB from the disk, how should I proceed to avoid accidental deletion: to safely delete expired data, ensure that your compaction strategy effectively removes obsolete data. Use `DELETE_OBSOLETE_FILE_PERIOD` to control how frequently obsolete files are purged; a lower value cleans up expired data more aggressively. However, manual intervention or direct interaction with RocksDB might be necessary for specific data-deletion scenarios, as the exposed API options focus on general configuration and do not directly expose data-deletion methods.
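These knobs are typically set in the server's RocksDB backend configuration file. The key names and values below are assumptions based on RocksDB's option names rather than confirmed HugeGraph keys; verify the exact spellings and units against RocksDBOptions.java before use:

```properties
# ASSUMED key names and units -- check RocksDBOptions.java for exact spellings.
rocksdb.compaction_style=LEVEL
rocksdb.num_levels=7
rocksdb.max_background_jobs=8
# Cap total WAL size to force earlier flushes/compactions (bytes, example: 1 GiB).
rocksdb.max_total_wal_size=1073741824
# How often obsolete files are purged (check the unit in the source).
rocksdb.delete_obsolete_files_period=21600
```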
Please refer to the RocksDBOptions.java file for more details on these options.
Remember, adjusting these settings requires a good understanding of your application's workload and how RocksDB's internal mechanisms work. Improper configuration can lead to performance degradation or increased disk usage. Always test changes in a staging environment before applying them to production.
Due to inactivity, this issue has been marked as stale and will be closed after 20 days; any update will remove the stale label.
Problem Type (问题类型)
None
Before submit
Environment (环境信息)
Your Question (问题描述)
Deployed version 1.2.0 with Docker, using the built-in RocksDB. The schema sets the TTL of both vertices and edges to 86400000 (the unit is milliseconds, right?). After the disk filled up, writes kept failing with "No space left on device". Even after stopping writes for several days, the disk space has not been released and the data files' timestamps are still from several days ago. Gremlin console queries return no vertices or edges (hugegraph.traversal().V().limit(1) finds nothing). When will the disk be cleaned up? Does it need to be triggered manually?
@dosu-bot
Vertex/Edge example (问题点 / 边数据举例)
No response
Schema [VertexLabel, EdgeLabel, IndexLabel] (元数据结构)
No response