[Question] Why do we use relational/SQL-based backend storage in a graph DB?

Github-Polaris commented 7 months ago

Problem Type

None

Before submit

[X] I have searched the existing issues, FAQ, and documents and confirmed that there are no identical or duplicate questions

Environment

Server Version: 1.0.0 (Apache Release Version)
Backend: RocksDB x nodes, HDD or SSD
OS: xx CPUs, xx G RAM, Ubuntu 2x.x / CentOS 7.x
Data Size: xx vertices, xx edges

Your Question

HugeGraph's backends include MySQL, RocksDB, Cassandra, HBase, and other databases, so even the lowest layer of data storage has not moved away from relational databases. Why, then, can it be called a graph DB? As I understand it, such a graph DB amounts to a graphical interface layered on top of a relational database: it displays the relational data as a graph, while the structure and form of the underlying storage remain relational. If so, where do the claimed advantages, such as faster retrieval, actually come from?

Vertex/Edge example

No response

Schema [VertexLabel, EdgeLabel, IndexLabel]

No response

dosubot[bot] commented 7 months ago

The core of HugeGraph's designation as a graph database lies not in its storage mechanism but in how it models, stores, and queries data. HugeGraph uses a graph data model, representing data as vertices (nodes) and edges (relationships), which fundamentally differs from the tabular model of relational databases. This model facilitates direct and efficient querying of relationships and patterns within the data, leveraging graph-specific operations and algorithms.
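As a rough illustration of the model described above (this is not HugeGraph code; the names `Graph`, `add_edge`, and `k_hop` are hypothetical), a relationship query can be expressed as repeated neighbor expansion over vertices and edges rather than as a chain of table joins:

```python
from collections import defaultdict, deque


class Graph:
    """Toy graph: vertices are ids, edges are (label, target) pairs."""

    def __init__(self):
        self.out = defaultdict(list)  # source id -> [(label, target id), ...]

    def add_edge(self, src, label, dst):
        self.out[src].append((label, dst))

    def k_hop(self, start, k, label=None):
        """Return the vertices reachable from `start` in at most k hops (BFS)."""
        seen = {start}
        frontier = deque([(start, 0)])
        while frontier:
            vertex, depth = frontier.popleft()
            if depth == k:
                continue  # hop limit reached, do not expand further
            for edge_label, neighbor in self.out[vertex]:
                if label is not None and edge_label != label:
                    continue  # skip edges with a different label
                if neighbor not in seen:
                    seen.add(neighbor)
                    frontier.append((neighbor, depth + 1))
        seen.discard(start)
        return seen


g = Graph()
g.add_edge("alice", "knows", "bob")
g.add_edge("bob", "knows", "carol")
g.add_edge("carol", "knows", "dave")
g.add_edge("alice", "created", "project-x")
print(sorted(g.k_hop("alice", 2, label="knows")))  # ['bob', 'carol']
```

Each additional hop is one more round of neighbor lookups, which is the kind of operation a graph engine is built to make cheap.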

HugeGraph supports various backends like MySQL, RocksDB, Cassandra, and HBase, not as a limitation to relational models but as a flexible foundation for storing graph data. On top of these storage systems, HugeGraph implements graph-specific features such as schema management, indexing strategies for efficient query processing, and graph traversal optimizations. These features enable HugeGraph to provide the advantages of graph databases, such as improved search speeds and efficient data relationship exploration, regardless of the underlying storage system.

The use of traditional databases for storage allows HugeGraph to leverage their robustness, scalability, and persistence capabilities while abstracting their complexities away from the user. This abstraction is achieved through a unified graph processing interface that translates graph operations into database-specific queries and updates, ensuring that the graph model's benefits are fully realized.
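A minimal sketch of that idea, assuming a hypothetical `BackendStore` interface (it is not HugeGraph's actual API): the graph-level query `friends_of_friends` is written once, and each backend turns the same `out_neighbors` call into its own kind of lookup, a dictionary read in one case and a SQL `SELECT` in the other.

```python
import sqlite3
from abc import ABC, abstractmethod


class BackendStore(ABC):
    """Hypothetical storage interface, for illustration only."""

    @abstractmethod
    def add_edge(self, src, label, dst): ...

    @abstractmethod
    def out_neighbors(self, src, label): ...


class DictStore(BackendStore):
    """KV-style backend: one entry per (source, label) key."""

    def __init__(self):
        self.kv = {}

    def add_edge(self, src, label, dst):
        self.kv.setdefault((src, label), []).append(dst)

    def out_neighbors(self, src, label):
        return sorted(self.kv.get((src, label), []))


class SqlStore(BackendStore):
    """SQL-style backend: the same operation becomes a SELECT."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE edges (src TEXT, label TEXT, dst TEXT)")

    def add_edge(self, src, label, dst):
        self.db.execute("INSERT INTO edges VALUES (?, ?, ?)", (src, label, dst))

    def out_neighbors(self, src, label):
        rows = self.db.execute(
            "SELECT dst FROM edges WHERE src = ? AND label = ? ORDER BY dst",
            (src, label),
        )
        return [row[0] for row in rows]


def friends_of_friends(store, person):
    """Graph-level query written once against the interface."""
    result = set()
    for friend in store.out_neighbors(person, "knows"):
        result.update(store.out_neighbors(friend, "knows"))
    result.discard(person)
    return sorted(result)


for store in (DictStore(), SqlStore()):
    store.add_edge("alice", "knows", "bob")
    store.add_edge("bob", "knows", "carol")
    store.add_edge("bob", "knows", "alice")
    print(type(store).__name__, friends_of_friends(store, "alice"))
```

Both backends return the same answer; only the translation of the graph operation differs.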

In essence, HugeGraph's ability to serve as a true graph database is derived from its implementation of graph-specific operations, optimizations, and its support for graph query languages, rather than the nature of its storage backend. This approach allows HugeGraph to offer the unique advantages of graph databases, including efficient handling of complex queries and relationships, which are central to the graph data model.

#### About Dosu

This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.

imbajin commented 7 months ago

Simply put, in addition to what @dosu-bot mentioned, HugeGraph's main storage design relies on a binary KV structure on disk and a graph structure in memory. It is also compatible with relational databases such as MySQL, which makes it easier for ordinary users to understand graphs and migrate to them. (Note that HG does not depend on relational databases, and that relational databases are not equivalent to KV storage.)

From an abstract perspective, the essence of a storage system is to provide KV/Map semantics, just as a graph data structure is ultimately built from arrays or linked lists. Building a graph system on stable, reliable KV storage is therefore a mainstream approach, and it also makes it easier to scale out to distributed environments.
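To make the KV idea concrete, here is a minimal sketch under stated assumptions: the key layout (`src | label | dst` joined by a zero byte), the `SortedKV` class, and the helper names are all hypothetical and are not HugeGraph's actual binary format. It shows how an ordered KV store lets "all outgoing edges of a vertex with a given label" become a single prefix range scan:

```python
import bisect

SEP = b"\x00"  # assumed separator; ids and labels must not contain it


def edge_key(src, label, dst):
    """Encode an outgoing edge as one binary key: src | label | dst."""
    return SEP.join(part.encode("utf-8") for part in (src, label, dst))


class SortedKV:
    """Minimal ordered KV store standing in for a backend such as RocksDB."""

    def __init__(self):
        self.keys = []    # kept sorted so a prefix scan is a range scan
        self.values = {}

    def put(self, key, value):
        if key not in self.values:
            bisect.insort(self.keys, key)
        self.values[key] = value

    def scan_prefix(self, prefix):
        """Yield (key, value) for every key that starts with `prefix`."""
        i = bisect.bisect_left(self.keys, prefix)
        while i < len(self.keys) and self.keys[i].startswith(prefix):
            yield self.keys[i], self.values[self.keys[i]]
            i += 1


def out_neighbors(kv, src, label):
    """Adjacency lookup: decode the target id from each matching key."""
    prefix = src.encode("utf-8") + SEP + label.encode("utf-8") + SEP
    return [key[len(prefix):].decode("utf-8") for key, _ in kv.scan_prefix(prefix)]


kv = SortedKV()
kv.put(edge_key("alice", "knows", "bob"), b"since=2020")
kv.put(edge_key("alice", "knows", "carol"), b"since=2021")
kv.put(edge_key("alice", "created", "project-x"), b"")
kv.put(edge_key("alicia", "knows", "dave"), b"")
print(out_neighbors(kv, "alice", "knows"))  # ['bob', 'carol']
```

Because edges of the same vertex sort next to each other, the scan touches only the matching keys, and the same layout can be split by key range across machines.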

dosubot[bot] commented 7 months ago

The discussion around HugeGraph's use of backend storage systems such as MySQL, RocksDB, Cassandra, and HBase (of which only MySQL is relational) raises important points about the nature and capabilities of graph databases. What makes HugeGraph a graph database is not the storage mechanism it employs but how it models, manages, and queries data. By representing data as vertices and edges, HugeGraph adopts a graph data model that is fundamentally different from the tabular model used in relational databases. This model is optimized for efficiently querying relationships and patterns within the data, which is a core advantage of graph databases.

HugeGraph's support for various backend systems is a strategic choice that allows it to leverage the robustness, scalability, and persistence features of these established databases. On top of these systems, HugeGraph implements graph-specific functionalities such as schema management, indexing for efficient query processing, and optimizations for graph traversal. These capabilities enable HugeGraph to offer the key benefits associated with graph databases, such as enhanced search speeds and more effective data relationship exploration, irrespective of the underlying storage system.

The clarification provided by the maintainer, @imbajin, adds that HugeGraph's design is centered around a binary key-value (KV) structure on disk and a graph structure in memory. The design does not depend on relational databases but is compatible with them, which eases understanding and migration for users familiar with relational models. Note that relational databases are not equivalent to KV storage; they are simply one compatible backend option.

From a broader perspective, the choice to build a graph system on top of stable and reliable KV storage is a common and mainstream approach in the industry. This strategy not only ensures the reliability and efficiency of the storage system but also simplifies expansion into distributed environments. The underlying KV/Map semantics provided by the storage system serve as a foundational layer upon which the graph data model is implemented, allowing HugeGraph to fully leverage the unique advantages of graph databases for handling complex queries and relationships.

Github-Polaris commented 7 months ago

Thank you! The part of your answer that says "making it easier for ordinary users to understand/migrate to the graph. (It should be noted that HG does not rely on relational databases" answered my question.

dosubot[bot] commented 7 months ago

I'm glad the explanation provided clarity on HugeGraph's approach to storage and its classification as a graph database. Understanding that HugeGraph's design is not strictly tied to relational databases but rather utilizes a binary KV structure for disk storage and a graph structure in memory helps highlight its flexibility and efficiency in handling graph data. This design choice facilitates easier migration for users familiar with relational databases while fully leveraging the advantages of graph databases for complex queries and relationship exploration. If you have any more questions or need further clarification on HugeGraph or graph databases in general, feel free to ask!

apache / incubator-hugegraph