apache / incubator-hugegraph

A graph database that supports more than 100+ billion data, high performance and scalability (Include OLTP Engine & REST-API & Backends)
https://hugegraph.apache.org
Apache License 2.0
2.63k stars 517 forks source link

使用mysql作为图库数据存储时,每个图库都会占用连接数,且不释放的问题 #2668

Open fantengfeiNana opened 1 week ago

fantengfeiNana commented 1 week ago

Problem Type (问题类型)

struct / logic (架构 / 逻辑设计问题)

Before submit

Environment (环境信息)

Your Question (问题描述)

我们在使用mysql作为hugegraph的存储方式时,在批量创建知识图谱时出现了程序提示超出了数据库最大链接数的问题,经过验证发现,知识图谱在创建完成后,且没有访问的情况下,每个知识图谱对应的数据库都会占用最低两个数据库链接。【重启 hugegrapg-server】后,链接即被占用。 image image 按照这个数据推送,如果数据库最大链接数为2000,只能创建约667个知识图谱,请问如何解决该问题,因为数据库链接资源有限

Vertex/Edge example (问题点 / 边数据举例)

No response

Schema [VertexLabel, EdgeLabel, IndexLabel] (元数据结构)

No response

dosubot[bot] commented 1 week ago

要解决在使用MySQL作为HugeGraph存储时,批量创建知识图谱导致数据库连接数超出最大限制的问题,可以调整HugeGraph-Loader的配置参数,特别是与连接数和批量插入相关的参数。

根据文档内容,可以调整以下参数:

{
  "version": "2.0",
  "structs": [
    {
      "id": "1",
      "input": {
        // 其他输入源配置
      },
      "vertices": [
        // 顶点映射配置
      ],
      "edges": [
        // 边映射配置
      ]
    }
  ],
  "options": {
    "max-conn": 4,  // 调整为适合的连接数
    "max-conn-per-route": 2,  // 调整为适合的每个路由的连接数
    "batch-size": 500,  // 调整批量大小
    "timeout": 60,  // 插入结果返回的超时时间(秒)
    "retry-times": 3,  // 发生特定异常时的重试次数
    "retry-interval": 10  // 重试之前的间隔时间(秒)
  }
}

具体调整建议:

  1. max-connmax-conn-per-route:根据MySQL的最大连接数限制,合理设置这两个参数,确保不会超出MySQL的最大连接数。
  2. batch-size:适当减小批量大小,减少每次批量插入的数据量,从而减少对数据库连接的占用时间。
  3. retry-timesretry-interval:设置重试次数和重试间隔时间,以应对偶发的连接超时或失败问题。

此外,确保在HugeGraph的MySQL存储实现中正确管理连接池。例如,在MysqlStore类中,使用MysqlSessions来管理数据库连接,并在适当的时候关闭连接以释放资源:

@Override
public void close() {
    LOG.debug("Store close: {}", this.store);
    this.checkClusterConnected();
    this.sessions.close();
}

通过调整这些参数和确保连接池的正确管理,可以有效控制数据库连接数,避免超出最大限制[1][2]

To continue talking to Dosu, mention @dosu.

imbajin commented 1 week ago

@dosu You should first inquire about the user's usage scenarios and methods. Under normal circumstances, it is not expected to create hundreds of graph instances on a single node, usually in single digits (you can use subgraph partitioning/permission management/tenants, etc.) for resource isolation, rather than creating a large number of graph instances. In short, it feels more like there is a gap in the background area. It is advisable to confirm it before discussing the issue of MySQL connection numbers

dosubot[bot] commented 1 week ago

To address the issue of creating hundreds of graph instances on a single node and the resulting MySQL connection limitations, it's important to understand the user's usage scenarios and methods. Here are some key points:

Usage Scenarios and Methods for Creating Graph Instances

  1. Dynamic Graph Creation:

    • The createGraph method allows for the dynamic creation of graph instances. This method checks if dynamic creation is enabled (ENABLE_DYNAMIC_CREATE_DROP), validates the graph name, and then creates a new graph instance using the provided configuration.
    • Example:

      public HugeGraph createGraph(String name, String configText) {
       E.checkArgument(this.conf.get(ServerOptions.ENABLE_DYNAMIC_CREATE_DROP),
                       "Not allowed to create graph '%s' dynamically, " +
                       "please set `enable_dynamic_create_drop` to true.",
                       name);
       E.checkArgument(StringUtils.isNotEmpty(name),
                       "The graph name can't be null or empty");
       E.checkArgument(!this.graphs().contains(name),
                       "The graph name '%s' has existed", name);
      
       PropertiesConfiguration propConfig = ConfigUtil.buildConfig(configText);
       HugeConfig config = new HugeConfig(propConfig);
       this.checkOptions(config);
      
       return this.createGraph(config, name);
      }
  2. Cloning Graphs:

    • The cloneGraph method allows for cloning an existing graph with a new name and optional configuration changes.
    • Example:

      public HugeGraph cloneGraph(String name, String newName, String configText) {
       HugeGraph cloneGraph = this.graph(name);
       E.checkArgumentNotNull(cloneGraph,
                              "The clone graph '%s' doesn't exist", name);
       E.checkArgument(StringUtils.isNotEmpty(newName),
                       "The graph name can't be null or empty");
       E.checkArgument(!this.graphs().contains(newName),
                       "The graph '%s' has existed", newName);
      
       HugeConfig cloneConfig = cloneGraph.cloneConfig(newName);
       if (StringUtils.isNotEmpty(configText)) {
           PropertiesConfiguration propConfig = ConfigUtil.buildConfig(configText);
           propConfig.getKeys().forEachRemaining(key -> {
               cloneConfig.setProperty(key, propConfig.getProperty(key));
           });
           this.checkOptions(cloneConfig);
       }
      
       return this.createGraph(cloneConfig, newName);
      }
  3. Loading Graphs:

    • The loadGraphs method loads graph configurations from a specified directory and initializes them.
    • Example:
      public void loadGraphs(Map<String, String> graphConfs) {
       for (Map.Entry<String, String> conf : graphConfs.entrySet()) {
           String name = conf.getKey();
           String graphConfPath = conf.getValue();
           HugeFactory.checkGraphName(name, "rest-server.properties");
           try {
               this.loadGraph(name, graphConfPath);
           } catch (Throwable e) {
               LOG.error("Graph '{}' can't be loaded: '{}'", name, graphConfPath, e);
           }
       }
      }

Reasons for Not Using Subgraph Partitioning, Permission Management, or Tenants

  1. Complexity and Overhead:

    • Implementing subgraph partitioning, permission management, or multi-tenancy can add significant complexity and overhead to the system. The current approach of managing multiple graph instances might be simpler and more straightforward for the use cases intended by the developers.
  2. Dynamic Graph Management:

    • The ability to dynamically create, clone, and drop graph instances suggests a flexible and dynamic environment where individual graph instances are managed independently. This might be more aligned with the system's design goals compared to managing subgraphs or tenants.
  3. Resource Isolation:

    • By creating separate graph instances, each graph can have its own configuration and resources, providing a form of isolation without the need for more complex partitioning or multi-tenancy mechanisms.

These points should help clarify the user's usage scenarios and methods, and why they might not be using subgraph partitioning, permission management, or tenants for resource isolation [1].

To continue talking to Dosu, mention @dosu.