apache / incubator-hugegraph-toolchain

HugeGraph toolchain - includes a series of useful graph modules
https://hugegraph.apache.org/
Apache License 2.0

[Question] Import data with Loader in RocksDB & MySQL(meet error) #569

Closed SkrDrag closed 4 months ago

SkrDrag commented 5 months ago

Problem Type (问题类型)

None

Before submit

Environment (环境信息)

Your Question (问题描述)

Initially I chose RocksDB as the backend, with the database and the Server on the same machine, and data import worked fine. I then moved the backend database to MySQL on another machine, configured the Server to connect to the remote MySQL database, and initialized and started the Server successfully. When I ran the loader on the machine hosting the database to import data, I got an error.

Command: bin/hugegraph-loader.sh -g hugegraph -f /home/llw/dataset/struct.json -s /home/llw/dataset/schema.groovy -h 210.xx.xx.72

Log:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/llw/apache-hugegraph-toolchain-incubating-1.2.0/apache-hugegraph-loader-incubating-1.2.0/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/llw/apache-hugegraph-toolchain-incubating-1.2.0/apache-hugegraph-loader-incubating-1.2.0/lib/apache-hugegraph-loader-incubating-1.2.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/llw/apache-hugegraph-toolchain-incubating-1.2.0/apache-hugegraph-loader-incubating-1.2.0/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.codehaus.groovy.reflection.CachedClass (file:/home/llw/apache-hugegraph-toolchain-incubating-1.2.0/apache-hugegraph-loader-incubating-1.2.0/lib/apache-hugegraph-loader-incubating-1.2.0-shaded.jar) to method java.lang.Object.finalize()
WARNING: Please consider reporting this to the maintainers of org.codehaus.groovy.reflection.CachedClass
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

Failed to load, cause: Not all index fields '[id]' are contained in schema properties '[ID, CODE, NAME, TYPE, length]'

My questions: 1. I want the Server and the backend database on two different nodes; what are the correct steps for importing data into the database with the loader? 2. Is this error caused by a mistake in the mapping file? How should I fix it?
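On question 1, one point worth noting: the loader never connects to the backend database directly; it sends data to the HugeGraph Server's REST API, and only the Server talks to MySQL. So -h should point at the Server node. A sketch of the invocation (the <server-host> placeholder is an assumption, not a value from this thread):

```shell
# Point -h at the HugeGraph Server node, not the MySQL node; the Server
# is the only component that connects to the MySQL backend.
# <server-host> is a placeholder.
bin/hugegraph-loader.sh -g hugegraph \
    -f /home/llw/dataset/struct.json \
    -s /home/llw/dataset/schema.groovy \
    -h <server-host> -p 8080
```

Here -p is the Server's REST port (8080 by default); it can run on any node reachable from the machine executing the loader.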

schema.groovy:
schema.propertyKey("ID").asText().ifNotExist().create();
schema.propertyKey("CODE").asText().ifNotExist().create();
schema.propertyKey("NAME").asText().ifNotExist().create();
schema.propertyKey("TYPE").asText().ifNotExist().create();

schema.propertyKey("id").asText().ifNotExist().create();
schema.propertyKey("code").asText().ifNotExist().create();
schema.propertyKey("name").asText().ifNotExist().create();
schema.propertyKey("type").asText().ifNotExist().create();
schema.propertyKey("aid").asText().ifNotExist().create();
schema.propertyKey("zid").asText().ifNotExist().create();
schema.propertyKey("length").asFloat().ifNotExist().create();

schema.vertexLabel("device")
      .properties("ID","CODE","NAME","TYPE")
      .primaryKeys("ID")
      .nullableKeys("NAME")
      .ifNotExist()
      .create();

schema.edgeLabel("link")
      .sourceLabel("device")
      .targetLabel("device")
      .properties("id","code","name","type","length")
      .nullableKeys("code","name","type")
      .ifNotExist()
      .create();

schema.indexLabel("personByCityAndAge").
      onE("link").
      by("length").
      range().
      ifNotExist()
      .create();

schema.indexLabel("deviceByType")
      .onV("device")
      .by("TYPE")
      .secondary()
      .ifNotExist()
      .create();

schema.indexLabel("linkByType")
      .onE("link")
      .by("id")
      .secondary()
      .ifNotExist()
      .create();

struct.json:

{
  "vertices": [
    {
      "label": "device",
      "input": {
        "type": "file",
        "path": "/home/llw/dataset/dev.csv",
        "header": ["ID","CODE","NAME","TYPE"],
        "format": "CSV",
        "charset": "UTF-8"
      }
    }
  ],
  "edges": [
    {
      "label": "link",
      "source": ["aid"],
      "target": ["zid"],
      "input": {
        "type": "file",
        "path": "/home/llw/dataset/link.csv",
        "format": "CSV",
        "header":["id","code","name","type","aid","zid","length"]
      },
      "field_mapping": {
        "aid": "ID",
        "zid": "ID"
      }
    },
    {
      "label": "link",
      "source": ["zid"],
      "target": ["aid"],
      "input": {
        "type": "file",
        "path": "/home/llw/dataset/link.csv",
        "format": "CSV",
        "header":["id","code","name","type","aid","zid","length"]
      },
      "field_mapping": {
        "aid": "ID",
        "zid": "ID"
      }
    }
  ]

}
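Reading the error literally: when the index label is created, the base label's properties are '[ID, CODE, NAME, TYPE, length]', so by("id") cannot resolve. A hedged sketch of one possible fix is shown below, assuming the intent is to index the uppercase property the label actually carries (the index name linkById is made up here); alternatively, rename the properties so the label and the index use one consistent casing:

```groovy
// Sketch, not a confirmed fix: an index field must be one of the
// properties actually declared on the base label. If the label carries
// "ID" rather than "id", index by "ID" (or make the casing consistent
// throughout the schema and the mapping file).
schema.indexLabel("linkById")
      .onE("link")
      .by("ID")
      .secondary()
      .ifNotExist()
      .create();
```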

Vertex/Edge example (问题点 / 边数据举例)

No response

Schema [VertexLabel, EdgeLabel, IndexLabel] (元数据结构)

No response

dosubot[bot] commented 5 months ago

🤖

Hello @SkrDrag, I'm Dosu, a bot that can help you solve bugs, answer questions, and become a contributor. While I am a bot, I'll do my best to help you while you wait for a human maintainer. I hope we can work well together!

Based on the information you provided, the problem is likely a mismatch between the index fields and the schema properties. The error message "Failed to load, cause: Not all index fields '[id]' are contained in schema properties '[ID, CODE, NAME, TYPE, length]'" means the data you are trying to import has a field 'id' that is not defined among the schema properties.

To fix this, make sure the schema defined in your HugeGraph instance matches the data you are importing. The schema defines the structure of the data, including vertex and edge properties. In your case, the schema should include the 'id' field.

Here is an example of how to define a schema:

SchemaManager schema = graph.schema();

schema.propertyKey("id").asInt().ifNotExist().create();
schema.propertyKey("CODE").asText().ifNotExist().create();
schema.propertyKey("NAME").asText().ifNotExist().create();
schema.propertyKey("TYPE").asText().ifNotExist().create();
schema.propertyKey("length").asInt().ifNotExist().create();

schema.vertexLabel("vertex")
      .properties("id", "CODE", "NAME", "TYPE", "length")
      .primaryKeys("id")
      .ifNotExist()
      .create();

In this example, a vertex label 'vertex' is created with the properties 'id', 'CODE', 'NAME', 'TYPE', and 'length'. The 'id' property is also set as the primary key.

After defining the schema, you can use HugeGraph-Loader to import the data. Make sure the data you import matches the schema you defined.

If the server and the backend database are on two different nodes, you need to set the backend database's IP address or hostname in the HugeGraph configuration. For example, with a Cassandra backend, set the 'cassandra.host' property to the Cassandra node's IP address or hostname:

cassandra.host=<IP address or hostname of the Cassandra node>

Similarly, for other backends such as RocksDB, HBase, MySQL, PostgreSQL, CockroachDB, or Palo, set the corresponding host property to the backend node's IP address or hostname.
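For a MySQL backend specifically, the host is part of the JDBC URL in the Server's hugegraph.properties rather than a standalone host key. A minimal sketch, with the hostname and credentials as placeholders:

```properties
# Sketch of a remote-MySQL backend in hugegraph.properties
# (<mysql-host>, <user>, <password> are placeholders, not from this thread)
backend=mysql
serializer=mysql
jdbc.driver=com.mysql.cj.jdbc.Driver
jdbc.url=jdbc:mysql://<mysql-host>:3306
jdbc.username=<user>
jdbc.password=<password>
```

After changing these values, re-run init-store and restart the Server before importing with the loader.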

References:

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

imbajin commented 5 months ago

@dosu-bot This is a question about the loader module in the toolchain repository; please answer again using the code / issues / documentation context of the current repository. For similar questions in the future, it would be good if you could detect from the server repository that an issue should be transferred and remind the user / community admins; if that is not supported yet, it is a feature that could be implemented later~

Docs for reference:

  1. https://github.com/apache/incubator-hugegraph-doc/blob/master/content/cn/docs/quickstart/hugegraph-loader.md
  2. https://hugegraph.apache.org/docs/quickstart/hugegraph-loader/

Also @SkrDrag, thanks for your interest and your question, and for being willing to talk with the LLM 🤖 we recently introduced; feel free to keep using it. A few small suggestions:

  1. Choose the correct category when filing an issue; please don't file a Question under the Bug category, as that attaches incorrect labels which the community then has to fix manually.
  2. Keep the [Question] prefix in the title instead of deleting it, and use an English title if possible so that others can search for it (English content is even better if convenient; otherwise use whichever language you are comfortable with).
  3. When replying to the dosu bot, @ it as I did above; that way the language it picks up and replies in is more likely to be consistent, and you can give it more descriptive keywords / instructions / references to help it learn and find answers.
  4. When pasting errors, commands, logs, or schemas, wrap long ones (single or multi line) in ```java-style code blocks; that makes reading much easier for everyone, and more people will be willing to help check and confirm (see the attached screenshot).

Back to the current problem: the error message already states the cause fairly clearly; have you done any troubleshooting to confirm it?

(screenshot of the error message)

If you are not familiar with this, check the official documentation; the example folder in the loader directory has samples for local / HDFS / multiple files. You can start from those examples to get familiar with schema definition / mapping, or summon the 🤖 to provide the relevant directory / code samples.

dosubot[bot] commented 5 months ago

🤖

It seems like you are encountering an issue while using the HugeGraph Loader to import data into a MySQL backend after switching from RocksDB. The error message indicates that not all index fields are contained in the schema properties.

Based on the error message and the provided schema and data files, it seems that the issue might be related to the field mapping in the struct.json file. The error message suggests that the index field 'id' is not contained in the schema properties '[ID, CODE, NAME, TYPE, length]'.

To address your questions:

  1. To use the loader to import data into a MySQL backend on a different node from the server, you need to ensure that the server is properly configured to connect to the remote MySQL database. Additionally, you need to provide the correct connection details in the loader command, including the host, port, username, and password. It's also important to ensure that the schema and data files are accessible from the machine where the loader is being executed.

  2. The error is likely related to the field mapping in the struct.json file. You should review the field mapping for the 'link' edge label and ensure that the index field 'id' is correctly mapped to the schema properties.

If the issue persists, please provide more details about the field mapping in the struct.json file and any relevant error messages, and I can assist you further.



SkrDrag commented 5 months ago


Understood! Thank you for your reply!

dosubot[bot] commented 5 months ago

🤖

Hello! Based on the information you provided, I understand that you ran into a problem with the loader module and would like me to answer again using the current repository's code, issue, and documentation context. You also mentioned that detecting such issues from the server repository and reminding the user / community admins to transfer them is a feature that could be implemented in the future. In addition, you offered suggestions about filing issues, replying to the dosu bot, and pasting errors / commands / logs / schemas; thank you for the detailed explanation!

Based on the links you provided, I will review the official documentation and the examples in the loader directory as soon as possible to better understand the problem you encountered. If you have any other information or a more specific description of the problem, feel free to let me know at any time, and I will do my best to help you solve it.



github-actions[bot] commented 5 months ago

Due to inactivity, this issue is marked as stale and will be closed after 20 days; any update will remove the stale label.