alibaba / GraphScope

🔨 🍇 💻 🚀 GraphScope: A One-Stop Large-Scale Graph Computing System from Alibaba
https://graphscope.io
Apache License 2.0
3.26k stars 442 forks

[BUG] Why does vid_field in add_vertices have no effect, and why do src_field and dst_field in add_edges have no effect either? #3704

Open yimijiu123 opened 5 months ago

yimijiu123 commented 5 months ago

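Describe the bug

My understanding of vid_field is that it specifies which column of the imported data is the vertex ID column, which is then used as the index when edges are loaded later. In practice, even when a vid_field column is specified, the first column of the vertex dataset is still treated as the ID column. For example:

```python
import numpy as np
import pandas as pd
from graphscope.client.session import get_default_session

sess = get_default_session()

# string vertex IDs
graph = sess.g(oid_type="string")

# Vertex data: "idd" is the first column, "id" is the intended ID column.
id = np.array(['1', '2', '3', '4'])
idd = np.array(["a", "b", "c", "d"])
avg_score = np.array([11, 22, 23, 9])
v_data = np.transpose(np.vstack([idd, id, avg_score]))
df_student = pd.DataFrame(v_data, columns=["idd", "id", "avg_score"])

# Edge data: endpoints refer to values in the "id" column.
src_id = np.array(['1', '2', '3', '1'])
dst_id = np.array(['2', '4', '2', '4'])
group_size = np.array([4, 1, 2, 3])
e_data = np.transpose(np.vstack([src_id, dst_id, group_size]))
df_group = pd.DataFrame(e_data, columns=["src_id", "dst_id", "group_size"]).astype({"group_size": int})

graph = graph.add_vertices(df_student, label="student", vid_field="id")
graph = graph.add_edges(df_group, label="guide", src_label="student", dst_label="student")
pg = graph.project(vertices={"student": ["id"]}, edges={"guide": ["group_size"]})
```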

The output is: 1712828891851

Questions:

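  1. The output shows that vid_field="id" has no effect. I also tried vid_field="1", vid_field=id, and vid_field=1, and none of them worked. Similarly, setting src_field=1, dst_field=0 has no effect; the only way to swap the source and destination vertices is to swap the first and second columns of the edge dataset.
  2. Is the reason it has no effect that the following is set:

     ```cpp
     static constexpr int id_column = 0;
     static constexpr int src_column = 0;
     static constexpr int dst_column = 1;
     static constexpr int edge_id_column = 2;
     ```

  3. Do types_pb2.SRC_VID, types_pb2.SRC_LABEL, types_pb2.V_LABEL_ID, types_pb2.VID, and types_pb2.LABEL correspond to their counterparts in C++? For example, what does GetGid(fid_t fid, label_id_t label_id, oid_t oid, vid_t& gid) correspond to? And why is chunk.attr[types_pb2.VID].CopyFrom(utils.s_to_attr(str(self.vid_field))) used?
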
siyuan0322 commented 5 months ago

Thanks for reporting. It is indeed a bug. There used to be some logic to rearrange the columns of the dataframe accordingly, but that piece of code may have been lost during a massive refactor of the loader 😢
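
Until that logic is restored, one possible workaround is to reorder the columns yourself before calling add_vertices / add_edges, since the report above shows the loader reading the ID from the first column (and src/dst from the first two). The sketch below is only an illustration of that idea, not the original loader code; `id_first_order` is a hypothetical helper name, and the column names are taken from the example in this issue.

```python
def id_first_order(columns, id_fields):
    """Return the column names with id_fields first and the rest in their original order.

    Hypothetical helper for illustration; it is not part of GraphScope.
    """
    columns = list(columns)
    id_fields = list(id_fields)
    missing = [c for c in id_fields if c not in columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return id_fields + [c for c in columns if c not in id_fields]


# Vertex frame from the report: make "id" the first column.
vertex_order = id_first_order(["idd", "id", "avg_score"], ["id"])
print(vertex_order)  # ['id', 'idd', 'avg_score']

# Edge frame: put the intended source column first and the destination second.
# Here the two endpoints are swapped, as with src_field=1, dst_field=0.
edge_order = id_first_order(["src_id", "dst_id", "group_size"], ["dst_id", "src_id"])
print(edge_order)  # ['dst_id', 'src_id', 'group_size']
```

With pandas, the resulting list can be applied as `df_student = df_student[id_first_order(df_student.columns, ["id"])]` before the frame is passed to add_vertices, and likewise for the edge frame before add_edges.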

The types_pb2.* keys carry that metadata from Python to C++, which is what the CopyFrom statement is used for.

GetGid is a method; its label_id, oid, and gid parameters are filled in with the actual values of a vertex, so its meaning is unrelated to the notions above.