KeRNeLith / QuikGraph

Generic Graph Data Structures and Algorithms for .NET
https://kernelith.github.io/QuikGraph/
Microsoft Public License

GraphML Serialization - Add documentation or feature explaining how to control what id values are used #30

Closed KevinSmall closed 2 years ago

KevinSmall commented 3 years ago

I am not sure if this is missing documentation or a feature request. The documentation for GraphML serialization is here: https://github.com/KeRNeLith/QuikGraph/wiki/GraphML-Serialization

Following that documentation, I am able to serialize graphml and add additional attributes as shown below.

However, the id value in the resulting graphml appears to be autogenerated. Is it possible to control what that id value is when serializing to graphml?

In case it is relevant, my goal is to have Gephi consume the GraphML produced. Ideally I'd like to use Gephi's merge functionality, which lets me split a large dataset into multiple GraphML files and have Gephi import and merge them. To get this to work, I need control over the id values during serialization.

Sample code:

using (var xmlWriter = XmlWriter.Create(output))
{
    graph.SerializeToGraphML<GraphmlVertex, GraphmlEdge, BidirectionalGraph<GraphmlVertex, GraphmlEdge>>(xmlWriter);
}

public class GraphmlVertex
{
    [XmlAttribute("address")]
    public string Address { get; set; }

    [XmlAttribute("address-type")]
    public string AddressType { get; set; }
}

public class GraphmlEdge : Edge<GraphmlVertex>
{
    // Edge<TVertex> has no parameterless constructor, so forward source and target.
    public GraphmlEdge(GraphmlVertex source, GraphmlVertex target)
        : base(source, target)
    {
    }

    [XmlAttribute("transaction-id")]
    public string TransactionId { get; set; }
}

KeRNeLith commented 3 years ago

Hello @KevinSmall,

First of all, thank you for your interest in QuikGraph. The documentation you pointed out is indeed incomplete on this subject. I will update it to make it clearer.

The id value is indeed autogenerated when you call the serialization method with the minimal number of arguments. Under the hood it uses the AlgorithmExtensions.GetVertexIdentity(g) method to get a way to identify vertices. Of course, that is only a basic default implementation, which ends up relying on counters.
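To illustrate why counter-based ids get in the way of merging several files, here is a minimal sketch of a counter-style identity function. It is not QuikGraph's actual code: the CounterIdentity class and its Create method are hypothetical names, and the sketch only assumes that ids are handed out in order of first use.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only (not part of QuikGraph).
public static class CounterIdentity
{
    // Returns a function that assigns "0", "1", "2", ... to items
    // in the order they are first seen, and repeats the same id afterwards.
    public static Func<T, string> Create<T>()
    {
        var ids = new Dictionary<T, string>();
        return item =>
        {
            if (!ids.TryGetValue(item, out string id))
            {
                id = ids.Count.ToString();
                ids[item] = id;
            }

            return id;
        };
    }
}
```

With a scheme like this, each serialization run starts counting again from zero, so two unrelated vertices written to two different files can end up with the same id, and the same vertex can receive different ids in different files.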

In the end, the id is serialized as a string. So although the default behavior is to use a counter, there is an overload of the SerializeToGraphML method that lets you specify the vertex and edge identity methods, which in your case will allow you to control id generation.

Here is a tiny example to showcase the creation of custom ids:

// NOTE: Node is a class with X and Y values.
// NOTE 2: Edge is an Edge<Node>.
var graph = new BidirectionalGraph<Node, Edge>();
var node1 = new Node(1, 2);
var node2 = new Node(1, 3);
graph.AddVertex(node1);
graph.AddVertex(node2);
graph.AddEdge(new Edge(node1, node2));

// ...
// Serialization with custom vertex identifier and default edge identifier
graph.SerializeToGraphML(
    writer,
    vertex => $"vertex({vertex.X},{vertex.Y})", // <= Notice that
    graph.GetEdgeIdentity());

This results in nodes being generated like this one:

<node id="vertex(1,2)">
  <data key="X">1</data>
  <data key="Y">2</data>
</node>

Notice the id of the vertex.
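Applied to the types from the original question, the same overload could be used roughly as follows. This is a sketch under assumptions rather than a verified recipe: it assumes that Address is unique and non-null for every vertex and that TransactionId is unique and non-null for every edge, the GraphmlExport class and its Write method are hypothetical names, and a constructor has been added to GraphmlEdge because Edge<GraphmlVertex> requires a source and a target. Whether Gephi's merge then behaves as desired has not been checked here.

```csharp
using System.Xml;
using System.Xml.Serialization;
using QuikGraph;
using QuikGraph.Serialization;

public class GraphmlVertex
{
    [XmlAttribute("address")]
    public string Address { get; set; }

    [XmlAttribute("address-type")]
    public string AddressType { get; set; }
}

public class GraphmlEdge : Edge<GraphmlVertex>
{
    public GraphmlEdge(GraphmlVertex source, GraphmlVertex target)
        : base(source, target)
    {
    }

    [XmlAttribute("transaction-id")]
    public string TransactionId { get; set; }
}

// Hypothetical helper class, for illustration only.
public static class GraphmlExport
{
    public static void Write(
        BidirectionalGraph<GraphmlVertex, GraphmlEdge> graph,
        string path)
    {
        using (var xmlWriter = XmlWriter.Create(path))
        {
            // Type arguments are given explicitly because both identities
            // are lambdas, so the compiler cannot infer them.
            graph.SerializeToGraphML<GraphmlVertex, GraphmlEdge, BidirectionalGraph<GraphmlVertex, GraphmlEdge>>(
                xmlWriter,
                vertex => vertex.Address,     // assumption: unique per vertex
                edge => edge.TransactionId);  // assumption: unique per edge
        }
    }
}
```

Because the ids now come from the data itself rather than from a per-file counter, the same address should get the same node id in every exported file.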

KeRNeLith commented 2 years ago

Was this helpful for you, @KevinSmall? Is the explanation enough to solve your use case?

KeRNeLith commented 2 years ago

@KevinSmall I'm closing this support issue for now. Feel free to reopen it if you need further assistance. ;-)