Hello @cthoyt and thank you for making this work public!
I spent a bit of time computing a report for the graph using Ensmallen, and since I found the results interesting, here it is!
Perhaps a portion of this would fit nicely in the repository README.
I'll be providing some embeddings and visualizations soonish.
Here's the report!
OBO Foundry graph report
The undirected multigraph has 4.83M nodes and 6.70M heterogeneous edges. The graph contains 225 connected components (of which 1 is a disconnected node), with the largest one containing 4.64M nodes and the smallest one containing a single node.
Degree centrality
The minimum node degree is 1, the maximum node degree is 544.96K, the mode degree is 1, the mean degree is 2.77 and the node degree median is 1.
The nodes with the highest degree centrality are dron:00000027 (degree 544.96K), so:0000704 (degree 115.06K), so:0001217 (degree 100.31K), ncbitaxon:9606 (degree 89.76K) and ncro:0000025 (degree 59.88K).
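The reported mean is consistent with the node and edge counts above: for an undirected graph the mean degree is 2E/N, and 2 × 6.70M / 4.83M ≈ 2.77. As a minimal sketch of how these statistics can be derived from a plain edge list (this is not Ensmallen's implementation; `degree_summary` and `toy_edges` are hypothetical names used only for illustration):

```python
from collections import Counter
from statistics import mean, median, mode


def degree_summary(edges):
    """Return basic degree statistics for an undirected edge list."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        # Assumption: a self-loop is counted once; other conventions count it twice.
        if dst != src:
            degree[dst] += 1
    values = list(degree.values())
    return {
        "min": min(values),
        "max": max(values),
        "mode": mode(values),
        "mean": mean(values),
        "median": median(values),
    }


# Toy graph: one small hub plus a short path hanging off it.
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("c", "d")]
summary = degree_summary(toy_edges)
print(summary)
```

As in the report, a few high-degree hubs surrounded by many degree-one nodes give a median and mode of 1 with a noticeably larger mean.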
Edge types
The graph has 979 edge types, of which the 10 most common are rdfs:subClassOf (10.43M edges, 77.83%), http://www.obofoundry.org/ro/ro.owl#has_proper_part (1.09M edges, 8.13%), ro:0002160 (579.72K edges, 4.33%), pr#has_gene_template (200.96K edges, 1.50%), ro:0000087 (112.25K edges, 0.84%), bfo:0000050 (100.46K edges, 0.75%), bfo:0000051 (70.37K edges, 0.53%), ro:0013002 (55.75K edges, 0.42%), iceo:0000051 (49.73K edges, 0.37%) and ro:0013003 (40.67K edges, 0.30%).
Isomorphic edge types
Isomorphic edge type groups are groups of edge types that describe exactly the same set of edges. The presence of such duplicated edge types suggests a potential modelling error in the pipeline that produced this graph. 7 isomorphic edge type groups were detected in this graph.
Isomorphic edge type group containing 2 edge types (2.05K edges, 0.02%), which are: ncro:0004009 and ncro:0004006.
Isomorphic edge type group containing 2 edge types (60 edges), which are: micro:0001521 and micro:0001522.
Isomorphic edge type group containing 2 edge types (60 edges), which are: micro:0001524 and micro:0001523.
Isomorphic edge type group containing 2 edge types (60 edges), which are: micro:0001467 and micro:0001502.
Isomorphic edge type group containing 2 edge types (22 edges), which are: micro:0001219 and micro:0001468.
Isomorphic edge type group containing 2 edge types (10 edges), which are: ro:0015011 and pato#has_cross_section.
Isomorphic edge type group containing 2 edge types (8 edges), which are: ro:0015012 and pato#reciprocal_of.
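To make the definition concrete, here is a small sketch of one way to find such groups: collect the set of edges for each edge type, then group the types whose sets are identical. It assumes the graph is available as `(source, destination, edge_type)` triples; the function name and the toy data are hypothetical, and Ensmallen's own detection may work differently.

```python
from collections import defaultdict


def isomorphic_edge_type_groups(typed_edges):
    """Group edge types that cover exactly the same set of undirected edges."""
    edges_by_type = defaultdict(set)
    for src, dst, edge_type in typed_edges:
        # The graph is undirected, so the endpoint order is normalised away.
        edges_by_type[edge_type].add(frozenset((src, dst)))

    types_by_edge_set = defaultdict(list)
    for edge_type, edge_set in edges_by_type.items():
        types_by_edge_set[frozenset(edge_set)].append(edge_type)

    # Only edge sets shared by two or more types form a group.
    return [sorted(group) for group in types_by_edge_set.values() if len(group) > 1]


toy_typed_edges = [
    ("a", "b", "t1"),
    ("b", "a", "t2"),
    ("c", "d", "t1"),
    ("c", "d", "t2"),
    ("a", "b", "t3"),
]
edge_type_groups = isomorphic_edge_type_groups(toy_typed_edges)
print(edge_type_groups)
```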
Singleton edge types
Singleton edge types are edge types assigned to exactly one edge, which makes the edge type relatively meaningless, as it adds no more information than the name of the edge itself. The graph contains one edge with a singleton edge type, which is OGI.owl#hasIntervalRelation.
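A rough way to check for singleton edge types on any typed edge list is to count the edges per type and keep the types that occur once. This is only an illustrative sketch; `singleton_edge_types` and the toy triples are hypothetical.

```python
from collections import Counter


def singleton_edge_types(typed_edges):
    """Return the edge types that are used by exactly one edge."""
    counts = Counter(edge_type for _, _, edge_type in typed_edges)
    return sorted(edge_type for edge_type, count in counts.items() if count == 1)


toy_typed_edges = [("a", "b", "t1"), ("c", "d", "t1"), ("a", "c", "t2")]
singletons = singleton_edge_types(toy_typed_edges)
print(singletons)
```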
Topological Oddities
A topological oddity is a set of nodes in the graph that may result from an error during the generation of the edge list and that, depending on the task, could bias the results of topology-based models. The following sections describe the detected topological oddities.
Singleton nodes with self-loops
A singleton node with self-loops is a node disconnected from all other nodes except itself. We have detected a single singleton node with self-loops in the graph.
cdao:0000167
Node tuples
A node tuple is a connected component composed of two nodes. We have detected 89 node tuples in the graph, involving a total of 178 nodes and 89 edges. The detected node tuples are:
Node tuple containing the nodes ncro:0004012 and ncro:0004016.
And other 74 node tuples.
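The counts above are internally consistent: 89 components of two nodes each give 178 nodes and 89 edges. A minimal sketch of the definition, assuming a simple undirected edge list (the function names and the toy data are hypothetical, and this is not Ensmallen's implementation):

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Return the connected components of an undirected edge list."""
    adjacency = defaultdict(set)
    for src, dst in edges:
        adjacency[src].add(dst)
        adjacency[dst].add(src)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search from every node not reached so far.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components


def node_tuples(edges):
    """Keep only the components made of exactly two nodes."""
    return [component for component in connected_components(edges) if len(component) == 2]


toy_edges = [("a", "b"), ("c", "d"), ("d", "e"), ("x", "y")]
tuples_found = node_tuples(toy_edges)
print(tuples_found)
```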
Isomorphic node groups
Isomorphic node groups are sets of nodes with exactly the same neighbours and node types (if present in the graph). Nodes in such groups are topologically indistinguishable, that is, swapping their IDs would not change the graph topology. We have detected 2.85K isomorphic node groups in the graph, involving a total of 9.90K nodes (0.21%) and 61.63K edges (0.46%), with the largest one involving 91 nodes and 536 edges. The detected isomorphic node groups, sorted by decreasing size, are:
Group with 27 nodes (degree 6): NCIT_C177739, NCIT_C29338, NCIT_C49063, NCIT_C125692, NCIT_C161778 and other 22.
And other 2.84K isomorphic node groups.
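The idea can be sketched by grouping nodes on their neighbour sets. The example below ignores node types and edge types and assumes a simple undirected edge list, so it is a simplification of the definition above rather than a description of how Ensmallen computes it; all names are hypothetical.

```python
from collections import defaultdict


def isomorphic_node_groups(edges):
    """Group nodes that share exactly the same set of neighbours."""
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    nodes_by_neighbourhood = defaultdict(list)
    for node, neighbour_set in neighbours.items():
        nodes_by_neighbourhood[frozenset(neighbour_set)].append(node)

    # A group needs at least two interchangeable nodes.
    return [sorted(group) for group in nodes_by_neighbourhood.values() if len(group) > 1]


# "p" and "q" are both connected to exactly "x" and "y".
toy_edges = [("p", "x"), ("p", "y"), ("q", "x"), ("q", "y"), ("x", "z")]
node_groups = isomorphic_node_groups(toy_edges)
print(node_groups)
```

Swapping the labels "p" and "q" in the toy edge list yields the same graph, which is what makes the two nodes indistinguishable to a topology-based model.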
Trees
A tree is a connected component with n nodes and n-1 edges. We have detected 25 trees in the graph, involving a total of 146.33K nodes (3.03%) and 146.31K edges (1.09%), with the largest one involving 107.00K nodes and 107.00K edges. The detected trees, sorted by decreasing size, are:
Tree starting from the root node vto:9022503 (degree 3), and containing 107.00K nodes, with a maximal depth of 28, which are vto:0033480 (degree 14), vto:9022661, vto:9022507, vto:0033071 (degree 41) and vto:9022505 (degree 4). Its edges have a single edge type, which is rdfs:subClassOf.
Tree starting from the root node tto:5 (degree 3), and containing 38.64K nodes, with a maximal depth of 15, which are tto:7 (degree 3), tto:2 (degree 3), tto:6 (degree 3), tto:15 (degree 12) and tto:16. Its edges have a single edge type, which is rdfs:subClassOf.
Tree starting from the root node flopo:0900047 (degree 2), and containing 324 nodes, with a maximal depth of 6, which are flopo:0018579, flopo:0900048, flopo:0000000 (degree 257), flopo:0900049 (degree 3) and flopo:0000100. Its edges have a single edge type, which is rdfs:subClassOf.
Tree starting from the root node apo:0000018 (degree 2), and containing 60 nodes, with a maximal depth of 4, which are apo:0000192 (degree 3), apo:0000164 (degree 3), apo:0000114 (degree 7), apo:0000020 (degree 8) and apo:0000186 (degree 13). Its edges have a single edge type, which is rdfs:subClassOf.
Tree starting from the root node nomen:0000272 (degree 5), and containing 47 nodes, with a maximal depth of 3, which are nomen:0000233 (degree 3), nomen:0000273 (degree 7), nomen:0000289 (degree 3), nomen:0000276 (degree 5) and nomen:0000274. Its edges have a single edge type, which is rdfs:subPropertyOf.
Tree starting from the root node fma:50616 (degree 4), and containing 44 nodes, with a maximal depth of 5, which are fma:50593 (degree 4), fma:50599 (degree 3), fma:50600 (degree 3), fma:64760 and fma:50592 (degree 4). Its edges have a single edge type, which is rdfs:subClassOf.
And other 19 trees.
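The totals above match the definition: 25 trees with n - 1 edges each account for 25 fewer edges than nodes, in line with 146.33K nodes and 146.31K edges after rounding. A small sketch of the n nodes / n - 1 edges test using a union-find structure, assuming a simple undirected edge list without duplicate edges (names are hypothetical; this is not Ensmallen's implementation):

```python
from collections import Counter


def tree_components(edges):
    """Return the connected components that have exactly n - 1 edges for n nodes."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_src] = root_dst

    # Resolve every node to its final component representative.
    roots = {node: find(node) for node in list(parent)}
    node_count = Counter(roots.values())
    edge_count = Counter(roots[src] for src, _ in edges)

    return sorted(
        sorted(node for node, root in roots.items() if root == component)
        for component in node_count
        if edge_count[component] == node_count[component] - 1
    )


# A three-node path (a tree) and a triangle (not a tree).
toy_edges = [("a", "b"), ("b", "c"), ("x", "y"), ("y", "z"), ("z", "x")]
trees_found = tree_components(toy_edges)
print(trees_found)
```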
Dendritic trees
A dendritic tree is a tree-like structure starting from a root node that is part of another strongly connected component. We have detected 13.96K dendritic trees in the graph, involving a total of 2.64M nodes (54.67%) and 2.64M edges (19.71%), with the largest one involving 111.28K nodes and 111.28K edges. The detected dendritic trees, sorted by decreasing size, are:
Stars
A star is a tree with a maximal depth of one, where nodes with maximal unique degree one are connected to a central root node with a high degree. We have detected 63 stars in the graph, involving a total of 3.52K nodes (0.07%) and 3.46K edges (0.03%), with the largest one involving 3.12K nodes and 3.12K edges. The detected stars, sorted by decreasing size, are:
Star starting from the root node clo.owl (degree 42), and containing 43 nodes, with a maximal depth of 1, which are _:genid2147520550, _:genid2147520582, _:genid2147520614, _:genid2147520556 and _:genid2147520588. Its edges have a single edge type, which is iao:0000412.
Star starting from the root node ido.owl (degree 34), and containing 35 nodes, with a maximal depth of 1, which are _:genid2147520598, _:genid2147520626, _:genid2147520547, _:genid2147520611 and _:genid2147520539. Its edges have a single edge type, which is iao:0000412.
Star starting from the root node mfmo:0000005 (degree 22), and containing 23 nodes, with a maximal depth of 1, which are mfmo:0000134, mfmo:0000139, mfmo:0000136, mfmo:0000141 and mfmo:0000195. Its edges have a single edge type, which is rdfs:subClassOf.
Star starting from the root node http://schema.org/Dataset (degree 19), and containing 20 nodes, with a maximal depth of 1, which are pcl:0012454, pcl:0012458, pcl:0016451, pcl:0016453 and pcl:0012457. Its edges have a single edge type, which is rdfs:subClassOf.
And other 57 stars.
Dendritic stars
A dendritic star is a dendritic tree with a maximal depth of one, where nodes with maximal unique degree one are connected to a central root node with high degree and inside a strongly connected component. We have detected 23.67K dendritic stars in the graph, involving a total of 465.05K nodes (9.63%) and 465.05K edges (3.47%), with the largest one involving 59.87K nodes and 59.87K edges. The detected dendritic stars, sorted by decreasing size, are:
Dendritic star starting from the root node ncro:0000025 (degree 59.88K), and containing 59.87K nodes, with a maximal depth of 1, which are omit:0030022, omit:0030054, omit:0030086, omit:0030118 and omit:0030150. Its edges have a single edge type, which is rdfs:subClassOf.
Dendritic star starting from the root node ogg:2060009606 (degree 20.71K), and containing 20.71K nodes, with a maximal depth of 1, which are ogg:3000000027, ogg:3000000103, ogg:3000000140, ogg:3000000174 and ogg:3000000215. Its edges have a single edge type, which is rdfs:subClassOf.
Dendritic star starting from the root node clo:0000374 (degree 21.70K), and containing 15.60K nodes, with a maximal depth of 1, which are clo:0010078, clo:0010105, clo:0010134, clo:0010165 and clo:0010197. Its edges have a single edge type, which is rdfs:subClassOf.
Dendritic star starting from the root node ogg:2070009606 (degree 13.41K), and containing 13.40K nodes, with a maximal depth of 1, which are ogg:3000000064, ogg:3000000442, ogg:3000001498, ogg:3000002229 and ogg:3000003116. Its edges have a single edge type, which is rdfs:subClassOf.
And other 23.67K dendritic stars.
Dendritic tendril stars
A dendritic tendril star is a dendritic tree with a depth greater than one, where the arms of the star are tendrils. We have detected 2.85K dendritic tendril stars in the graph, involving a total of 91.28K nodes (1.89%) and 91.28K edges (0.68%), with the largest one involving 36.09K nodes and 36.09K edges. The detected dendritic tendril stars, sorted by decreasing size, are:
Free-floating chains
A free-floating chain is a tree with maximal degree two. We have detected 6 free-floating chains in the graph, involving a total of 33 nodes and 27 edges, with the largest one involving 12 nodes and 11 edges. The detected free-floating chains, sorted by decreasing size, are:
Free-floating chain starting from the root node ontoneo:00000025 (degree 10), and containing 12 nodes, with a maximal depth of 2, which are ontoneo:00000033, ontoneo:00000031, ontoneo:00000035, ontoneo:00000038 and ontoneo:00000001. Its edges have a single edge type, which is rdfs:subClassOf.
Free-floating chain starting from the root node OGI.owl#hasOccupiesLocus (degree 3), and containing 5 nodes, with a maximal depth of 2, which are OGI.owl#hasOccupiesSite, OGI.owl#hasLinkedLoci, OGI.owl#hasGeneLocus and OGI.owl#isOccupiedBy. Its edges have 2 edge types, which are rdfs:subPropertyOf (3 edges) and inverseOf (2 edges).
Free-floating chain starting from the root node bfo:0000059 (degree 2), and containing 4 nodes, with a maximal depth of 2, which are bfo:0000058, bfo:0000164 and bfo:0000165. Its edges have 2 edge types, which are rdfs:subPropertyOf (3 edges) and inverseOf.
Free-floating chain starting from the root node cvdo:0000462 (degree 2), and containing 4 nodes, with a maximal depth of 2, which are cvdo:0000463, cvdo:0000015 and cvdo:0000052. Its edges have a single edge type, which is rdfs:subClassOf.
Free-floating chain starting from the root node ro:0009001 (degree 2), and containing 4 nodes, with a maximal depth of 2, which are foodon:00002420, ro:0009005 and foodon:00001563. Its edges have a single edge type, which is rdfs:subPropertyOf.
Free-floating chain starting from the root node ohpi:0000005 (degree 2), and containing 4 nodes, with a maximal depth of 2, which are ohpi:0000004, ohpi:0000007 and ohpi:0000002. Its edges have a single edge type, which is rdfs:subPropertyOf.
Tendrils
A tendril is a path starting from a node of degree one, connected to a strongly connected component. We have detected 30.54K tendrils in the graph, involving a total of 34.20K nodes (0.71%) and 34.20K edges (0.26%), with the largest one involving 6 nodes and 6 edges. The detected tendrils, sorted by decreasing size, are:
Tendril starting from the root node gno:10003645 (degree 3), and containing 6 nodes, with a maximal depth of 6, which are GNO_G43004GV, GNO_G19481JB, GNO_G09629BD, GNO_G23332GX and GNO_G13291LW. Its edges have a single edge type, which is rdfs:subClassOf.
Tendril starting from the root node gno:10005513 (degree 3), and containing 5 nodes, with a maximal depth of 5, which are GNO_G31762WR, GNO_G10610LV, GNO_G02314TH, GNO_G03321MB and GNO_G68350VT. Its edges have a single edge type, which is rdfs:subClassOf.
Tendril starting from the root node gno:10003083 (degree 3), and containing 5 nodes, with a maximal depth of 5, which are GNO_G75239VC, GNO_G20760YX, GNO_G80478LI, GNO_G25407XQ and GNO_G15045GC. Its edges have a single edge type, which is rdfs:subClassOf.
Tendril starting from the root node gno:10001521 (degree 3), and containing 5 nodes, with a maximal depth of 5, which are GNO_G59087DS, GNO_G73923FP, GNO_G48313NQ, GNO_G58953BZ and GNO_G32670UB. Its edges have a single edge type, which is rdfs:subClassOf.
Tendril starting from the root node gno:10008749 (degree 3), and containing 5 nodes, with a maximal depth of 5, which are GNO_G69267FL, GNO_G79224OO, GNO_G44197ZN, GNO_G46747RO and GNO_G62127RR. Its edges have a single edge type, which is rdfs:subClassOf.
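One possible reading of the tendril definition can be sketched as follows: start from each degree-one node, walk through degree-two nodes, and report the path if it ends at a node of degree three or more. This is an interpretation for illustration only and may not match Ensmallen's detection exactly (for example in how the root and depth are counted); it assumes a simple undirected edge list, and `find_tendrils` and the toy data are hypothetical.

```python
from collections import defaultdict


def find_tendrils(edges):
    """Return (anchor, path) pairs, where path runs from a leaf towards the anchor."""
    adjacency = defaultdict(set)
    for src, dst in edges:
        if src == dst:
            continue  # Assumption: self-loops are ignored in this sketch.
        adjacency[src].add(dst)
        adjacency[dst].add(src)

    tendrils = []
    for leaf, neighbours in adjacency.items():
        if len(neighbours) != 1:
            continue
        path = [leaf]
        previous, current = leaf, next(iter(neighbours))
        # Follow the chain while it does not branch.
        while len(adjacency[current]) == 2:
            path.append(current)
            following = next(node for node in adjacency[current] if node != previous)
            previous, current = current, following
        # A chain that ends in another leaf is free-floating, not a tendril.
        if len(adjacency[current]) >= 3:
            tendrils.append((current, path))
    return tendrils


# A triangle (c1, c2, c3) with a three-node tendril attached to c1,
# plus an isolated pair (p, q) that must not be reported.
toy_edges = [
    ("c1", "c2"), ("c2", "c3"), ("c3", "c1"),
    ("c1", "t1"), ("t1", "t2"), ("t2", "t3"),
    ("p", "q"),
]
tendrils_found = find_tendrils(toy_edges)
print(tendrils_found)
```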