cthoyt / obo-foundry-graph

Demonstrate combining all OBO Foundry ontologies via Bioregistry, Bioontologies, and ROBOT
MIT License

Graph report and description #1

Open LucaCappelletti94 opened 2 years ago

LucaCappelletti94 commented 2 years ago

Hello @cthoyt and thank you for making this work public!

I spent a bit of time computing a report for the graph using Ensmallen, and since I found the results interesting, here it is! Maybe a portion of it would fit nicely in the repository README.

I'll be providing some embeddings and visualization soonish.

Here's the report!

OBO Foundry graph report

The undirected multigraph has 4.83M nodes and 6.70M heterogeneous edges. The graph contains 225 connected components (of which 1 is a disconnected node), with the largest one containing 4.64M nodes and the smallest one containing a single node.
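
For readers who want to see what the component count means in practice, connected components of an undirected edge list can be found with a plain breadth-first search. This is a minimal sketch on a made-up toy graph; the function name `connected_components` and the node names are illustrative and are not part of Ensmallen's API.

```python
from collections import defaultdict, deque


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adjacency = defaultdict(set)
    for src, dst in edges:
        adjacency[src].add(dst)
        adjacency[dst].add(src)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search from every node not yet assigned to a component.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Hypothetical toy graph: one triangle, one pair, and one disconnected node.
toy_nodes = ["a", "b", "c", "d", "e", "f"]
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")]

components = connected_components(toy_nodes, toy_edges)
print(sorted(len(component) for component in components))  # [1, 2, 3]
```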

Degree centrality

The minimum node degree is 1, the maximum node degree is 544.96K, the mode degree is 1, the mean degree is 2.77 and the node degree median is 1.

The nodes with the highest degree centrality are dron:00000027 (degree 544.96K), so:0000704 (degree 115.06K), so:0001217 (degree 100.31K), ncbitaxon:9606 (degree 89.76K) and ncro:0000025 (degree 59.88K).
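
The degree statistics above (minimum, maximum, mode, mean, median) can be reproduced for any edge list with the standard library. The sketch below uses a hypothetical hub-and-leaves toy graph and ignores self-loops and parallel edges for simplicity; `degree_summary` is an illustrative name, not an Ensmallen function.

```python
from collections import Counter
from statistics import mean, median, mode


def degree_summary(edges):
    """Summarise the node degrees of an undirected edge list."""
    degrees = Counter()
    for src, dst in edges:
        degrees[src] += 1
        degrees[dst] += 1
    values = sorted(degrees.values())
    return {
        "min": values[0],
        "max": values[-1],
        "mode": mode(values),
        "mean": mean(values),
        "median": median(values),
    }


# Hypothetical toy graph: one hub with four leaves.
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d")]
summary = degree_summary(toy_edges)
print(summary["min"], summary["max"], summary["mode"], summary["median"])  # 1 4 1 1
```

As in the report, a single high-degree hub raises the mean (1.6 here) while the mode and median stay at 1.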

Edge types

The graph has 979 edge types, of which the 10 most common are rdfs:subClassOf (10.43M edges, 77.83%), http://www.obofoundry.org/ro/ro.owl#has_proper_part (1.09M edges, 8.13%), ro:0002160 (579.72K edges, 4.33%), pr#has_gene_template (200.96K edges, 1.50%), ro:0000087 (112.25K edges, 0.84%), bfo:0000050 (100.46K edges, 0.75%), bfo:0000051 (70.37K edges, 0.53%), ro:0013002 (55.75K edges, 0.42%), iceo:0000051 (49.73K edges, 0.37%) and ro:0013003 (40.67K edges, 0.30%).
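
The edge counts in this section exceed the 6.70M edges quoted at the top of the report. A quick check suggests the percentages are relative to roughly 13.40M edges, which would be consistent with each undirected edge being counted once per direction. That reading is an inference from the rounded numbers, not something the report states.

```python
# Rounded figures copied from the report.
undirected_edges = 6.70e6
nodes = 4.83e6
subclass_edges = 10.43e6   # rdfs:subClassOf
subclass_share = 0.7783    # 77.83%
mean_degree = 2.77

# Total edge count implied by the rdfs:subClassOf count and its percentage.
implied_total = subclass_edges / subclass_share

# Total obtained by counting every undirected edge in both directions.
doubled_total = 2 * undirected_edges

# Mean degree implied by the doubled total.
implied_mean_degree = doubled_total / nodes

print(round(implied_total / 1e6, 2), round(doubled_total / 1e6, 2))
print(round(implied_mean_degree, 2))
```

Both totals come out at about 13.4M, and the implied mean degree agrees with the reported 2.77 to two decimals.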

Isomorphic edge types

Isomorphic edge type groups are sets of edge types describing exactly the same set of edges. The presence of such duplicated edge types suggests a potential modelling error in the pipeline that produced this graph. 7 isomorphic edge type groups were detected in this graph.

  1. Isomorphic edge type group containing 2 edge types (2.05K edges, 0.02%), which are: ncro:0004009 and ncro:0004006.
  2. Isomorphic edge type group containing 2 edge types (60 edges), which are: micro:0001521 and micro:0001522.
  3. Isomorphic edge type group containing 2 edge types (60 edges), which are: micro:0001524 and micro:0001523.
  4. Isomorphic edge type group containing 2 edge types (60 edges), which are: micro:0001467 and micro:0001502.
  5. Isomorphic edge type group containing 2 edge types (22 edges), which are: micro:0001219 and micro:0001468.
  6. Isomorphic edge type group containing 2 edge types (10 edges), which are: ro:0015011 and pato#has_cross_section.
  7. Isomorphic edge type group containing 2 edge types (8 edges), which are: ro:0015012 and pato#reciprocal_of.
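
One way to detect such groups is to collect the set of edges carried by each edge type and then look for edge types whose sets are identical. The following is a sketch of that idea on hypothetical data (`rel:1`, `rel:2`, `rel:3` are made-up edge types); it is not how Ensmallen necessarily implements the check.

```python
from collections import defaultdict


def isomorphic_edge_type_groups(typed_edges):
    """Group edge types that describe exactly the same set of undirected edges."""
    edges_by_type = defaultdict(set)
    for src, dst, edge_type in typed_edges:
        # A frozenset makes (a, b) and (b, a) the same undirected edge.
        edges_by_type[edge_type].add(frozenset((src, dst)))

    types_by_edge_set = defaultdict(list)
    for edge_type, edge_set in edges_by_type.items():
        types_by_edge_set[frozenset(edge_set)].append(edge_type)

    # Keep only edge sets shared by more than one edge type.
    return sorted(sorted(types) for types in types_by_edge_set.values() if len(types) > 1)


# Hypothetical typed edges: rel:1 and rel:2 cover the same two edges.
toy_typed_edges = [
    ("x", "y", "rel:1"),
    ("y", "z", "rel:1"),
    ("y", "x", "rel:2"),
    ("z", "y", "rel:2"),
    ("x", "z", "rel:3"),
]
print(isomorphic_edge_type_groups(toy_typed_edges))  # [['rel:1', 'rel:2']]
```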

Singleton edge types

Singleton edge types are edge types that are assigned exclusively to a single edge, making the edge type relatively meaningless, as it adds no more information than the name of the edge itself. The graph contains one edge with a singleton edge type, which is OGI.owl#hasIntervalRelation.
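
Finding singleton edge types amounts to counting how many edges carry each type and keeping the types with a count of one. A minimal sketch, using made-up edge types rather than the real graph:

```python
from collections import Counter


def singleton_edge_types(typed_edges):
    """Return the edge types that are assigned to exactly one edge."""
    counts = Counter(edge_type for _, _, edge_type in typed_edges)
    return sorted(edge_type for edge_type, count in counts.items() if count == 1)


# Hypothetical typed edges: only "rel:rare" appears once.
toy_typed_edges = [
    ("a", "b", "rel:common"),
    ("b", "c", "rel:common"),
    ("c", "d", "rel:rare"),
]
print(singleton_edge_types(toy_typed_edges))  # ['rel:rare']
```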

Topological Oddities

A topological oddity is a set of nodes in the graph that may result from an error during the generation of the graph's edge list and that, depending on the task, could bias the results of topology-based models. The following sections describe the detected topological oddities.

Singleton nodes with self-loops

A singleton node with self-loops is a node disconnected from all other nodes except itself. We have detected a single singleton node with self-loops in the graph.

Node tuples

A node tuple is a connected component composed of two nodes. We have detected 89 node tuples in the graph, involving a total of 178 nodes and 89 edges. The detected node tuples are:

And other 74 node tuples.
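
Under the definition above, a node tuple is a pair of nodes that are each other's only neighbour. The sketch below checks exactly that on a hypothetical edge list; self-loops are ignored, which is an assumption of this example rather than a statement about how the report treats them.

```python
from collections import defaultdict


def node_tuples(edges):
    """Return the connected components made of exactly two nodes."""
    adjacency = defaultdict(set)
    for src, dst in edges:
        if src != dst:  # self-loops do not connect a node to anything else
            adjacency[src].add(dst)
            adjacency[dst].add(src)

    tuples = set()
    for node, neighbours in adjacency.items():
        if len(neighbours) != 1:
            continue
        (other,) = neighbours
        # The pair is isolated only if the neighbour has no further neighbours.
        if adjacency[other] == {node}:
            tuples.add(tuple(sorted((node, other))))
    return sorted(tuples)


# Hypothetical toy graph: two isolated pairs and one three-node path.
toy_edges = [("a", "b"), ("c", "d"), ("d", "e"), ("f", "g"), ("g", "f")]
print(node_tuples(toy_edges))  # [('a', 'b'), ('f', 'g')]
```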

Isomorphic node groups

Isomorphic node groups are groups of nodes with exactly the same neighbours and node types (if present in the graph). Nodes in such groups are topologically indistinguishable, that is, swapping their IDs would not change the graph topology. We have detected 2.85K isomorphic node groups in the graph, involving a total of 9.90K nodes (0.21%) and 61.63K edges (0.46%), with the largest one involving 91 nodes and 536 edges. The detected isomorphic node groups, sorted by decreasing size, are:

  1. Group with 67 nodes (degree 8): fbbt:20000405, fbbt:20000575, fbbt:20000437, fbbt:20000227, fbbt:20000435 and other 62.
  2. Group with 48 nodes (degree 11): fobi.owl#FOBI:050018, fobi:030371, fobi:030373, fobi.owl#FOBI:050013, fobi:030347 and other 43.
  3. Group with 21 nodes (degree 25): aro:3000600, aro:3000598, aro:3000250, aro:3000498, aro:3000601 and other 16.
  4. Group with 76 nodes (degree 6): fbbt:20002490, fbbt:20002713, fbbt:20002761, fbbt:20002550, fbbt:20002535 and other 71.
  5. Group with 91 nodes (degree 5): pr:000055170, pr:000055172, pr:000055167, pr:000055200, pr:000055302 and other 86.
  6. Group with 71 nodes (degree 6): fbbt:20003407, fbbt:20002961, fbbt:20003028, fbbt:20003284, fbbt:20003412 and other 66.
  7. Group with 33 nodes (degree 11): fobi:030429, fobi:08663, fobi:030382, fobi:030337, fobi:08682 and other 28.
  8. Group with 54 nodes (degree 6): fbbt:20000488, fbbt:20000250, fbbt:20000370, fbbt:20000663, fbbt:20000325 and other 49.
  9. Group with 13 nodes (degree 16): fbbt:20000421, fbbt:20000236, fbbt:20000683, fbbt:20000155, fbbt:20000695 and other 8.
  10. Group with 17 nodes (degree 12): fbbt:20002951, fbbt:20003037, fbbt:20002888, fbbt:20002978, fbbt:20003016 and other 12.
  11. Group with 25 nodes (degree 8): fbbt:20002145, fbbt:20002130, fbbt:20002076, fbbt:20002141, fbbt:20002137 and other 20.
  12. Group with 36 nodes (degree 5): pr:000055284, pr:000055191, pr:000055207, pr:000055273, pr:000055194 and other 31.
  13. Group with 36 nodes (degree 5): pr:000055104, pr:000055083, pr:000055154, pr:000055106, pr:000055157 and other 31.
  14. Group with 33 nodes (degree 5): pr:000055577, pr:000055583, pr:000055575, pr:000055591, pr:000055630 and other 28.
  15. Group with 27 nodes (degree 6): NCIT_C177739, NCIT_C29338, NCIT_C49063, NCIT_C125692, NCIT_C161778 and other 22.

And other 2.84K isomorphic node groups.
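
Isomorphic node groups can be found by using each node's neighbour set as a grouping key. This is a sketch on a hypothetical toy graph that ignores node types (the definition above also compares them when present); `isomorphic_node_groups` is an illustrative name.

```python
from collections import defaultdict


def isomorphic_node_groups(edges):
    """Group nodes that have exactly the same set of neighbours."""
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    groups = defaultdict(list)
    for node, node_neighbours in neighbours.items():
        # Nodes sharing the same frozenset of neighbours are indistinguishable.
        groups[frozenset(node_neighbours)].append(node)

    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


# Hypothetical toy graph: l1, l2 and l3 all attach to both p1 and p2.
toy_edges = [
    ("l1", "p1"), ("l1", "p2"),
    ("l2", "p1"), ("l2", "p2"),
    ("l3", "p1"), ("l3", "p2"),
    ("p1", "root"),
]
print(isomorphic_node_groups(toy_edges))  # [['l1', 'l2', 'l3']]
```

The ordering of the listing above looks consistent with sorting by nodes times degree (67 × 8 = 536, 48 × 11 = 528, 21 × 25 = 525), which would match the 536-edge figure quoted for the largest group; this is an inference from the numbers.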

Trees

A tree is a connected component with n nodes and n-1 edges. We have detected 25 trees in the graph, involving a total of 146.33K nodes (3.03%) and 146.31K edges (1.09%), with the largest one involving 107.00K nodes and 107.00K edges. The detected trees, sorted by decreasing size, are:

  1. Tree starting from the root node vto:9022503 (degree 3), and containing 107.00K nodes, with a maximal depth of 28, which are vto:0033480 (degree 14), vto:9022661, vto:9022507, vto:0033071 (degree 41) and vto:9022505 (degree 4). Its edges have a single edge type, which is rdfs:subClassOf.
  2. Tree starting from the root node tto:5 (degree 3), and containing 38.64K nodes, with a maximal depth of 15, which are tto:7 (degree 3), tto:2 (degree 3), tto:6 (degree 3), tto:15 (degree 12) and tto:16. Its edges have a single edge type, which is rdfs:subClassOf.
  3. Tree starting from the root node flopo:0900047 (degree 2), and containing 324 nodes, with a maximal depth of 6, which are flopo:0018579, flopo:0900048, flopo:0000000 (degree 257), flopo:0900049 (degree 3) and flopo:0000100. Its edges have a single edge type, which is rdfs:subClassOf.
  4. Tree starting from the root node apo:0000018 (degree 2), and containing 60 nodes, with a maximal depth of 4, which are apo:0000192 (degree 3), apo:0000164 (degree 3), apo:0000114 (degree 7), apo:0000020 (degree 8) and apo:0000186 (degree 13). Its edges have a single edge type, which is rdfs:subClassOf.
  5. Tree starting from the root node nomen:0000272 (degree 5), and containing 47 nodes, with a maximal depth of 3, which are nomen:0000233 (degree 3), nomen:0000273 (degree 7), nomen:0000289 (degree 3), nomen:0000276 (degree 5) and nomen:0000274. Its edges have a single edge type, which is rdfs:subPropertyOf.
  6. Tree starting from the root node fma:50616 (degree 4), and containing 44 nodes, with a maximal depth of 5, which are fma:50593 (degree 4), fma:50599 (degree 3), fma:50600 (degree 3), fma:64760 and fma:50592 (degree 4). Its edges have a single edge type, which is rdfs:subClassOf.

And other 19 trees.
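
The "n nodes and n-1 edges" criterion can be checked per connected component. The following sketch does so with a depth-first search on a hypothetical toy graph, counting distinct undirected edges and assuming there are no self-loops; it illustrates the definition and is not Ensmallen's implementation.

```python
from collections import defaultdict


def tree_components(edges):
    """Return the components that have n nodes and n - 1 distinct undirected edges."""
    adjacency = defaultdict(set)
    for src, dst in edges:
        adjacency[src].add(dst)
        adjacency[dst].add(src)

    seen = set()
    trees = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first search to collect the component containing `start`.
        component = set()
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        # Each undirected edge is seen from both endpoints, hence the division by two.
        edge_count = sum(len(adjacency[node]) for node in component) // 2
        if edge_count == len(component) - 1:
            trees.append(component)
    return trees


# Hypothetical toy graph: one four-node tree and one triangle (which is not a tree).
toy_edges = [("a", "b"), ("a", "c"), ("c", "d"), ("x", "y"), ("y", "z"), ("z", "x")]
print(tree_components(toy_edges) == [{"a", "b", "c", "d"}])  # True
```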

Dendritic trees

A dendritic tree is a tree-like structure starting from a root node that is part of another strongly connected component. We have detected 13.96K dendritic trees in the graph, involving a total of 2.64M nodes (54.67%) and 2.64M edges (19.71%), with the largest one involving 111.28K nodes and 111.28K edges. The detected dendritic trees, sorted by decreasing size, are:

  1. Dendritic tree starting from the root node so:0000704 (degree 115.06K), and containing 111.28K nodes, with a maximal depth of 3, which are ensembl:ENSMUSG00000000085, ensembl:ENSMUSG00000000247, ensembl:ENSMUSG00000000386, ensembl:ENSMUSG00000000605 and ensembl:ENSMUSG00000000792. Its edges have a single edge type, which is rdfs:subClassOf.
  2. Dendritic tree starting from the root node ncbitaxon:7400 (degree 7), and containing 109.94K nodes, with a maximal depth of 9, which are ncbitaxon:1955251 (degree 9), ncbitaxon:27487, ncbitaxon:27483 (degree 4), ncbitaxon:85766 and ncbitaxon:7422 (degree 23). Its edges have a single edge type, which is rdfs:subClassOf.
  3. Dendritic tree starting from the root node ncbitaxon:7148 (degree 13), and containing 76.88K nodes, with a maximal depth of 9, which are ncbitaxon:43785 (degree 3), ncbitaxon:244440, ncbitaxon:43784 (degree 4), ncbitaxon:43783 and ncbitaxon:43789. Its edges have a single edge type, which is rdfs:subClassOf.
  4. Dendritic tree starting from the root node ncbitaxon:2 (degree 324), and containing 66.23K nodes, with a maximal depth of 10, which are foodon:03412927, foodon:03412850, foodon:03412856, foodon:03414574 and foodon:03412855. Its edges have a single edge type, which is rdfs:subClassOf.
  5. Dendritic tree starting from the root node ncbitaxon:41828 (degree 6), and containing 60.57K nodes, with a maximal depth of 8, which are ncbitaxon:1235500, ncbitaxon:41814 (degree 11), ncbitaxon:7149 (degree 12), ncbitaxon:7190 (degree 5) and ncbitaxon:1235491. Its edges have a single edge type, which is rdfs:subClassOf.
  6. Dendritic tree starting from the root node ncbitaxon:7088 (degree 13), and containing 43.90K nodes, with a maximal depth of 5, which are ncbitaxon:500585 (degree 43.70K), ncbitaxon:1479247, ncbitaxon:41024, ncbitaxon:41187 and ncbitaxon:41192. Its edges have a single edge type, which is rdfs:subClassOf.

And other 13.96K dendritic trees.

Stars

A star is a tree with a maximal depth of one, where nodes with maximal unique degree one are connected to a central root node with a high degree. We have detected 63 stars in the graph, involving a total of 3.52K nodes (0.07%) and 3.46K edges (0.03%), with the largest one involving 3.12K nodes and 3.12K edges. The detected stars, sorted by decreasing size, are:

  1. Star starting from the root node http://xmlns.com/foaf/0.1/Image (degree 3.12K), and containing 3.12K nodes, with a maximal depth of 1, which are http://api.hymao.org/api/figure/fig_10815.svg, http://api.hymao.org/api/figure/fig_11301.svg, http://api.hymao.org/api/figure/fig_12508.svg, http://api.hymao.org/api/figure/fig_12635.svg and http://api.hymao.org/api/figure/fig_12673.svg. Its edges have a single edge type, which is rdfs:subClassOf.
  2. Star starting from the root node clo.owl (degree 42), and containing 43 nodes, with a maximal depth of 1, which are _:genid2147520550, _:genid2147520582, _:genid2147520614, _:genid2147520556 and _:genid2147520588. Its edges have a single edge type, which is iao:0000412.
  3. Star starting from the root node ido.owl (degree 34), and containing 35 nodes, with a maximal depth of 1, which are _:genid2147520598, _:genid2147520626, _:genid2147520547, _:genid2147520611 and _:genid2147520539. Its edges have a single edge type, which is iao:0000412.
  4. Star starting from the root node http://www.biomodels.net/kisao/KISAO#KISAO_0000824 (degree 27), and containing 28 nodes, with a maximal depth of 1, which are http://www.biomodels.net/kisao/KISAO#KISAO_0000840, http://www.biomodels.net/kisao/KISAO#KISAO_0000828, http://www.biomodels.net/kisao/KISAO#KISAO_0000855, http://www.biomodels.net/kisao/KISAO#KISAO_0000844 and http://www.biomodels.net/kisao/KISAO#KISAO_0000859. Its edges have a single edge type, which is rdfs:subClassOf.
  5. Star starting from the root node mfmo:0000005 (degree 22), and containing 23 nodes, with a maximal depth of 1, which are mfmo:0000134, mfmo:0000139, mfmo:0000136, mfmo:0000141 and mfmo:0000195. Its edges have a single edge type, which is rdfs:subClassOf.
  6. Star starting from the root node http://schema.org/Dataset (degree 19), and containing 20 nodes, with a maximal depth of 1, which are pcl:0012454, pcl:0012458, pcl:0016451, pcl:0016453 and pcl:0012457. Its edges have a single edge type, which is rdfs:subClassOf.

And other 57 stars.
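
Read literally, the definition says a star is a component in which one central node is connected to every other node and all other nodes have degree one. The sketch below tests that on the edge list of a single component; the node names are hypothetical, parallel edges are collapsed, and requiring at least three nodes (so that two-node components stay node tuples) is an assumption of this example.

```python
from collections import Counter


def star_centre(edges):
    """Return the centre if the undirected edges form a single star, else None."""
    degrees = Counter()
    # Collapse parallel and reversed edges into distinct undirected edges.
    for edge in {frozenset(pair) for pair in edges}:
        if len(edge) != 2:
            return None  # a self-loop rules out a star
        for node in edge:
            degrees[node] += 1

    node_count = len(degrees)
    if node_count < 3:
        return None
    centres = [node for node, degree in degrees.items() if degree == node_count - 1]
    leaves = [node for node, degree in degrees.items() if degree == 1]
    if len(centres) == 1 and len(leaves) == node_count - 1:
        return centres[0]
    return None


# Hypothetical examples: a three-leaf star and a four-node path.
print(star_centre([("centre", "l1"), ("centre", "l2"), ("centre", "l3")]))  # centre
print(star_centre([("a", "b"), ("b", "c"), ("c", "d")]))  # None
```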

Dendritic stars

A dendritic star is a dendritic tree with a maximal depth of one, where nodes with maximal unique degree one are connected to a central root node with high degree and inside a strongly connected component. We have detected 23.67K dendritic stars in the graph, involving a total of 465.05K nodes (9.63%) and 465.05K edges (3.47%), with the largest one involving 59.87K nodes and 59.87K edges. The detected dendritic stars, sorted by decreasing size, are:

  1. Dendritic star starting from the root node ncro:0000025 (degree 59.88K), and containing 59.87K nodes, with a maximal depth of 1, which are omit:0030022, omit:0030054, omit:0030086, omit:0030118 and omit:0030150. Its edges have a single edge type, which is rdfs:subClassOf.
  2. Dendritic star starting from the root node ncbitaxon:185979 (degree 29.79K), and containing 29.78K nodes, with a maximal depth of 1, which are ncbitaxon:1003383, ncbitaxon:1007308, ncbitaxon:1007913, ncbitaxon:1008009 and ncbitaxon:1008038. Its edges have a single edge type, which is rdfs:subClassOf.
  3. Dendritic star starting from the root node ogg:2060009606 (degree 20.71K), and containing 20.71K nodes, with a maximal depth of 1, which are ogg:3000000027, ogg:3000000103, ogg:3000000140, ogg:3000000174 and ogg:3000000215. Its edges have a single edge type, which is rdfs:subClassOf.
  4. Dendritic star starting from the root node ncbitaxon:11520 (degree 18.54K), and containing 18.53K nodes, with a maximal depth of 1, which are ncbitaxon:1001782, ncbitaxon:1001799, ncbitaxon:1031454, ncbitaxon:1031486 and ncbitaxon:1031759. Its edges have a single edge type, which is rdfs:subClassOf.
  5. Dendritic star starting from the root node clo:0000374 (degree 21.70K), and containing 15.60K nodes, with a maximal depth of 1, which are clo:0010078, clo:0010105, clo:0010134, clo:0010165 and clo:0010197. Its edges have a single edge type, which is rdfs:subClassOf.
  6. Dendritic star starting from the root node ogg:2070009606 (degree 13.41K), and containing 13.40K nodes, with a maximal depth of 1, which are ogg:3000000064, ogg:3000000442, ogg:3000001498, ogg:3000002229 and ogg:3000003116. Its edges have a single edge type, which is rdfs:subClassOf.

And other 23.67K dendritic stars.

Dendritic tendril stars

A dendritic tendril star is a dendritic tree with a depth greater than one, where the arms of the star are tendrils. We have detected 2.85K dendritic tendril stars in the graph, involving a total of 91.28K nodes (1.89%) and 91.28K edges (0.68%), with the largest one involving 36.09K nodes and 36.09K edges. The detected dendritic tendril stars, sorted by decreasing size, are:

  1. Dendritic tendril star starting from the root node ncbitaxon:114727 (degree 36.13K), and containing 36.09K nodes, with a maximal depth of 2, which are ncbitaxon:1000293, ncbitaxon:1000322, ncbitaxon:1000351, ncbitaxon:1000403 and ncbitaxon:1001213. Its edges have a single edge type, which is rdfs:subClassOf.
  2. Dendritic tendril star starting from the root node ncbitaxon:196821 (degree 26.98K), and containing 26.98K nodes, with a maximal depth of 2, which are ncbitaxon:1001555, ncbitaxon:1000007, ncbitaxon:1001300, ncbitaxon:1005312 and ncbitaxon:1004552. Its edges have a single edge type, which is rdfs:subClassOf.
  3. Dendritic tendril star starting from the root node ncbitaxon:185978 (degree 4.12K), and containing 4.12K nodes, with a maximal depth of 2, which are ncbitaxon:1005877, ncbitaxon:1005863, ncbitaxon:1008082, ncbitaxon:1010750 and ncbitaxon:1027285. Its edges have a single edge type, which is rdfs:subClassOf.
  4. Dendritic tendril star starting from the root node chebi:50699 (degree 1.94K), and containing 1.87K nodes, with a maximal depth of 2, which are chebi:140827, chebi:146085, chebi:146697, chebi:146852 and chebi:147206. Its edges have a single edge type, which is rdfs:subClassOf.
  5. Dendritic tendril star starting from the root node ncbitaxon:2633828 (degree 612), and containing 611 nodes, with a maximal depth of 2, which are ncbitaxon:103237, ncbitaxon:1042074, ncbitaxon:1042060, ncbitaxon:103236 and ncbitaxon:1042064. Its edges have a single edge type, which is rdfs:subClassOf.
  6. Dendritic tendril star starting from the root node chebi:64674 (degree 529), and containing 463 nodes, with a maximal depth of 3, which are chebi:138531, chebi:136476, chebi:137216, chebi:170433 and chebi:170589. Its edges have a single edge type, which is rdfs:subClassOf.

And other 2.85K dendritic tendril stars.

Free-floating chains

A free-floating chain is a tree with maximal degree two. We have detected 6 free-floating chains in the graph, involving a total of 33 nodes and 27 edges, with the largest one involving 12 nodes and 11 edges. The detected free-floating chains, sorted by decreasing size, are:

  1. Free-floating chain starting from the root node ontoneo:00000025 (degree 10), and containing 12 nodes, with a maximal depth of 2, which are ontoneo:00000033, ontoneo:00000031, ontoneo:00000035, ontoneo:00000038 and ontoneo:00000001. Its edges have a single edge type, which is rdfs:subClassOf.
  2. Free-floating chain starting from the root node OGI.owl#hasOccupiesLocus (degree 3), and containing 5 nodes, with a maximal depth of 2, which are OGI.owl#hasOccupiesSite, OGI.owl#hasLinkedLoci, OGI.owl#hasGeneLocus and OGI.owl#isOccupiedBy. Its edges have 2 edge types, which are rdfs:subPropertyOf (3 edges) and inverseOf (2 edges).
  3. Free-floating chain starting from the root node bfo:0000059 (degree 2), and containing 4 nodes, with a maximal depth of 2, which are bfo:0000058, bfo:0000164 and bfo:0000165. Its edges have 2 edge types, which are rdfs:subPropertyOf (3 edges) and inverseOf.
  4. Free-floating chain starting from the root node cvdo:0000462 (degree 2), and containing 4 nodes, with a maximal depth of 2, which are cvdo:0000463, cvdo:0000015 and cvdo:0000052. Its edges have a single edge type, which is rdfs:subClassOf.
  5. Free-floating chain starting from the root node ro:0009001 (degree 2), and containing 4 nodes, with a maximal depth of 2, which are foodon:00002420, ro:0009005 and foodon:00001563. Its edges have a single edge type, which is rdfs:subPropertyOf.
  6. Free-floating chain starting from the root node ohpi:0000005 (degree 2), and containing 4 nodes, with a maximal depth of 2, which are ohpi:0000004, ohpi:0000007 and ohpi:0000002. Its edges have a single edge type, which is rdfs:subPropertyOf.
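
Taking the definition as stated (a tree whose maximal degree is two), a free-floating chain is simply a path: two end nodes of degree one, every other node of degree two, and all nodes reachable by walking from one end. The sketch below checks those conditions for the edge list of one candidate component; the names are hypothetical and the check is an illustration of the stated definition only.

```python
from collections import defaultdict


def is_chain(edges):
    """Return True if the undirected edges form a single path (a chain)."""
    adjacency = defaultdict(set)
    for src, dst in edges:
        if src == dst:
            return False  # self-loops are not allowed in a tree
        adjacency[src].add(dst)
        adjacency[dst].add(src)

    if not adjacency or any(len(nodes) > 2 for nodes in adjacency.values()):
        return False
    ends = [node for node, nodes in adjacency.items() if len(nodes) == 1]
    if len(ends) != 2:
        return False

    # Walk from one end; a chain visits every node exactly once.
    previous, current = None, ends[0]
    visited = 1
    while True:
        following = [node for node in adjacency[current] if node != previous]
        if not following:
            break
        previous, current = current, following[0]
        visited += 1
    return visited == len(adjacency)


print(is_chain([("a", "b"), ("b", "c"), ("c", "d")]))  # True
print(is_chain([("a", "b"), ("a", "c"), ("a", "d")]))  # False
```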

Tendrils

A tendril is a path starting from a node of degree one, connected to a strongly connected component. We have detected 30.54K tendrils in the graph, involving a total of 34.20K nodes (0.71%) and 34.20K edges (0.26%), with the largest one involving 6 nodes and 6 edges. The detected tendrils, sorted by decreasing size, are:

  1. Tendril starting from the root node gno:10003645 (degree 3), and containing 6 nodes, with a maximal depth of 6, which are GNO_G43004GV, GNO_G19481JB, GNO_G09629BD, GNO_G23332GX and GNO_G13291LW. Its edges have a single edge type, which is rdfs:subClassOf.
  2. Tendril starting from the root node ncbitaxon:1489388 (degree 11), and containing 5 nodes, with a maximal depth of 5, which are ncbitaxon:1489794, ncbitaxon:1489795, ncbitaxon:130262, ncbitaxon:89577 and ncbitaxon:89578. Its edges have a single edge type, which is rdfs:subClassOf.
  3. Tendril starting from the root node gno:10005513 (degree 3), and containing 5 nodes, with a maximal depth of 5, which are GNO_G31762WR, GNO_G10610LV, GNO_G02314TH, GNO_G03321MB and GNO_G68350VT. Its edges have a single edge type, which is rdfs:subClassOf.
  4. Tendril starting from the root node gno:10003083 (degree 3), and containing 5 nodes, with a maximal depth of 5, which are GNO_G75239VC, GNO_G20760YX, GNO_G80478LI, GNO_G25407XQ and GNO_G15045GC. Its edges have a single edge type, which is rdfs:subClassOf.
  5. Tendril starting from the root node gno:10001521 (degree 3), and containing 5 nodes, with a maximal depth of 5, which are GNO_G59087DS, GNO_G73923FP, GNO_G48313NQ, GNO_G58953BZ and GNO_G32670UB. Its edges have a single edge type, which is rdfs:subClassOf.
  6. Tendril starting from the root node gno:10008749 (degree 3), and containing 5 nodes, with a maximal depth of 5, which are GNO_G69267FL, GNO_G79224OO, GNO_G44197ZN, GNO_G46747RO and GNO_G62127RR. Its edges have a single edge type, which is rdfs:subClassOf.

And other 30.54K tendrils.
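
A tendril can be traced by starting at a degree-one node and walking inwards through degree-two nodes until a node of higher degree is reached. The sketch below implements that walk on a hypothetical toy graph. It only inspects degrees and does not verify that the anchor node belongs to a larger component core, so it approximates the definition above rather than reproducing Ensmallen's detection.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adjacency = defaultdict(set)
    for src, dst in edges:
        adjacency[src].add(dst)
        adjacency[dst].add(src)
    return adjacency


def tendril_from_leaf(adjacency, leaf):
    """Walk inwards from a degree-one node and return (tendril_nodes, anchor).

    The anchor is the first node of degree three or more. It is None when the
    walk ends at another degree-one node, i.e. the path is free-floating.
    """
    if len(adjacency[leaf]) != 1:
        raise ValueError("the starting node must have degree one")
    path = [leaf]
    previous, current = None, leaf
    while True:
        # Exactly one neighbour remains once the node we came from is excluded.
        (following,) = [node for node in adjacency[current] if node != previous]
        degree = len(adjacency[following])
        if degree > 2:
            return path, following
        path.append(following)
        if degree == 1:
            return path, None
        previous, current = current, following


# Hypothetical toy graph: a triangle x-y-z with a three-node tendril hanging off z.
toy_edges = [("x", "y"), ("y", "z"), ("z", "x"), ("z", "t1"), ("t1", "t2"), ("t2", "t3")]
toy_adjacency = build_adjacency(toy_edges)
print(tendril_from_leaf(toy_adjacency, "t3"))  # (['t3', 't2', 't1'], 'z')
```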