KeRNeLith / QuikGraph

Generic Graph Data Structures and Algorithms for .NET
https://kernelith.github.io/QuikGraph/
Microsoft Public License
465 stars 70 forks source link

Properly implement the Centrality Approximation algorithm #1

Open KeRNeLith opened 4 years ago

KeRNeLith commented 4 years ago

Implement the Centrality Approximation algorithm and unit test it.

The library had a draft of implementation for this algorithm, here is the basis:

using System;
using System.Collections.Generic;
using System.Linq;
using JetBrains.Annotations;
using QuikGraph.Algorithms.Observers;
using QuikGraph.Algorithms.ShortestPath;
#if SUPPORTS_CRYPTO_RANDOM
using QuikGraph.Utils;
#endif

namespace QuikGraph.Algorithms
{
    /// <summary>
    /// Algorithm that computes the most important vertices in a graph.
    /// </summary>
    /// <typeparam name="TVertex">Vertex type.</typeparam>
    /// <typeparam name="TEdge">Edge type.</typeparam>
#if SUPPORTS_SERIALIZATION
    [Serializable]
#endif
    public sealed class CentralityApproximationAlgorithm<TVertex, TEdge> : AlgorithmBase<IVertexListGraph<TVertex, TEdge>>, IDisposable
        where TEdge : IEdge<TVertex>
    {
        [NotNull]
        private readonly DijkstraShortestPathAlgorithm<TVertex, TEdge> _dijkstra;

        [NotNull]
        private readonly IDisposable _predecessorRecorderSubscription;

        [NotNull]
        private readonly IDictionary<TVertex, double> _centralities = new Dictionary<TVertex, double>();

        /// <summary>
        /// Initializes a new instance of the <see cref="CentralityApproximationAlgorithm{TVertex,TEdge}"/> class.
        /// </summary>
        /// <param name="visitedGraph">Graph to visit.</param>
        /// <param name="distances">Function to compute the distance given an edge.</param>
        public CentralityApproximationAlgorithm(
            [NotNull] IVertexListGraph<TVertex, TEdge> visitedGraph,
            [NotNull] Func<TEdge, double> distances)
            : base(visitedGraph)
        {
            _dijkstra = new DijkstraShortestPathAlgorithm<TVertex, TEdge>(
                VisitedGraph,
                distances,
                DistanceRelaxers.ShortestDistance);
            var predecessorRecorder = new VertexPredecessorRecorderObserver<TVertex, TEdge>();
            _predecessorRecorderSubscription = predecessorRecorder.Attach(_dijkstra);
        }

        /// <summary>
        /// Function to gets the distance of a given edge.
        /// </summary>
        public Func<TEdge, double> Distances => _dijkstra.Weights;

        [NotNull]
        private Random _random =
#if SUPPORTS_CRYPTO_RANDOM
            new CryptoRandom();
#else
            new Random();
#endif

        /// <summary>
        /// Gets or sets the random number generator.
        /// </summary>
        [NotNull]
        public Random Rand
        {
            get => _random;
            set => _random = value ?? throw new ArgumentNullException(nameof(value), "Random number generator cannot be null.");
        }

        private int _maxIterations = 50;

        /// <summary>
        /// Maximum number of iterations.
        /// </summary>
        public int MaxIterationCount
        {
            get => _maxIterations;
            set
            {
                if (value <= 0)
                    throw new ArgumentOutOfRangeException(nameof(value), "Must be positive.");
                _maxIterations = value;
            }
        }

        #region AlgorithmBase<TGraph>

        /// <inheritdoc />
        protected override void Initialize()
        {
            base.Initialize();
            _centralities.Clear();
            foreach (TVertex vertex in VisitedGraph.Vertices)
                _centralities.Add(vertex, 0);
        }

        /// <inheritdoc />
        protected override void InternalCompute()
        {
            if (VisitedGraph.VertexCount == 0)
                return;

            // Compute temporary values
            int n = VisitedGraph.VertexCount;
            for (int i = 0; i < MaxIterationCount; ++i)
            {
                TVertex vertex = RandomGraphFactory.GetVertex(VisitedGraph, Rand);
                _dijkstra.Compute(vertex);

                foreach (TVertex u in VisitedGraph.Vertices)
                {
                    if (_dijkstra.TryGetDistance(u, out double d))
                        _centralities[u] += n * d / (MaxIterationCount * (n - 1));
                }
            }

            // Update
            foreach (TVertex vertex in _centralities.Keys.ToArray())
                _centralities[vertex] = 1.0 / _centralities[vertex];
        }

        #endregion

        #region IDisposable

        /// <inheritdoc />
        public void Dispose()
        {
            _predecessorRecorderSubscription.Dispose();
        }

        #endregion
    }
}
Kemsekov commented 2 years ago

Here is a thing. I used to create a centrality approxmation as follows: 1) Find all strongly-connected components 2) From each of strongly-connected components do following: 1) Take a random node and place cursor (current execution unit) on it 2) Find it's longest path among all it's shortest paths 3) Step into the longest node on K percent of path length to it. Here what it means. Imagine you have a path A->....->B of 10 nodes in total. If your K is 0.9 then you will change your cursor to 9'th node in that path. Btw by doing this you also find eccentricity of a cursor. 4) Do 2 and 3 step on cursor and move it accordingly meanwhile reducing K until you step in the longest path only by one node. 5) When your cursor hit the same node twice (so you stepping in node that have been cursor before) you are node 3) From all results save nodes with lowest eccentricity.

What is the logic of this approach? If you take random node and find it's eccentricity, then direction in which the longest shortest path is looking kinda always directs to a center, so by stepping into longest shortest path you on avarage always find a node with lowest eccentricity.

Why not always step just by one node, but decreasing step size? Because it is simulated annealing approach to cover most of solution space and so approximation like that works the best.

Why do we care about strongly connected components? Because if we randomly choose nodes and do the approximation from them most of the path's they produce will be the same, but different strongly connected components cannot do that. It is just a way to approximate nodes that works the best.

I have an implementation which is bad commented but works fine and with what I wrote above it must make a bit more sense

See FindCenterByApproximation method