Closed tomrijnbeek closed 9 years ago
I find the argument for renaming the classes after the problem they solve quite convincing.
Also, because I tend to not be able to remember all the names of algorithms, while the names of the problems tend to be clearer (since they are rarely named after people).
Regarding multiple algorithms for the same problem, with different constraints, one could give those constraints as parameters to the solver, which then figures out the most efficient algorithm for the specific case. It does make more sense to me to ask the library to 'find me the shortest paths, oh also I have negative edges' then telling it what algorithm to run specifically.
All ACTUAL algorithms could be 'hidden' inside a nested Namespace called Algorithms.Implementations
(or something similar), and the main namespace would only contain problem classes delegating calls to the correct algorithms.
That allows the user to decide what to do, but still keeps it clean, as long as one ignores the extra namespace.
In either case, I would always clearly document what algorithm is used, so that the user (as you say, they are expected to have some skills) can make an informed decision.
I can agree with the solution you come up with, though I am uncertain if Algorithms
is still the right root namespace to use then.
Yeah, I was thinking about that. It feels like it should be something like Problems.Algorithms
... but that does not sound right either.
Given no progress on this so far, I motion to go with Algorithms.Implementations
for the time being.
Problems.Algorithms
is arguably more accurate, but Problems
on the other hand is very vague.
Ah, and as I am writing this, an alternative comes to mind that may work even better:
Algorithms.Concrete
, similar to Collections.Generic
.
With regards to the classes contained in the higher level namespace (proposed to remain Algorithms
), I would go for signatures that allow for calls like (or similar to) this (path finding being only an example):
var parameters = new PathFinding.Parameters
{
AllowNegativeWeights = false,
/* other parameters */
};
var solution = PathFinding.FindAllSingleSource(problem, parameters);
// or (saving space if solution is larger than constant size and algorithm is called repeatedly):
PathFinding.FindAllSingleSource(problem, out solution, parameters);
There would probably be a variety of methods that accept the same problem data structure as input, but their name determines the output they provide. All variations and settings that do not influence the output type should be in the Parameter type I would argue. The static PathFinding
class would then figure out itself what algorithm is most likely the most appropriate given the input and settings.
Also, it seems sensible to think about whether the parameter type is either build into an immutable version before passing it to a solver, or that it's values are copied (in case they have to be used for more than just the initialisation of the algorithm). Doing so would make this somewhat thread safer. On the other hand, one could argue that the problem input would also have to be made a copy off, or be held immutable in some other fashion. It may be easier and even sensible to leave it up to the user not to change the complex input to the algorithm while it is running.
Oh, alternatively (either brain is derp, or I am full of ideas today):
Instead of having a static PathFinding
with a non-static Parameters
class, we could have a regular PathFinder
class, that has both the settings, AND the methods to solve problems with the set settings.
That sounds like a better OO approach. On the other hand, it might invite the user to use the same PathFinder all over the place, which sounds more dangerous to me than just using the same Parameters object.
Thoughts?
I am not too happy with these solutions. I think one of the main usages for this library is a light-weight, easy-to-use set of abilities. If we go with any of your latter suggestions, we would make the usages of algorithms a lot more complicated to the point where they are just as bloated up as using the System.Cryptography
libraries.
The main reason I built a static wrapper for the Hungarian Algorithm is to make the outside interface as simple to use as possible. By extension, I think the interfaces to our algorithms should be as simple as a single double FindMeAPathBetween(double[] matrix, int from, int to, out int[] path)
or void CalculateAllDistancesBetween(double[] matrix, out double[] ds)
.
If the user wants more control or way more advance thing, an algorithms library like ngeneric or QuickGraph could be used. I think this algorithm should characterise itself by keeping things as simple as possible.
That being said, Algorithms.Concrete
is the solution I prefer the most currently.
In principle I am fine with keeping it simple. However, whatever we do has to be flexible enough to actually be useful. I am not convinced that double FindMeAPathBetween(double[] matrix, int from, int to, out int[] path)
is very flexible for example. It requires you to translate your own problem data structure into a specific format (at least O(n) time), to solve a problem on it that might only take O(n log n) time. (Even more importantly, the translation is code the user have to write. In the same time they could just implement the algorithm themselves, in a way that is more appropriate to their exact problem.)
Either way, even if kept simpler (and some of the algorithms will probably not need any parameters), I would still think that my proposed signatures, like var solution = ProblemType.SpecificThingToDo(problem)
are appropriate.
That also seems to fit what you suggest, if I am not mistaken.
In either case the data has to be formatted in a way that is useful. As we don't have a proper graph datastructure, using a distance matrix is the most common way of representing graphs. One could for example write overloads where the user specifies a list of vertex positions and vertex adjacencies (assuming Euclidean distance or allowing custom edge weights), which would either transform the instance to a distance matrix itself (which indeed would require linear time) or be a copy of the algorithm but adapted to use a different underlying representation.
I think the problem of transforming your data to a usable format will always be a problem as long as we don't have standardised datastructures (which I don't think we should have).
I am fine with your proposed signatures. As far as I am aware, that would mean we have something like the following signatures for now (plus some additional overloads not mentioned):
int[] Matching.FindMinimalMatching(double[,] costMatrix)
int[] Matching.FindMinimalMatching(Vector2[] from, Vector2[] to)
double PathFinding.FindShortestPathBetween(double[,] costMatrix, int from, int to, out int[] path, bool hasNegativeEdges = false)
float PathFinding.FindShortestPathBetween(Vector2[] vertices, bool[,] adjacencies, int from, int to, out int[] path)
double[,] PathFinding.FindAllPairDistances(double[,] costMatrix)
float[,] PathFinding.FindAllPairDistances(Vector2[] vertices)
On this note, in my bin packing (see #32) algorithm, I used the following signature.
BinPacking.Result BinPacking.Pack(IEnumerable<BinPackingRectangle> input, bool aParameter)
Result and Rectangle are both (generic) custom types, so I have full control over their behaviour. Of course this means that the user has to translate their own data, which I do not find optimal.
So far I have not yet thought of a better solution. This one is at least type-safe and fast.
Right now the algorithms are called by their actual name (e.g.
HungarianAlgorithm
). The question is whether we should keep this, or if we should actually rename them to the problem they solve (so in this case:MinimalMatching
). The argument for this is that the implementation of this solving algorithm could change. In this particular instance: the performace of the class could be improved by switching to a new algorithm that uses Ford-Fulkerson on a bipartite graph. This implementation should then be hidden for the end user, as having both would not be beneficial under this instance.On the other hand, one could argue that there is value in having multiple shortest path algorithms. Dijkstra can't deal with negative edge weights, but is fast. Bellman-Ford is slower, but is more flexible with the edge weights. Considering the expected experience of the users of this library, both algorithms could be given as option.
Hence the question is whether the classes should be called
MinimalMatching
,SingleSourceShortestPath
andAllPairsShortestPath
, orHungarianAlgorithm
,DijkstrasAlgorithm
andFloydWarshalAlgorithm
.