References:

Book: Grokking Algorithms
https://github.com/egonschiele/grokking_algorithms

Sometimes all you need is an algorithm that solves the problem pretty well. And that’s where greedy algorithms shine, because they’re simple to write and usually get pretty close.

1. The set-covering problem

Suppose you’re starting a radio show. You want to reach listeners in all 50 states.

You have to decide what stations to play on to reach all those listeners. It costs money to be on each station, so you’re trying to minimize the number of stations you play on.

You have a list of stations. Each station covers a region, and there’s overlap.

# make a list of the states you want to cover
# You pass an array (list in Python) in, and it gets converted to a set (Sets can’t have duplicates)
states_needed = set(["mt", "wa", "or", "id", "nv", "ut", "ca", "az"])

# You also need the list of stations that you’re choosing from.
# I chose to use a hash (dict in Python) for this:
# The keys are station names, and the values are the states they cover.
stations = {}
stations["kone"] = set(["id", "nv", "ut"])
stations["ktwo"] = set(["wa", "id", "mt"])
stations["kthree"] = set(["or", "nv", "ca"])
stations["kfour"] = set(["nv", "ut"])
stations["kfive"] = set(["ca", "az"])

# Hold the final set of stations
final_stations = set()

# loop until states_needed is empty
while states_needed:
    # You need to go through every station and pick the one that
    # covers the most uncovered states. I’ll call this `best_station`:
    best_station = None
    # states_covered is a set of all the states this station covers.
    states_covered = set()
    # The for loop allows you to loop over every station to see which one is the best station.
    for station, states_for_station in stations.items():
        # `covered` is a set of states that were in both `states_needed` and `states_for_station`.
        # So `covered` is the set of uncovered states that this station covers!
        # `&`: a set intersection.
        covered = states_needed & states_for_station
        # Check whether this station covers more states than the current best_station.
        # If so, this station is the new best_station.
        if len(covered) > len(states_covered):
            best_station = station
            states_covered = covered

    # Need to update states_needed. Because this station covers some states, those states aren’t needed anymore.
    states_needed -= states_covered
    # After the for loop is over, you add `best_station` to the final list of stations:
    final_stations.add(best_station)

print(final_stations)  # {'kone', 'ktwo', 'kthree', 'kfive'}

Generated with https://pythontutor.com

Is that what you expected? Instead of stations 1, 2, 3, and 5, you could have chosen stations 2, 3, 4, and 5.

Let’s compare the run time of the greedy algorithm to the exact algorithm.

Number of Stations	O(n!) Exact Algorithm	O(n²) Greedy Algorithm
5	3.2 sec	2.5 sec
10	102.4 sec	10 sec
32	13.6 yrs	102. 4 sec
100	4×10²¹ yrs	16.67 min

2. Travelling Salesperson Problem

2.1 NP-complete

1×2×3×...×10=>10! = 3,628,800. You have to calculate over 3 million possible routes for 10 cities. The number of possible routes becomes big very fast! This is why it’s almost impossible to compute the “correct” solution for the traveling-salesperson problem if you have a large number of cities.

The traveling-salesperson problem and the set-covering problem both have something in common: you calculate every possible solution and pick the smallest/shortest one. Both of these problems are NP-complete (NP is short for Nondeterministic Polynomial).

Here’s the short explanation of NP-completeness: some problems are famously hard to solve. The traveling salesperson and the set-covering problem are two examples. A lot of smart people think that it’s not possible to write an algorithm that will solve these problems quickly.

Algorithms:

Longitude sorting
Latitude sorting
Nearest neighbor (is a greedy algorithm)
Greedy algorithm
Brute Force algorithm

Nearest Neighbor Algorithm vs Greedy Algorithm:

The Nearest Neighbor algorithm is a type of greedy algorithm, specifically tailored for problems like the Traveling Salesperson Problem (TSP) or similar routing and pathfinding problems.

The Nearest Neighbor algorithm always picks the nearest unvisited city as the next step.
The Greedy algorithm looks for the most cost-effective option at every step without considering the overall route.

Imagine a TSP with cities A, B, C, D, and E, where the distances are not uniform (e.g., traveling from B to C might be shorter than from A to B, but going from C to D afterward might be disproportionately longer).

Nearest Neighbor might choose the path A → B → C → D → E based solely on immediate proximity at each step.

Greedy Algorithm might choose a different path if it considers the overall cost of segments of the journey (like A → C → B → E → D) based on a different set of criteria or a more complex understanding of "cost."

2.2 Use greedy algorithms to solve TSP

Using greedy algorithms to solve the Traveling Salesperson Problem: arbitrarily pick a start city. Then, each time the salesperson has to pick the next city to visit, they pick the closest unvisited city.

Source: https://id.iste.org/files/ect-traveling-salesperson.html

References:

""" Greedy Approach:
1. First of all, we have to create two primary data holders.
    - First of them is a list that can hold the indices (indexes) of the cities
        in terms of the input matrix of distances between cities.
    - And the Second one is the array which is our result.
2. Perform traversal on the given adjacency matrix tsp[][] for all the city and
    if the cost of reaching any city from the current city is less than the current cost
    then update the cost.
3. Generate the minimum path using the above step and return their minimum cost. """

def find_min_route(tsp_matrix: list, cities: list) -> None:
    """ Use Greedy Algorithms
    to find the minimum cost path from all the paths """
    sum_path: int = 0
    counter: int = 0
    i: int = 0
    j: int = 0
    min_path: int = 999999999
    visited_route: list = []
    # Starting from the 0th indexed city i.e., the first city
    visited_route.append(0)
    route: list = [0] * len(tsp_matrix)
    #  Traverse the adjacency matrix tsp_matrix[i][j]
    while i < len(tsp_matrix) and j < len(tsp_matrix[i]):
        if counter >= len(tsp_matrix[i]) - 1:
            break
        # If this path is unvisited and the cost is less,
        # then update the minimum cost path.
        if (j != i) and (j not in visited_route):
            if tsp_matrix[i][j] < min_path:
                min_path = tsp_matrix[i][j]
                route[counter] = j + 1
        j += 1
        # Check all paths from the ith indexed city
        if j == len(tsp_matrix[i]):
            sum_path += min_path
            min_path = 999999999
            # visited_route[route[counter] - 1] = 1
            visited_route.append(route[counter] - 1)
            j = 0
            i = route[counter] - 1
            counter += 1

    print('The total distance is', sum_path, 'Kilometers.')

    # Output the minimum cost route:
    # print(visited_route)  # [0, 4, 3, 5, 6, 8, 7, 2, 1]
    min_route: str = ''
    for index, element in enumerate(visited_route):
        if index == len(visited_route) - 1:
            min_route += f'{cities[element]}'
        else:
            min_route += f'{cities[element]} -> '
    print(f'The minimum cost route is:\n{min_route}')

if __name__ == "__main__":
    # Cities in South Africa
    cities: list = [
        'A: Cape Town', 'B: Bhisho', 'C: Pietermaritzburg',
        'D: Bloemfontein', 'E: Kimberley', 'F: Johannesburg',
        'G: Mahikeng', 'H: Nelspruit', 'I: Polokwane'
    ]
    # The graph distance matrix in Kilometers of the Travelling Salesman
    # From city A: Cape Town -> city I: Polokwane
    tsp_matrix = [[0, 991, 1558, 1004, 941, 1398, 1318, 1743, 1719],
                  [991, 0, 590, 557, 659, 890, 966, 1198, 1241],
                  [1558, 590, 0, 556, 714, 490, 746, 616, 796],
                  [1004, 557, 556, 0, 165, 397, 445, 743, 718],
                  [941, 659, 714, 165, 0, 478, 365, 814, 789],
                  [1398, 890, 490, 397, 478, 0, 291, 327, 331],
                  [1318, 966, 746, 445, 365, 291, 0, 632, 545],
                  [1743, 1198, 616, 743, 814, 327, 632, 0, 301],
                  [1719, 1241, 796, 718, 789, 331, 545, 301, 0]]

    find_min_route(tsp_matrix, cities)

Output:

The total distance is 3846 Kilometers.
The minimum cost route is:
A: Cape Town -> E: Kimberley -> D: Bloemfontein -> F: Johannesburg -> G: Mahikeng -> I: Polokwane -> H: Nelspruit -> C: Pietermaritzburg -> B: Bhisho

2.3 TSP with Brute Force Algorithm

The Brute Force Algorithm will generate all the possible paths between all the cities, then calculate the distance of each path and select the shortest.

import itertools

cities = "ABCDEFGHI"
distances = [[0, 991, 1558, 1004, 941, 1398, 1318, 1743, 1719],
             [991, 0, 590, 557, 659, 890, 966, 1198, 1241],
             [1558, 590, 0, 556, 714, 490, 746, 616, 796],
             [1004, 557, 556, 0, 165, 397, 445, 743, 718],
             [941, 659, 714, 165, 0, 478, 365, 814, 789],
             [1398, 890, 490, 397, 478, 0, 291, 327, 331],
             [1318, 966, 746, 445, 365, 291, 0, 632, 545],
             [1743, 1198, 616, 743, 814, 327, 632, 0, 301],
             [1719, 1241, 796, 718, 789, 331, 545, 301, 0]]

all_paths = list(itertools.permutations(sorted(cities[1:])))

min = 999999999
min2 = None
for p in all_paths:
    path = list(p)
    path.insert(0, "A")
    path = tuple(path)
    # If print here, the program will take time because the number of
    # paths is enormous. The execution time can be up to 30 seconds.
    # print(path, end=' ')

    sum = 0
    for x in range(len(path) - 1):
        sum += distances[cities.find(path[x])][cities.find(path[x + 1])]

    # print("distance =", sum)
    if min > sum:
        min2 = min
        min = sum
        min_path = path
    elif (min2 is None) or (min2 > sum):
        min2 = sum
        min2_path = path

print("Number of paths:", len(all_paths))  # 40320 (len(cities)-1)!
print('The smallest distance:', min)  # 3586
print('The shortest path:\n', min_path)
# ('A', 'B', 'C', 'D', 'E', 'G', 'F', 'H', 'I')
print('The second smallest distance:', min2)  # 3590
print('The second shortest path:\n', min2_path)
# ('A', 'B', 'C', 'D', 'E', 'G', 'F', 'I', 'H')

Qingquan-Li / blog