Problem-solving techniques: If you come across a problem and aren’t sure how to solve it efficiently, try divide and conquer or dynamic programming. Or you may realize there’s no efficient solution, and get an approximate answer using a greedy algorithm instead.
When I want to solve a problem, the two plans of attack I start with are “Can I use a hash table?” and “Can I model this as a graph?”
An algorithm is a set of instructions for accomplishing a task. Every piece of code could be called an algorithm.
Algorithm speed isn’t measured in seconds, but in growth of the number of operations.
Logarithms are the inverse of exponentials.
Efficient algorithms optimize for time or space.
Big O notation tells you how fast an algorithm is, i.e. Big O notation counts the number of operations. Big O notation is about the worst-case scenario. For example, suppose you have a list of size n. Simple search needs to check each element, so it will take n operations. The run time in Big O notation is O(n). Where are the seconds? There are none—Big O doesn’t tell you the speed in seconds. Big O notation lets you compare the number of operations. It tells you how fast the algorithm grows.
Constants are ignored in Big O notation.
Five Big O run times that you’ll encounter a lot, sorted from fastest to slowest:
• O(log n), also known as log time. Example: binary search.
• O(n), also known as linear time. Example: simple search.
• O(n log n). Example: a fast sorting algorithm, like quicksort.
• O(n^2). Example: a slow sorting algorithm, like selection sort.
• O(n!). Example: a really slow algorithm, like the traveling salesperson problem.
There are two different types of access: random access and sequential access. Computer memory is like a giant set of drawers.
Sequential access means reading the elements one by one, starting at the first element. Linked lists can only do sequential access. Random access means you can jump directly to the 10th element.
Here are the run times for common operations on arrays and lists:

|       | Reading | Insertion | Deletion |
|-------|---------|-----------|----------|
| Array | O(1)    | O(n)      | O(n)     |
| List  | O(n)    | O(1)      | O(1)     |
Hybrid data structures can be built from arrays and linked lists as needed.
Recursion is when a function calls itself. Every recursive function has two cases: the base case and the recursive case. Divide and conquer uses recursion to solve hard problems. A stack is a simple data structure that lets you push an item on top of the previous item and pop the top item off. The computer uses a stack internally called the call stack: it allocates a box of memory for every function call and places it on the call stack. The call stack can get very large, which takes up a lot of memory. Recursion relies on the call stack to keep track of each unfinished call's state, including its return value.
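A minimal sketch of the two cases (factorial is just an illustrative choice, not an example from the notes above):

```python
# A minimal recursion sketch: every recursive function needs both cases.
def factorial(n):
    if n == 1:                       # base case: stops the recursion
        return 1
    return n * factorial(n - 1)      # recursive case: each unfinished call waits on the call stack

print(factorial(5))   # 120
```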
Divide And Conquer
“Can I solve this if I use divide and conquer?”
D&C works by breaking a problem down into smaller and smaller pieces.
It isn’t a simple algorithm that you can apply to a problem. Instead, it’s a way to think about a problem.
How it works:
Quicksort uses divide-and-conquer.
When writing a recursive function involving an array, the base case is often an empty array or an array with one element. If you’re stuck, try that first.
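A rough sketch of that tip, summing an array with divide and conquer (the function name is my own):

```python
# Divide and conquer on an array: shrink the problem on every call.
def dc_sum(arr):
    if not arr:                        # base case: an empty array sums to 0
        return 0
    return arr[0] + dc_sum(arr[1:])    # recursive case: first element + sum of a smaller array

print(dc_sum([2, 4, 6]))   # 12
```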
Euclid’s algorithm: “If you find the biggest box that will work for this size, that will be the biggest box that will work for the entire farm.”
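A compact sketch of Euclid's algorithm in code; the 1680 x 640 plot is the usual farm example, so treat the numbers as illustrative:

```python
# Euclid's algorithm: the biggest square that tiles both sides tiles the whole farm.
def gcd(a, b):
    while b:
        a, b = b, a % b    # gcd(a, b) equals gcd(b, a mod b)
    return a

print(gcd(1680, 640))   # 80
```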
Functional programming languages like Haskell don’t have loops, so we have to use recursion. Haskell makes heavy use of recursion.
Inductive proofs are one way to prove that your algorithm works. Each inductive proof has two steps: the base case and the inductive case.
We use similar reasoning for Quicksort.
Quicksort is unique because its speed depends on the pivot you choose.
Quicksort is a tricky case. In the worst case, Quicksort takes O(n^2) time, but in practice it hits the average case, O(n log n).
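A short quicksort sketch in that spirit; here the pivot is simply the first element, which is only one of several possible choices:

```python
def quicksort(arr):
    if len(arr) < 2:
        return arr                                   # base case: 0 or 1 elements are already sorted
    pivot = arr[0]                                   # pivot choice drives the speed
    less = [x for x in arr[1:] if x <= pivot]
    greater = [x for x in arr[1:] if x > pivot]
    return quicksort(less) + [pivot] + quicksort(greater)

print(quicksort([10, 5, 2, 3]))   # [2, 3, 5, 10]
```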
A hash table has keys and values. A hash function is a function that takes an input, such as a string, and gives back a number. Hash tables are good for:
• Modeling relationships from one thing to another
• Filtering out duplicates
• Caching/memoizing data instead of making your server redo work
Hash tables have no ordering.
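A tiny dict-based sketch of the caching use mentioned above; the URL and the "rendered page" string are made up for illustration:

```python
cache = {}

def get_page(url):
    if url in cache:
        return cache[url]                              # serve cached data, skip the expensive work
    data = f"<html>rendered page for {url}</html>"     # stand-in for the real server work
    cache[url] = data
    return data

print(get_page("example.com/about"))   # does the "work" and caches the result
print(get_page("example.com/about"))   # second call comes straight from the cache
```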
Breadth-first Search
Answers - What’s the shortest path to X?
“shortest path” meant the path with the fewest edges in the graph.
The shortest path doesn’t have to be about physical distance. It can be about minimizing something.
To calculate the shortest path in an unweighted graph, use breadth-first search.
Breadth-first search will find you the path with the fewest segments: it finds the route to a location in the fewest steps, ignoring edge weights such as travel time. This type of problem is called a shortest-path problem.
Abstract data structure: Graphs A graph models a set of connections. For example, suppose you and your friends are playing poker, and you want to model who owes whom money. Here’s how you could say, “Alex owes Rama money.” Graphs are made up of nodes and edges. A node can be directly connected to many other nodes. Those nodes are called its neighbors. Graphs are a way to model how different things are connected to one another.
Breadth-first search is a different kind of search algorithm from Binary search: one that runs on graphs.
It can help answer two types of questions: • Question type 1: Is there a path from node A to node B? • Question type 2: What is the shortest path from node A to node B?
It uses the queue abstract data structure. You need to check people in the order they were added to the search list, so the search list needs to be a queue. Otherwise, you won’t get the shortest path.
In a directed graph, the relationship goes only one way, shown by an arrow pointing at the node the relationship is with: A -> B means A has a relationship with B, but B does not have one with A. It can also be read as a dependency. Undirected graphs don’t have arrows, and the relationship goes both ways.
```python
# Graph implementation in Python with a hash table.
graph = {}
graph["you"] = ["alice", "bob", "claire"]
```
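A rough breadth-first-search sketch on a graph like the one above; the extra names and the goal test (a stand-in for "is this person a mango seller?") are made up for illustration:

```python
from collections import deque

graph = {
    "you": ["alice", "bob", "claire"],
    "alice": ["peggy"],
    "bob": ["anuj", "peggy"],
    "claire": ["thom", "jonny"],
    "anuj": [], "peggy": [], "thom": [], "jonny": [],
}

def bfs(start, is_goal):
    """Return True if some node reachable from start satisfies is_goal."""
    search_queue = deque(graph[start])
    searched = set()                          # avoid checking a person twice (and looping forever)
    while search_queue:
        person = search_queue.popleft()       # queue: first added, first checked
        if person in searched:
            continue
        if is_goal(person):
            return True
        searched.add(person)
        search_queue += graph[person]
    return False

print(bfs("you", lambda name: name.endswith("m")))   # True: "thom" matches the goal test
```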
If task A depends on task B, task A shows up later in the list. This is called a topological sort, and it’s a way to make an ordered list out of a graph.
A tree is a special type of graph, where no edges ever point back.
Dijkstra’s Algorithm
Answers “What’s the fastest path to X?”
The edges of the graph have weights (for example, travel times); such graphs are called weighted graphs.
To calculate the shortest path in a weighted graph, use Dijkstra’s algorithm.
Dijkstra’s algorithm only works on directed acyclic graphs (DAGs), i.e. graphs without cycles: A -> B, but not also B -> A.
The key idea behind Dijkstra’s algorithm: Look at the cheapest node on your graph. There is no cheaper way to get to this node!
Negative-weight edges break the algorithm. For graphs with negative-weight edges, the Bellman-Ford algorithm is better suited.
Dijkstra’s algorithm has four steps:
1. Find the cheapest node, i.e. the node you can get to in the lowest cost.
2. Update the costs of that node’s neighbors.
3. Repeat until you’ve done this for every node in the graph.
4. Calculate the final path.
Implementation: three hash tables, one each for the graph, the costs, and the parents, plus a processed array to keep track of all the nodes that have been processed (visited).
Algorithm:
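A rough sketch of the algorithm using those structures; the node names and edge weights below are made up for illustration:

```python
import math

graph = {
    "start": {"a": 6, "b": 2},
    "a": {"fin": 1},
    "b": {"a": 3, "fin": 5},
    "fin": {},
}
costs = {"a": 6, "b": 2, "fin": math.inf}            # cheapest known cost from start
parents = {"a": "start", "b": "start", "fin": None}
processed = []

def find_lowest_cost_node(costs):
    lowest, node = math.inf, None
    for n, c in costs.items():
        if c < lowest and n not in processed:
            lowest, node = c, n
    return node

node = find_lowest_cost_node(costs)                  # step 1: cheapest unprocessed node
while node is not None:
    for neighbor, weight in graph[node].items():     # step 2: update its neighbors' costs
        new_cost = costs[node] + weight
        if new_cost < costs[neighbor]:
            costs[neighbor] = new_cost
            parents[neighbor] = node
    processed.append(node)
    node = find_lowest_cost_node(costs)              # step 3: repeat for every node

print(costs["fin"])   # 6 (start -> b -> a -> fin); step 4: read the final path off parents
```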
Greedy Algorithms (or Approximation Algorithms)
NP-complete problems have no known fast algorithmic solution, so we should not waste time trying to find a fast algorithm. For these types of problems we have to use approximation algorithms to quickly find an approximate solution. The greedy strategy is a simple approximation strategy.
The beauty of greedy algorithms: they’re easy! A greedy algorithm is simple: at each step, pick the optimal move. In this case, each time you pick a class, you pick the class that ends the soonest. In technical terms: at each step you pick the locally optimal solution, and in the end you’re left with the globally optimal solution.
When calculating the exact solution would take too much time, an approximation algorithm will do. Approximation algorithms are judged by:
• How fast they are
• How close they are to the optimal solution
The classroom scheduling problem: schedule the classes so that you can attend the most classes. Algorithm: at each step, pick the class that ends the soonest and doesn’t overlap with the classes already picked (see the sketch below).
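A small illustrative pass over made-up class names and hours:

```python
classes = [("art", 9, 10), ("eng", 9.5, 10.5), ("math", 10, 11),
           ("cs", 10.5, 11.5), ("music", 11, 12)]              # (name, start, end)

schedule = []
last_end = 0
for name, start, end in sorted(classes, key=lambda c: c[2]):   # consider classes by earliest end time
    if start >= last_end:                                      # keep it only if it doesn't overlap
        schedule.append(name)
        last_end = end

print(schedule)   # ['art', 'math', 'music']
```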
The set-covering problem: select the minimum number of broadcasting stations that cover the maximum number of radio listeners in the country. The greedy algorithm is O(n^2); a sketch follows after the set examples below.
Sets are like lists, except sets can’t have duplicates. We can do some interesting operations on sets, like union, intersection, and difference.
>>> fruits = set(["avocado", "tomato", "banana"])
>>> vegetables = set(["beets", "carrots", "tomato"])
>>> fruits | vegetables   # set union
set(["avocado", "beets", "carrots", "tomato", "banana"])
>>> fruits & vegetables   # set intersection
set(["tomato"])
>>> fruits - vegetables   # set difference
set(["avocado", "banana"])
>>> vegetables - fruits   # set difference
set(["beets", "carrots"])
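A greedy set-covering sketch along those lines; the station names and the states they cover are made up for illustration:

```python
states_needed = set(["mt", "wa", "or", "id", "nv", "ut", "ca", "az"])
stations = {
    "kone": set(["id", "nv", "ut"]),
    "ktwo": set(["wa", "id", "mt"]),
    "kthree": set(["or", "nv", "ca"]),
    "kfour": set(["nv", "ut"]),
    "kfive": set(["ca", "az"]),
}

final_stations = set()
while states_needed:
    best_station = None
    states_covered = set()
    for station, states_for_station in stations.items():
        covered = states_needed & states_for_station   # still-needed states this station covers
        if len(covered) > len(states_covered):         # greedy: pick the station covering the most
            best_station = station
            states_covered = covered
    states_needed -= states_covered
    final_stations.add(best_station)

print(final_stations)   # e.g. {'kone', 'ktwo', 'kthree', 'kfive'}
```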
The traveling salesperson problem: travel to all cities while covering the minimum distance. In general, for n items, it takes n! (n factorial) operations to compute the result, so this is O(n!) time, or factorial time. This is one of the unsolved problems in computer science: there is no known fast algorithm for it, and many smart people think a fast algorithm for this problem is impossible. The best we can do is come up with an approximate (greedy) solution.
The number of routes increases factorially with the number of cities. So 5! = 120. Suppose you have 10 cities: how many possible routes are there? 10! = 3,628,800. You have to calculate over 3 million possible routes for 10 cities. As you can see, the number of possible routes becomes big very fast! This is why it’s impossible to compute the “correct” solution for the traveling-salesperson problem if you have a large number of cities.
Greedy algorithms don’t always work!
The knapsack problem: you’re a thief in a store with a knapsack (a bag) that can hold only a limited weight, say 21 kg. As a thief you would steal the most expensive items first, but doing so might not maximize the loot. Say items A, B, and C are worth 10, 15, and 11 respectively, and B alone fills the bag. With the greedy approach, the thief takes B (worth 15) and misses out on A and C, whose combined worth (21) would have been higher. So the greedy strategy doesn’t give you the optimal solution here.
NP-complete problems
The traveling-salesperson problem and the set-covering problem both have something in common: you calculate every possible solution and pick the smallest/shortest one. Both of these problems are NP-complete.
The traveling salesperson and the set-covering problem are two examples.
How do you tell if a problem is NP-complete?
Jonah is picking players for his fantasy football team. He has a list of abilities he wants: good quarterback, good running back, good in rain, good under pressure, and so on. He has a list of players, where each player fulfills some abilities.
Jonah needs a team that fulfills all his abilities, and the team size is limited. “Wait a second,” Jonah realizes. “This is a set-covering problem!”
Jonah can use the same approximation algorithm to create his team:
1. Find the player who fulfills the most abilities that haven’t been fulfilled yet.
2. Repeat until the team fulfills all abilities (or you run out of space on the team).
NP-complete problems show up everywhere! It’s nice to know if the problem you’re trying to solve is NP-complete. At that point, you can stop trying to solve it perfectly and solve it using an approximation algorithm instead. But it’s hard to tell if a problem you’re working on is NP-complete. Usually there’s a very small difference between a problem that’s easy to solve and an NP-complete problem. For example, in the previous chapters, I talked a lot about shortest paths. You know how to calculate the shortest way to get from point A to point B. But if you want to find the shortest path that connects several points, that’s the traveling-salesperson problem, which is NP-complete. The short answer: there’s no easy way to tell if the problem you’re working on is NP-complete.
Here are some giveaways:
• Your algorithm runs quickly with a handful of items but really slows down with more items.
• “All combinations of X” usually point to an NP-complete problem.
• Do you have to calculate “every possible version” of X because you can’t break it down into smaller sub-problems? It might be NP-complete.
• If your problem involves a sequence (such as a sequence of cities, like traveling salesperson), and it’s hard to solve, it might be NP-complete.
• If your problem involves a set (like a set of radio stations) and it’s hard to solve, it might be NP-complete.
• Can you restate your problem as the set-covering problem or the traveling-salesperson problem? Then your problem is definitely NP-complete.
Review:
• Greedy algorithms optimize locally, hoping to end up with a global optimum.
• NP-complete problems have no known fast solution.
• If you have an NP-complete problem, your best bet is to use an approximation algorithm.
• Greedy algorithms are easy to write and fast to run, so they make good approximation algorithms.
Dynamic Programming
The knapsack problem
Let's go back to the thief from before who wants to steal the goods. The simplest algorithm is this: you try every possible set of goods and find the set that gives you the most value. This works, but it’s really slow. For 3 items, you have to calculate 8 possible sets; for 4 items, 16 sets. With every item you add, the number of sets you have to calculate doubles! This algorithm takes O(2^n) time, which is very, very slow and impractical for any reasonable number of goods. An approximate solution will be close to the optimal solution, but it may not be the optimal solution. So how do you calculate the optimal solution?
Answer: With dynamic programming! Let’s see how the dynamic-programming algorithm works here.
Dynamic programming tries to maximize something. It starts by solving small sub-problems and builds up to solving the big problem. Dynamic programming is useful when you’re trying to optimize something given a constraint. It only works when each sub-problem is discrete, i.e. when it doesn’t depend on other sub-problems. There’s no single formula for calculating a dynamic-programming solution.
It is possible that the best solution doesn’t fill the knapsack completely. For the knapsack problem, you’ll start by solving the problem for smaller knapsacks (or “sub-knapsacks”) and then work up to solving the original problem.
The Grid
Every dynamic-programming solution involves a grid. The values in the cells are usually what you’re trying to optimize. For the knapsack problem, the values were the value of the goods. Each cell is a sub-problem, so think about how you can divide your problem into sub-problems. That will help you figure out what the axes are.
Important point to note: The final solution may not be in the last cell!
Making the grid
What does the grid for this problem look like? You need to answer these questions:
• What are the values of the cells?
• How do you divide this problem into sub-problems?
• What are the axes of the grid?
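A small sketch of the knapsack grid; the item names, values, weights, and the 4 kg capacity are made up for illustration:

```python
items = [("guitar", 1500, 1), ("stereo", 3000, 4), ("laptop", 2000, 3)]   # (name, value, weight in kg)
capacity = 4

# grid[i][w] = best value using the first i items with a sub-knapsack of size w
grid = [[0] * (capacity + 1) for _ in range(len(items) + 1)]

for i, (name, value, weight) in enumerate(items, start=1):
    for w in range(1, capacity + 1):
        without_item = grid[i - 1][w]                    # previous max for this sub-knapsack
        with_item = 0
        if weight <= w:                                  # item fits: its value + best use of the leftover space
            with_item = value + grid[i - 1][w - weight]
        grid[i][w] = max(without_item, with_item)

print(grid[len(items)][capacity])   # 3500 (guitar + laptop)
```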
Sometimes algorithms aren’t an exact recipe. They’re a framework that you build your idea on top of.
Longest Common Substring Problem / Longest Common Subsequence Problem
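A brief grid sketch for the longest common substring; the two words are arbitrary examples:

```python
def longest_common_substring(a, b):
    cell = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cell[i][j] = cell[i - 1][j - 1] + 1    # matching letters extend the diagonal run
                best = max(best, cell[i][j])
            # otherwise the cell stays 0: the substring is broken
    return best

print(longest_common_substring("fish", "hish"))   # 3 ("ish")
```

For the longest common subsequence, the non-matching case takes the max of the cell to the left and the cell above instead of 0.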
k-nearest neighbors TODO
Trees
Let’s go back to the binary search example. When a user logs in to Facebook, Facebook has to look through a big array to see if the username exists. We said the fastest way to search through this array is to run binary search. But there’s a problem: every time a new user signs up, you insert their username into the array. Then you have to re-sort the array, because binary search only works with sorted arrays. Wouldn’t it be nice if you could insert the username into the right slot in the array right away, so you don’t have to sort the array afterward? That’s the idea behind the binary search tree data structure.
Binary search tree: for every node, the nodes to its left are smaller in value, and the nodes to its right are larger in value.
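A minimal binary-search-tree sketch of that property; the Node class, helper functions, and usernames are all illustrative:

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(root, value):
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = insert(root.left, value)      # smaller values go to the left
    else:
        root.right = insert(root.right, value)    # larger (or equal) values go to the right
    return root

def search(root, value):
    if root is None:
        return False
    if value == root.value:
        return True
    return search(root.left, value) if value < root.value else search(root.right, value)

root = None
for username in ["maggie", "adit", "david", "manning"]:
    root = insert(root, username)
print(search(root, "david"))   # True, without re-sorting anything
```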
Binary search trees have some downsides too: for one thing, you don’t get random access. You can’t say, “Give me the fifth element of this tree.”
There are special binary search trees that balance themselves. One example is the red-black tree.
More-advanced data structures, check these out: • B-trees (Used in databases) • Red-black trees • Heaps • Splay trees
Inverted Indexes: This is a useful data structure: a hash that maps words to places where they appear. It’s commonly used to build search engines.
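A tiny illustration of building one with a plain dict; the page names and contents are made up:

```python
pages = {
    "page1": "hi there",
    "page2": "hi adit",
    "page3": "there we go",
}

index = {}                                    # word -> set of pages containing it
for page, text in pages.items():
    for word in text.split():
        index.setdefault(word, set()).add(page)

print(index["hi"])      # {'page1', 'page2'}
print(index["there"])   # {'page1', 'page3'}
```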
The Fourier transform algorithm: Simple idea has a lot of use cases, like sound and image processing, etc.. http://mng.bx/874X
Parallel Algorithms
The best you can do with a sorting algorithm is roughly O(n log n). It’s well known that you can’t sort an array in O(n) time, unless you use a parallel algorithm! There’s a parallel version of Quicksort that will sort an array in O(n) time.
Parallel algorithms are hard to design. And it’s also hard to make sure they work correctly and to figure out what type of speed boost you’ll see. One thing is for sure—the time gains aren’t linear. So if you have two cores in your laptop instead of one, that almost never means your algorithm will magically run twice as fast. There are a couple of reasons for this:
• Overhead of managing the parallelism—Suppose you have to sort an array of 1,000 items. How do you divide this task among the two cores? Do you give each core 500 items to sort and then merge the two sorted arrays into one big sorted array? Merging the two arrays takes time. • Load balancing—Suppose you have 10 tasks to do, so you give each core 5 tasks. But core A gets all the easy tasks, so it’s done in 10 seconds, whereas core B gets all the hard tasks, so it takes a minute. That means core A was sitting idle for 50 seconds while core B was doing all the work! How do you distribute the work evenly so both cores are working equally hard?
If you’re interested in the theoretical side of performance and scalability, parallel algorithms might be for you!
MapReduce
There’s a special type of parallel algorithm that is becoming increasingly popular: the distributed algorithm. It’s fine to run a parallel algorithm on your laptop if you need two to four cores, but what if you need hundreds of cores? Then you can write your algorithm to run across multiple machines. The MapReduce algorithm is a popular distributed algorithm. You can use it through the popular open source tool Apache Hadoop.
Why are distributed algorithms useful? Suppose you have a table with billions or trillions of rows, and you want to run a complicated SQL query on it. You can’t run it on MySQL, because it struggles after a few billion rows. Use MapReduce through Hadoop! Or suppose you have to process a long list of jobs. Each job takes 10 seconds to process, and you need to process 1 million jobs like this. If you do this on one machine, it will take you months! If you could run it across 100 machines, you might be done in a few days.
Distributed algorithms are great when you have a lot of work to do and want to speed up the time required to do it. MapReduce in particular is built up from two simple ideas: the map function and the reduce function.
The map function
The map function is simple: it takes an array and applies the same function to each item in the array. For example, here we’re doubling every item in the array:
>>> arr1 = [1, 2, 3, 4, 5]
>>> arr2 = list(map(lambda x: 2 * x, arr1))   # list() is needed in Python 3, where map is lazy
>>> arr2
[2, 4, 6, 8, 10]
arr2 now contains [2, 4, 6, 8, 10] —every element in arr1 was doubled! Doubling an element is pretty fast. But suppose you apply a function that takes more time to process. Look at this pseudocode:
>>> arr1 = # A list of URLs
>>> arr2 = map(download_page, arr1)
Here you have a list of URLs, and you want to download each page and store the contents in arr2 . This could take a couple of seconds for each URL. If you had 1,000 URLs, this might take a couple of hours! Wouldn’t it be great if you had 100 machines, and map could automatically spread out the work across all of them? Then you would be downloading 100 pages at a time, and the work would go a lot faster! This is the idea behind the “map” in MapReduce.
The reduce function
The reduce function confuses people sometimes. The idea is that you “reduce” a whole list of items down to one item. With map, you go from one array to another. With reduce, you transform an array to a single item. Here’s an example:
>>> from functools import reduce   # in Python 3, reduce lives in functools
>>> arr1 = [1, 2, 3, 4, 5]
>>> reduce(lambda x, y: x + y, arr1)
15
In this case, you sum up all the elements in the array: 1 + 2 + 3 + 4 + 5 = 15 ! I won’t explain reduce in more detail here, because there are plenty of tutorials online.
MapReduce uses these two simple concepts to run queries about data across multiple machines. When you have a large dataset (billions of rows), MapReduce can give you an answer in minutes where a traditional database might take hours.
Probabilistic Algorithms (for large data sets)
Bloom filters are probabilistic data structures: they tell you whether an item has probably been seen before, with possible false positives but no false negatives. HyperLogLog approximates the number of unique elements in a set. Just like bloom filters, it won’t give you an exact answer, but it comes very close and uses only a fraction of the memory a task like this would otherwise take.
If you have a lot of data and are satisfied with approximate answers, check out probabilistic algorithms!
The SHA algorithms: comparing files, checking passwords, Simhash.
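A quick sketch of the file-comparison use with Python's standard hashlib (the byte strings are arbitrary):

```python
import hashlib

def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

print(sha256_of(b"hello world"))                                 # same input always gives the same hash
print(sha256_of(b"hello world") == sha256_of(b"hello world!"))   # tiny change, completely different hash: False
```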
Diffie-Hellman key exchange: How do you encrypt a message so it can only be read by the person you sent the message to?
Diffie-Hellman has two keys: a public key and a private key. The public key is exactly that: public.
Linear Programming: maximize something given some constraints, producing an optimal value. All the graph algorithms can be done through linear programming instead; linear programming is a much more general framework, and graph problems are a subset of it. Linear programming uses the simplex algorithm, a complex algorithm that the book doesn’t cover. If you’re interested in optimization, look up linear programming!