TheAlgorithms / Python

All Algorithms implemented in Python
https://thealgorithms.github.io/Python/
MIT License
193.85k stars · 45.59k forks

Speedup our eight slowest pytests (one at a time please) #9718

cclauss closed this issue 1 year ago

cclauss commented 1 year ago

Feature description

At the end of our GitHub Actions build jobs, there is a list of the slowest ones.

Are there ways to speed up these tests without reducing our functionality or our code coverage?

Please only fix one algorithm per pull request.

Also, those 25 pytest warnings are worth fixing!!!

tianyizheng02 commented 1 year ago

I pointed out in #8785 that neural_network/simple_neural_network.py has multiple issues (I don't think it even implements a neural network) and could do with a rewrite. Ideally whatever new implementation replaces it should run a lot more quickly.

Muhammadummerr commented 1 year ago

I checked the maths/prime_numbers.py file. It contains three functions that find prime numbers using different techniques. The simplest one, slow_prime (which loops from 2 to n and prints the primes), takes a lot of time. Since the other two methods are faster and differ only in minor ways, do we still need slow_prime there at all?
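For context, the difference between the approaches is roughly the following (a hedged sketch, not the repository's exact code; the function names here are illustrative):

```python
import math


def slow_primes_sketch(max_n: int) -> list[int]:
    # Trial division testing every d in [2, candidate): the slow variant,
    # roughly quadratic in max_n.
    return [
        candidate
        for candidate in range(2, max_n + 1)
        if all(candidate % d != 0 for d in range(2, candidate))
    ]


def faster_primes_sketch(max_n: int) -> list[int]:
    # Only test divisors up to sqrt(candidate): same output, far fewer divisions.
    return [
        candidate
        for candidate in range(2, max_n + 1)
        if all(candidate % d != 0 for d in range(2, math.isqrt(candidate) + 1))
    ]
```

Both return identical results; only the running time differs, which is why the question of dropping the slow variant versus merely shrinking its test input comes up.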

cclauss commented 1 year ago

Can slow_prime() be tested with smaller numbers? We do not want to get rid of it but we want its test to finish faster. https://github.com/TheAlgorithms/Python/blob/87494f1fa1022368d154477bdc035fd01f9e4382/maths/prime_numbers.py#L20-L21 Let's lower the 10_000 to 1_000

Muhammadummerr commented 1 year ago

If slow_prime() is tested on smaller inputs, then the other two methods should also be tested on the same smaller inputs so that the times taken remain comparable.

Muhammadummerr commented 1 year ago

If I reduce the test cases, do I need to open a PR to check whether they finish faster, or is there some other way to verify that the issue is resolved before submitting a PR?

cclauss commented 1 year ago

When code runs on CI platforms like GitHub Actions, an environment variable named CI is defined.

Can you use https://docs.python.org/3/library/os.html#os.getenv to see if CI is defined? If CI is defined, set the number to 1_000 for all three tests; if it is not defined, set it to 10_000 for all three.
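The suggestion could look roughly like this (a minimal sketch; NUM_PRIMES is a hypothetical name, not necessarily the variable used in maths/prime_numbers.py):

```python
import os

# On CI platforms such as GitHub Actions, the CI environment variable is set.
# Use a small input there so the doctests finish quickly, and keep the
# original larger input for local runs.
NUM_PRIMES = 1_000 if os.getenv("CI") else 10_000
```

The doctests would then reference NUM_PRIMES instead of a hard-coded 10_000.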

Muhammadummerr commented 1 year ago

Okay, thanks. Let me try.

Muhammadummerr commented 1 year ago

But if it is not defined, testing with 10_000 will take the same amount of time as before, won't it?

cclauss commented 1 year ago

Yes, but we do not care because it no longer slows down our build process.

Muhammadummerr commented 1 year ago

I sped it up in #9851.

Muhammadummerr commented 1 year ago

I was reviewing the backtracking/power_sum.py code, and I noticed that the main cause of the slow execution is the large value of the needed_sum parameter. To improve performance, would it be correct to reduce the needed_sum value?
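For reference, the algorithm in backtracking/power_sum.py counts the ways to write needed_sum as a sum of distinct natural numbers each raised to a fixed power; here is a self-contained sketch of that backtracking idea (the signature is illustrative, not necessarily the repository's):

```python
def count_power_sums(
    needed_sum: int, power: int, base: int = 1, total: int = 0
) -> int:
    """Count ways to write needed_sum as a sum of distinct naturals,
    each raised to `power`, via include/skip backtracking."""
    current = base**power
    if total + current > needed_sum:
        # All larger bases overshoot too, so prune this whole branch.
        return 0
    count = 1 if total + current == needed_sum else 0
    # Include the current base, then try the next one...
    count += count_power_sums(needed_sum, power, base + 1, total + current)
    # ...or skip the current base entirely.
    count += count_power_sums(needed_sum, power, base + 1, total)
    return count
```

The recursion tree roughly doubles with each usable base, which is why a large needed_sum makes the doctests slow.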

cclauss commented 1 year ago

@duongoku Would you be willing to advise @Muhammadummerr on the best way to speed up the slow doctests in backtracking/power_sum.py ?

Muhammadummerr commented 1 year ago

I would love to hear from @duongoku.

quant12345 commented 1 year ago

@cclauss I instrumented backtracking/word_search.py to measure the execution time of each doctest case:

Code (each option called 10 times):

```python
"""
Author : Alexander Pantyukhin
Date : November 24, 2022

Task:
Given an m x n grid of characters board and a string word, return true if word
exists in the grid. The word can be constructed from letters of sequentially
adjacent cells, where adjacent cells are horizontally or vertically
neighboring. The same letter cell may not be used more than once.

Example:

Matrix:
---------
|A|B|C|E|
|S|F|C|S|
|A|D|E|E|
---------

Word: "ABCCED"
Result: True

Implementation notes: Use backtracking approach.
At each point, check all neighbors to try to find the next letter of the word.

leetcode: https://leetcode.com/problems/word-search/
"""


def get_point_key(len_board: int, len_board_column: int, row: int, column: int) -> int:
    """
    Returns the hash key of matrix indexes.

    >>> get_point_key(10, 20, 1, 0)
    200
    """
    return len_board * len_board_column * row + column


def exits_word(
    board: list[list[str]],
    word: str,
    row: int,
    column: int,
    word_index: int,
    visited_points_set: set[int],
) -> bool:
    """
    Return True if it's possible to search the word suffix
    starting from the word_index.

    >>> exits_word([["A"]], "B", 0, 0, 0, set())
    False
    """
    if board[row][column] != word[word_index]:
        return False

    if word_index == len(word) - 1:
        return True

    traverts_directions = [(0, 1), (0, -1), (-1, 0), (1, 0)]
    len_board = len(board)
    len_board_column = len(board[0])
    for direction in traverts_directions:
        next_i = row + direction[0]
        next_j = column + direction[1]
        if not (0 <= next_i < len_board and 0 <= next_j < len_board_column):
            continue

        key = get_point_key(len_board, len_board_column, next_i, next_j)
        if key in visited_points_set:
            continue

        visited_points_set.add(key)
        if exits_word(board, word, next_i, next_j, word_index + 1, visited_points_set):
            return True

        visited_points_set.remove(key)

    return False


def word_exists(board: list[list[str]], word: str) -> bool:
    """
    >>> word_exists([["A","B","C","E"],["S","F","C","S"],["A","D","E","E"]], "ABCCED")
    True
    >>> word_exists([["A","B","C","E"],["S","F","C","S"],["A","D","E","E"]], "SEE")
    True
    >>> word_exists([["A","B","C","E"],["S","F","C","S"],["A","D","E","E"]], "ABCB")
    False
    >>> word_exists([["A"]], "A")
    True
    >>> word_exists([["A","A","A","A","A","A"],
    ...              ["A","A","A","A","A","A"],
    ...              ["A","A","A","A","A","A"],
    ...              ["A","A","A","A","A","A"],
    ...              ["A","A","A","A","A","B"],
    ...              ["A","A","A","A","B","A"]],
    ...             "AAAAAAAAAAAAABB")
    False
    >>> word_exists([["A"]], 123)
    Traceback (most recent call last):
        ...
    ValueError: The word parameter should be a string of length greater than 0.
    >>> word_exists([["A"]], "")
    Traceback (most recent call last):
        ...
    ValueError: The word parameter should be a string of length greater than 0.
    >>> word_exists([[]], "AB")
    Traceback (most recent call last):
        ...
    ValueError: The board should be a non empty matrix of single chars strings.
    >>> word_exists([], "AB")
    Traceback (most recent call last):
        ...
    ValueError: The board should be a non empty matrix of single chars strings.
    >>> word_exists([["A"], [21]], "AB")
    Traceback (most recent call last):
        ...
    ValueError: The board should be a non empty matrix of single chars strings.
    """
    # Validate board
    board_error_message = (
        "The board should be a non empty matrix of single chars strings."
    )
    len_board = len(board)
    if not isinstance(board, list) or len(board) == 0:
        raise ValueError(board_error_message)

    for row in board:
        if not isinstance(row, list) or len(row) == 0:
            raise ValueError(board_error_message)
        for item in row:
            if not isinstance(item, str) or len(item) != 1:
                raise ValueError(board_error_message)

    # Validate word
    if not isinstance(word, str) or len(word) == 0:
        raise ValueError(
            "The word parameter should be a string of length greater than 0."
        )

    len_board_column = len(board[0])
    for i in range(len_board):
        for j in range(len_board_column):
            if exits_word(
                board, word, i, j, 0, {get_point_key(len_board, len_board_column, i, j)}
            ):
                return True

    return False


if __name__ == "__main__":
    import doctest

    doctest.testmod()

    import timeit

    n = 10
    ABCCED = timeit.timeit(
        'word_exists([["A","B","C","E"],["S","F","C","S"],["A","D","E","E"]], "ABCCED")',
        globals=globals(),
        number=n,
    )
    SEE = timeit.timeit(
        'word_exists([["A","B","C","E"],["S","F","C","S"],["A","D","E","E"]], "SEE")',
        globals=globals(),
        number=n,
    )
    ABCB = timeit.timeit(
        'word_exists([["A","B","C","E"],["S","F","C","S"],["A","D","E","E"]], "ABCB")',
        globals=globals(),
        number=n,
    )
    A = timeit.timeit('word_exists([["A"]], "A")', globals=globals(), number=n)
    AAAAAAAAAAAAABB = timeit.timeit(
        'word_exists([["A","A","A","A","A","A"], ["A","A","A","A","A","A"], '
        '["A","A","A","A","A","A"], ["A","A","A","A","A","A"], '
        '["A","A","A","A","A","B"], ["A","A","A","A","B","A"]], "AAAAAAAAAAAAABB")',
        globals=globals(),
        number=n,
    )
    print(
        f"{ABCCED} - ABCCED\n{SEE} - SEE\n{ABCB} - ABCB\n{A} - A\n"
        f"{AAAAAAAAAAAAABB} - AAAAAAAAAAAAABB"
    )
```

The longest time, as expected, is for 'AAAAAAAAAAAAABB'; it is orders of magnitude slower than the others.

Output:

```
0.00034418300128891133 - ABCCED
0.0004890400014119223 - SEE
0.0004235089982103091 - ABCB
4.959400030202232e-05 - A
29.599618363001355 - AAAAAAAAAAAAABB
```

quant12345 commented 1 year ago

@cclauss backtracking/word_search.py: can I replace this large (AAAAAAAAAAAAABB) test case with a small (ABB) one?

```python
>>> word_exists([["B", "A", "A"], ["A", "A", "A"], ["A", "B", "A"]], "ABB")
False
```

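To sanity-check that the proposed small board really returns False, here is a compact, self-contained DFS word search (an independent re-implementation used only for verification, not the repository's code):

```python
def word_exists_small(board: list[list[str]], word: str) -> bool:
    """Return True if word can be traced through horizontally/vertically
    adjacent cells without reusing a cell (standard backtracking DFS)."""
    rows, cols = len(board), len(board[0])

    def dfs(r: int, c: int, i: int, seen: set[tuple[int, int]]) -> bool:
        if board[r][c] != word[i]:
            return False
        if i == len(word) - 1:
            return True
        seen.add((r, c))  # mark the cell used for this path
        for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in seen:
                if dfs(nr, nc, i + 1, seen):
                    return True
        seen.discard((r, c))  # backtrack
        return False

    return any(dfs(r, c, 0, set()) for r in range(rows) for c in range(cols))
```

On the proposed 3x3 board the two "B" cells are never adjacent, so "ABB" is unreachable and the search fails quickly, preserving the False outcome of the huge all-"A" example.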
quant12345 commented 1 year ago

If the image lena_small.jpg is used instead of lena.jpg in test_local_binary_pattern, the test runs much faster. It is not clear why lena.jpg was used; at least I don't see any reason for it.

Made a pull request #10161.

Made pull request #10188 for word_search, replacing the example in the doctest.

Muhammadummerr commented 1 year ago

Check the PR #9978.

tianyizheng02 commented 1 year ago

@cclauss Every algorithm on your list has been sped up (except for simple_neural_network.py, which needs to be rewritten altogether), and every fixable warning has been fixed. Do you want to update this issue's "slowest 10" checklist, or do you want to close this issue?

cclauss commented 1 year ago

@CaioCordeiro Would you be willing to look at neural_network/simple_neural_network.py to see if you can speed up its tests? As discussed above (https://github.com/TheAlgorithms/Python/issues/9718#issuecomment-1781355643), we have sped up our other slow tests but have not been able to speed up this one. I tried without success in #11013