codezonediitj / pydatastructs

A python package for data structures and algorithms
https://pydatastructs.readthedocs.io/en/stable/
Other
201 stars 269 forks source link

Add searching algorithms for linear data structures #446

Open EmperorArthurIX opened 2 years ago

EmperorArthurIX commented 2 years ago

Description of the problem

The codebase is very extensive and well-designed; however, it does not seem to include searching algorithms for linear data structures as far as I could observe. I believe addition of searching algorithms would be a good improvement to the project. Certain algorithms I have thought can be included are:

  • Linear Search(array, number)

  • Binary Search(array, number, unsorted_input=False)

  • Jump Search(array, number, unsorted_input=False)

The above mentioned declarations are suggested, where the parameter unsorted_input can be used to explicitly learn from the user whether or not the input array is sorted beforehand. If True, we must first sort the array within the search function, then search.

Comments

With certain modifications, we may even allow to return the sorted array along with search result, but I have no idea on how that will be done at the moment.

PS: I am currently working on the Radix Sort issue that is already open, and after that reaches a satisfactory stage, I wish to go on with this one. Discussions and suggestions are welcome.

HarsheetKakar commented 2 years ago

With binary search and jump search in mind we can also have a SortedOneDimentionalArray as data structure which would take OneDimentionalArray as input and would dynamically choose the sorting algorithm based on how far the array is already sorted which would come from its Hamming Distance (wheather it would be insertionSort which works best for nearly sorted arrays and quickSort if the array is not sorted).

czgdp1807 commented 2 years ago

Interesting idea. I think the idea of SortedOneDimensionalArray should be discusses in a separate issue. Is there any prior work (like Wikipedia article, papers, or lecture notes available online) which can provide clarity and examples?

HarsheetKakar commented 2 years ago

I have moved the conversation to Gitter once we reach a consensus I will create a separate issue.

czgdp1807 commented 2 years ago

Gitter

It's inactive. Folks use Discord these days a lot. Github issue would be the best place to discuss as it is accessible by anyone in the open source community.

czgdp1807 commented 2 years ago

And may be we should provide a function that should take an array as input and suggest one of the pydatastructs APIs as output. The time complexity should be O(n) because anything more won't be worth it. The constant in O(n) should also be very small (we can benchmark for that).

czgdp1807 commented 2 years ago

The above mentioned declarations are suggested, where the parameter unsorted_input can be used to explicitly learn from the user whether or not the input array is sorted beforehand. If True, we must first sort the array within the search function, then search.

I think that if a searching algorithm require input array to be sorted then it should assume that. Otherwise we won't be able to guarantee lower time complexities of algorithms like binary search, exponential search. The overhead of sorting an array is non-trivial, in-fact even checking whether an array is sorted or not will take o(n) time which is large as compared to O(logn) promised by binary search. So, let's keep things simple and encapsulated for now. I would suggest something like this, search(input_array, algorithm) where algorithm would be a string (non-optional argument), binary_search, linear_search, jump_search, exponential_search, etc.

EmperorArthurIX commented 2 years ago

The above mentioned declarations are suggested, where the parameter unsorted_input can be used to explicitly learn from the user whether or not the input array is sorted beforehand. If True, we must first sort the array within the search function, then search.

I think that if a searching algorithm require input array to be sorted then it should assume that. Otherwise we won't be able to guarantee lower time complexities of algorithms like binary search, exponential search. The overhead of sorting an array is non-trivial, in-fact even checking whether an array is sorted or not will take o(n) time which is large as compared to O(logn) promised by binary search. So, let's keep things simple and encapsulated for now. I would suggest something like this, search(input_array, algorithm) where algorithm would be a string (non-optional argument), binary_search, linear_search, jump_search, exponential_search, etc.

Noted, will think of solutions from that perspective while planning and coding.

czgdp1807 commented 2 years ago

The input should not necessarily be an array. A function should work too. An array is conceptually also a function (of indices). See this https://en.wikipedia.org/wiki/Ternary_search for example. Allowing inputs as function would make API more practical.

r-avenous commented 2 years ago

Shall I work on this one?

czgdp1807 commented 2 years ago

Sure go ahead.

czgdp1807 commented 2 years ago

@skullVoid has already opened a PR. Seems like he/she haven't linked it here yet.

varsha080 commented 2 years ago

Hey! I can solve the issue mentioned above. Please assign the work to me. I'll write the code of all searching algo with their complexity defined.

czgdp1807 commented 2 years ago

@varsha080 Sure go ahead. We don't assign issues. Please read, https://github.com/codezonediitj/pydatastructs/wiki/Issue-Policy

Abekaesh commented 1 year ago

Hello! I would like to work on this issue under GSoC 2023. I can write the code of the searching algorithms in an efficient manner. Shall I work on this?

Abekaesh commented 1 year ago

Is this issue still open? Can I work on this?

czgdp1807 commented 1 year ago

Yes. Please go ahead.

Abekaesh commented 1 year ago

Just to clarify what the problem statement, We need to implement linear search algorithm, binary search algorithm, jump search algorithm and create a common search function where its arguments are array, key, unsorted_input, algorithm where unsorted_input is a parameter necessary for binary search and jump search, and optional for linear search. If unsorted_input is true we need to return sorted array with index of key in the sorted array. In linear search unsorted_input can be given to do the same. If key not found in array, it returns -1 in position and a sorted array if unsorted_input=True. Am I correct?

czgdp1807 commented 1 year ago

Please note https://github.com/codezonediitj/pydatastructs/issues/446#issuecomment-986186631 and go ahead with your work. Feel free to take decisions on the basis of your judgement. Thanks.

mridul45 commented 1 year ago

I would be happy to contribute to the project as I find my skills a good fit for the issue . As a GSSOC 23 member I humbly ask to allow me to contribute to the issue.