codezonediitj / pydatastructs

A python package for data structures and algorithms
https://pydatastructs.readthedocs.io/en/stable/
Other
200 stars 267 forks source link

Comparing Data Structures #401

Open sjathin opened 3 years ago

sjathin commented 3 years ago

Description of the problem

Feature Request: As there are multiple data structures available to solve a particular problem, this feature allows one to choose the most optimized data structure for a particular problem.

Example of the problem

Array vs Stack
  1. Access: O(1) vs O(N)
  2. Insertion: O(N) vs O(1)

References/Other comments

The example given here is very generic and basic, there can be multiple metrics to compare data structures.

sjathin commented 3 years ago

It would to good to discuss, how to show this comparison? A document? And all the metrics that can be used to compare data structures.

Some initial metrics (CRUD):

  1. Access
  2. Search
  3. Insert/delete
czgdp1807 commented 3 years ago

I was wondering if we can compare the common methods of two data structures or algorithms. Technically, what I mean is that comparing two APIs which perform the same function. Say, for example if we want to compare two trees, AVL Tree and Red Black Tree, then there are two classes in pydatastructs for these and we can check the timings and space requirements of these two trees using the proposed feature. Consider the following example.

Say there is a class, pydatastructs.compare(obj1, obj2) and there are two binary trees under consideration, avl_obj = AVLTree() and rb_obj = RedBlackTree(). Then I would call, pydatastructs.compare(avl_obj, rb_obj). Some common methods of avl_obj and rb_obj are insert, delete, search. The above call may return the comparison data in JSON format with ratio of timings for each method of both the objects. Or in fact, the compare function may just accept class/function names as pydatastructs.compare(AVLTree, RedBlackTree) and create objects inside its definition. Moreover, doing something like, pydatastructs.compare(AVLTree, merge_sort) should give a ValueError because these two APIs are non-comparable for the same data.

Let me know what you think.

sjathin commented 3 years ago

Yes that's very accurate. I do have few more points that we can talk about:

  1. Comparing different data structure types for instance - pydatastructs.compare(AVLTree, LinkedList) - here as well there are common methods that can be compared, for instance given a set of numbers, what's the time/space to create a tree /list.
  2. I really think we can start with a document/wiki page, so that we can document the design, so that it would be easier to talk about what can be compared and how would we go about this.
  3. Although, I still have a question on how would we show this comparison? a. One option when we run the pydatastructs.compare(obj1, obj2) API, show the time and space for each common method?
  4. I think it would be better to add a API like pydatastructs.compare(*objects) so that we can compare multiple data structures

Thoughts on this?

czgdp1807 commented 3 years ago
1. Comparing different data structure types for instance - `pydatastructs.compare(AVLTree, LinkedList)` - here as well there are common methods that can be compared, for instance given a set of numbers, what's the time/space to create a tree /list.

Agreed. In fact that would help in making the APIs consistent.

2\. I really think we can start with a document/wiki page, so that we can document the design, so that it would be easier to talk about what can be compared and how would we go about this.

We can discuss and document the design here in this thread. We can edit the original post with the final design.

3\. One option when we run the `pydatastructs.compare(obj1, obj2)` API, show the time and space for each common method?

Yes. We can show that in JSON format for each data structure. For two arguments the comparison would be more detailed (ratios, which one is how much percent better than the other, etc.) and for multiple ones, we can just sort the data structures with respect to the time and space of each common method. For example if there are 3 common methods between ds1, ds2 and ds3 then, we would receive, 3 x 2 orders for each common method's time and space. Anyways, this is just speculation, once we start implementing it the output format will become more clear.

4\. I think it would be better to add a API like `pydatastructs.compare(*objects)` so that we can compare multiple data structures

Agreed.

Abekaesh commented 1 year ago

Hi, I would like to work on this issue under GSOC 2023.

In order to compare data structures, it is essential to give the features or functions which are not in one of the data structures. I think that we can compare the data structures on the basic operations such as insertion, deletion, traversing to get element at a position and extra operations that can be done by one data structure which other can't with the same dataset with its time and space. For example if we do pydatastructs.compare(obj1, obj2), we need to return a json format output ie, {obj1:{operation1:{time:<>,space:<>}, operation2:{time:<>,space:<>}},obj2:{operation1:{time:<>,space:<>}, operation2:{time:<>,space:<>},operation3:{time:<>,space:<>}}} In the above output format operation3 in obj2 is another operation or function which obj1 doesnot have. Now this can be of more helpful to analyse the data structures. It helps the user to analyse the extra functions of data structures with their time and space for their completion.

Any suggestions or thoughts on this?

czgdp1807 commented 1 year ago

I think comparison should be done for the set of operations which are common to both the data structures. If nothing is common then an empty JSON output along with a warning that no operation is common between the desired data structures.