codezonediitj / pydatastructs

A python package for data structures and algorithms
https://pydatastructs.readthedocs.io/en/stable/
Other
202 stars 270 forks source link

Add m-ary trees #101

Open czgdp1807 opened 4 years ago

czgdp1807 commented 4 years ago

Description of the problem

Adding m-ary trees will be a nice idea. May be a array based(allows printing the trees) or pointer(better space utilisation) or both would be better.

Example of the problem

References/Other comments

.. [1] https://en.wikipedia.org/wiki/M-ary_tree

vibhu18116 commented 4 years ago

Hi, can I start working on this issue as a part of RGSoC20?

czgdp1807 commented 4 years ago

Hi, @vibhu18116 Please start by designing an API for m-ary trees. Please let us know if you face any problem/queries.

vibhu18116 commented 4 years ago

Thank you for allowing me to work on this.

I believe the API design would closely follow that of binary trees. Hence these are a few APIs I think should form a basis:

This would initialise a new m-ary tree. m_value will be used to determine the maximum value of m (the number of children of a node), default to 2 (binary tree in that case). array_flag can be set to indicate that an array implementation is required and it can remain 0 for pointer implementation.

Rest of the parameters will be same as binary tree implementation.

The above four will behave in the same way as in the binary tree. Implementation of the __str__ function will vary depending upon if the tree is implemented through array or pointers.

Kindly check if these APIs work.

czgdp1807 commented 4 years ago

__new__(cls, m_value = 2, array_flag = 0, key=None, root_data=None, comp=None)

IMO, using only array based implementation will be better here as pointers are somewhat unsafe, so avoid them where ever possible. Instead of m_value, a better name would be max_children. This suggests a nice way to implement using arrays.

Can you please explain the use of the values returned by height and width methods?

convert_to_binary(self, array_flag = 0)

IMO, just to_binary_tree(self) would better.

You may refer, https://github.com/codezonediitj/pydatastructs/blob/master/pydatastructs/trees/binary_trees.py while implementing the above APIs.

czgdp1807 commented 4 years ago

Note that m-ary tree is not a search tree, so we don't need to worry about the manner in which nodes are stored. We need a creative way to avoid the exponential space complexity of array based implementation. May be a new type of node inheriting TreeNode can be made which will be having an array storing the indices of it's children node in the array of m-ary trees. Please let me know if you are unable to get my point.

vibhu18116 commented 4 years ago

I thought height and width of a tree can provide additional functionality to the user, however other than from feature point of view, I don't find any suitable use case. So, if you suggest I would drop them.

I would update the names of functions as suggested.

vibhu18116 commented 4 years ago

You may refer, https://github.com/codezonediitj/pydatastructs/blob/master/pydatastructs/trees/binary_trees.py while implementing the above APIs.

I checked this implementation, but a lot of functions like insert, delete are not implemented, probably because they depend on comparision key. Since the insert, delete API would follow the same pattern, should I first implement them?

czgdp1807 commented 4 years ago

Actually, binary trees are quite abstract, hence their insert and delete operations are left unimplemented. What logic should be followed while making insertions to a binary tree? Both the operations are well defined for binary search trees. IMO, we can keep insert, delete operations undefined for m-ary trees. We can a complementary m-ary search tree inheriting m-ary tree with insert and delete well defined for them. Here is defined for these kind of trees. You can do the work in two separate PRs for m-ary and m-ary search trees, one after the previous one.

vibhu18116 commented 4 years ago

I get your point. I will try to come up with actual implementation and try to create a PR in a day or two for m-ary tree.

vibhu18116 commented 4 years ago

Can I keep to_binary_tree method abstract in m-ary tree, or a concrete implementation is required?

If a concrete implementation is required, then should I modify the m-ary tree itself or create a new binary tree with nodes of type TreeNode?

Currently, I have made a new node type extending TreeNode as suggested earlier which form nodes of m-ary tree.

czgdp1807 commented 4 years ago

Can I keep to_binary_tree method abstract in m-ary tree, or a concrete implementation is required?

We can keep it as abstraction in m-ary tree but a concrete implementation can be added in m-ary search tree.

If a concrete implementation is required, then should I modify the m-ary tree itself or create a new binary tree with nodes of type TreeNode?

IMO, a new tree should returned.

vibhu18116 commented 4 years ago

I created a PR. However, the Travis checks failed due to some trailing whitespaces. Can you please guide how do I ensure there are no trailing white spaces?

Also can you review the code once to see if it is the desired implementation?

czgdp1807 commented 4 years ago

It's time to make a PR for MArySearchTree and the relevant tests.

vibhu18116 commented 4 years ago

Okay. I will start work on it soon :)

czgdp1807 commented 4 years ago

And I hope you are working on your RGSoC application. The deadline is 30th March according to the latest information I have. See, https://railsgirlssummerofcode.org/students/application/#apply

vibhu18116 commented 4 years ago

Yes. I will start working on application as well.

vibhu18116 commented 4 years ago

It's time to make a PR for MArySearchTree and the relevant tests.

I was looking through MArySearchTree on the link you referenced earlier.

However, I don't see a determinate way of insertion.

Say, if the nodes can have 4 children, and root has currently 3 only as a data item, then on trying to insert 5, it can become a part of the root, or go in a child node to the right of 3.

Can you clarify this point once?

czgdp1807 commented 4 years ago

I see, the current implementation in m_ary_trees.py doesn't align with the description given here. It should contain a group of keys but it just contains a single key. A minor update is needed on it.

I was looking through MArySearchTree on the link you referenced earlier.

However, I don't see a determinate way of insertion.

Some research is needed.

vibhu18116 commented 4 years ago

I see, the current implementation in m_ary_trees.py doesn't align with the description given here. It should contain a group of keys but it just contains a single key. A minor update is needed on it.

I think adding a method to insert a value in the node in MAryTreeNode should work. The value will be added if currently the number of values in the node are less than (max_children - 1), else, error can be thrown.

Does this work?

czgdp1807 commented 4 years ago

I think adding a method to insert a value in the node in MAryTreeNode should work.

How it will help in having multiple keys in the node?

vibhu18116 commented 4 years ago

So, the node can store the (data, key) as a tuple in a dynamically sized array which can grow up to (m-1) size where m is the maximum number of children a node can have.

The information about the children nodes is already stored in the children array.

czgdp1807 commented 4 years ago

Here's what I have thought of,

MAryTreeNode

  1. key_list - A SinglyLinkedList with key, data in LinkedListNode. The framework for linked lists is in linear_data_structures. We can keep array as well but for consistency, tuple might not be a good idea instead, I'd suggest to factor out the key and data part to the abstract node class as almost every node has these two members and then just use the abstract Node class as common type for array of keys.
  2. children - An array of child indices with i-th one pointing to a node whose keys are between key_list[i] and key_list[i-1].

The children array is already there so only 1 needs to be added.

vibhu18116 commented 4 years ago

This works. But the array could have been subjected to binary search to search a particular element in that node. However, this won't be possible in the linked list implementation. So, what is your opinion about this?

czgdp1807 commented 4 years ago

But the array could have been subjected to binary search to search a particular element in that node.

Well, the maximum size of the key_list won't be very large so, O(log(m)) or O(m) will not make much of a difference. Over optimisation can be avoided here. If m is large then MAryTree will not be of any practical use.

czgdp1807 commented 4 years ago

If m is large then MAryTree will not be of any practical use.

Though if you are aware of any use case where m is very large then we can think of using arrays.

vibhu18116 commented 4 years ago

I didn't find any specifics about the node size in different applications. So, if you say, I can proceed with a linked list.

czgdp1807 commented 4 years ago

Sure, feel free to proceed.

Arvind-raj06 commented 3 years ago

@czgdp1807 I hope this tree has already been implemented why not close this one!