codezonediitj / pydatastructs

A python package for data structures and algorithms
https://pydatastructs.readthedocs.io/en/stable/
Other
202 stars 270 forks source link

Adding suffix tree #323

Open Arvind-raj06 opened 3 years ago

Arvind-raj06 commented 3 years ago

Suffix Tree

References to other Issues or PRs or Relevant literature

"Fixes #290". See https://github.com/codezonediitj/pydatastructs/issues/290

Brief description of what is fixed or changed

Addition of suffix tree with reference to the pypi project.

Other comments

The code works fine to me, the PR is in count of SWOC and if you find any bug pls ping me.

codecov[bot] commented 3 years ago

Codecov Report

Merging #323 (e3586fa) into master (bcf9753) will decrease coverage by 0.181%. The diff coverage is 95.312%.

@@              Coverage Diff              @@
##            master      #323       +/-   ##
=============================================
- Coverage   98.550%   98.369%   -0.181%     
=============================================
  Files           25        26        +1     
  Lines         3243      3435      +192     
=============================================
+ Hits          3196      3379      +183     
- Misses          47        56        +9     
Impacted Files Coverage Δ
pydatastructs/utils/__init__.py 100.000% <ø> (ø)
pydatastructs/strings/suffix_tree.py 94.230% <94.230%> (ø)
pydatastructs/strings/__init__.py 100.000% <100.000%> (ø)
pydatastructs/utils/misc_util.py 99.019% <100.000%> (+0.189%) :arrow_up:

Impacted file tree graph

czgdp1807 commented 3 years ago

There are several uncovered lines in https://codecov.io/gh/codezonediitj/pydatastructs/pull/323/diff (see red). Use steps in https://github.com/codezonediitj/pydatastructs#testing for checking the coverage for your patch locally and add tests accordingly to fully cover your code by tests.

Arvind-raj06 commented 3 years ago

Thank you for informing me, I haven't noticed these a while

Arvind-raj06 commented 3 years ago

I have added some test to cover that area, I can't get it to max but given my level best

Arvind-raj06 commented 3 years ago

@czgdp1807 any more corrections needed!?

czgdp1807 commented 3 years ago

Please add documentation for public methods (the ones which don't start with _).

Arvind-raj06 commented 3 years ago

Please add documentation for public methods (the ones which don't start with _).

No issue found that

Arvind-raj06 commented 3 years ago

@czgdp1807 They are requesting the leaderboard update at SWoC, today is the last date

czgdp1807 commented 3 years ago

Done.

czgdp1807 commented 3 years ago

I am a bit busy these days. There might be a delay in reviews.

Arvind-raj06 commented 3 years ago

I am a bit busy these days. There might be a delay in reviews.

Yeah sure take your time

Arvind-raj06 commented 3 years ago

Done.

Thank you 😉

czgdp1807 commented 3 years ago

You can continue with your other PRs. I will finalize all of these on Saturday, next week.

Arvind-raj06 commented 3 years ago

You can continue with your other PRs. I will finalize all of these on Saturday, next week.

Yeah 👍

czgdp1807 commented 3 years ago

There are too many private methods (starting with _). Please add a one line description in each of these to explain the purpose more clearly.

Arvind-raj06 commented 3 years ago

There are too many private methods (starting with _). Please add a one line description in each of these to explain the purpose more clearly.

Yes @czgdp1807

czgdp1807 commented 3 years ago

Just let me know when is the last date for participants to submit PRs and obtain socres.

Arvind-raj06 commented 3 years ago

Last date for PR submission is 28 Feb and I hope it's the last date for scores too. Winners and results announcement be 15th march

Arvind-raj06 commented 3 years ago

10th march is the last date for leaderboard submission by PA

czgdp1807 commented 3 years ago

I will make some changes to it in due time and then will merge it.

Arvind-raj06 commented 3 years ago

I will make some changes to it in due time and then will merge it.

Yeah

czgdp1807 commented 3 years ago

I hope that you might have gone through the changes I made to your SkipList PR. Would request to make this better by learning from there. The code you write works correctly and efficiently but a bit of consistency (both in APIs and docs) is needed with respect to the existing things in master. For example, in your code of SkipList.insert(num) was the API but for every other insert method in master, a key value and an optional data value is taken as input.

Arvind-raj06 commented 3 years ago

I hope that you might have gone through the changes I made to your SkipList PR. Would request to make this better by learning from there. The code you write works correctly and efficiently but a bit of consistency (both in APIs and docs) is needed with respect to the existing things in master. For example, in your code of SkipList.insert(num) was the API but for every other insert method in master, a key value and an optional data value is taken as input.

Yeah It was like literally wow! I have studied it and got some good idea on consistency, Thank you Yes later having seen the test change it's clear, I think it will take practice for me to obtain that consistency!

Arvind-raj06 commented 3 years ago

Shall I resolve the conflicts out here

czgdp1807 commented 3 years ago

Shall I resolve the conflicts out here

Preferrable resolve the conflicts locally. When you will update your master and try to merge it into Allgood branch, git will raise conflicts. If you are using VSCode then it will highlight the files of conflict and the changes you want to accept. After doing that just make a commit and conflicts will be resolved.

Arvind-raj06 commented 3 years ago

Shall I resolve the conflicts out here

Preferrable resolve the conflicts locally. When you will update your master and try to merge it into Allgood branch, git will raise conflicts. If you are using VSCode then it will highlight the files of conflict and the changes you want to accept. After doing that just make a commit and conflicts will be resolved.

No worries I'm using atom which will also do the same. I had the experience of same in my previous PR this time will try to implement it perfectly.

czgdp1807 commented 3 years ago

Have you implemented this algorithm?

czgdp1807 commented 3 years ago

Note this PR will not be counted towards SWoC but it will be counted for GSSoC. This needs some more work both at the design level and at the implementation level.

Arvind-raj06 commented 3 years ago

Yeah sure then

Arvind-raj06 commented 3 years ago

I'm back did you find anything to be added to make code better! PS you can change the label to GSSoC

Arvind-raj06 commented 3 years ago

Cool happy to work with you back

czgdp1807 commented 3 years ago

Well, IMHO, the code has a lot of room for improvement. It should be as simple as possible and easy to understand. I would suggest instead of jumping to code let's pick some examples and solve it via fake APIs and keep refining those APIs until a consensus is reached. Then we can proceed with coding and all. This is an important data structure and should be added after a lot of discussions. We are in no hurry.

Fault is mine too, I shouldn't have approved the previous designs and stuff for this DS. Let's pick an example to programmatically duplicate some example here.

Arvind-raj06 commented 3 years ago

Well, IMHO, the code has a lot of room for improvement. It should be as simple as possible and easy to understand. I would suggest instead of jumping to code let's pick some examples and solve it via fake APIs and keep refining those APIs until a consensus is reached. Then we can proceed with coding and all. This is an important data structure and should be added after a lot of discussions. We are in no hurry.

Yeah as you say

Arvind-raj06 commented 3 years ago

Hey @czgdp1807 I think this is the best suffix tree structure I could obtain from the examples. But as for the property side it's still lacking some potential ! Should I work on that rather I make a new PR for Suffix tree properties!?

czgdp1807 commented 3 years ago

We have two options with us, one to keep spending time on Suffix Tree which is a hard data structure. This will delay our first release. Another option is to put this on hold and add it to the next version. I think we should go for the second one since once people will start using our project, contributions will automatically scale. So, let us ignore it for now. Let's keep this PR open and add it to 0.0.2 milestone.

P.S. We should aim to complete the first release by first half of the year that is by August 2021. Let me know what you think.

czgdp1807 commented 3 years ago

I have updated the list of features to be added to 0.0.1. We should focus on graph algorithms and KMP in strings. The framework for former is already set (just algorithms are required to be implemented) and KMP shouldn't be that hard to implement.

Arvind-raj06 commented 3 years ago

That's really a nice idea! I will work on other issues as this is time consuming I felt the same way.

Arvind-raj06 commented 3 years ago

I have updated the list of features to be added to 0.0.1. We should focus on graph algorithms and KMP in strings. The framework for former is already set (just algorithms are required to be implemented) and KMP shouldn't be that hard to implement.

Cool gonna learn something new on graphs

czgdp1807 commented 3 years ago

Here is the updated list, https://github.com/codezonediitj/pydatastructs/wiki/Planned-Features-for-v0.0.1

Not much is to be done for the first release. We will postpone the harder parts for the second half of the year. Otherwise we will never be able to release the package. Just three graph algorithms and one KMP is to be implemented. All of these are easy medium and can be done in one month's time. After that we will be able to focus on building docs and upload it on readthedocs.io