TheAlgorithms / Rust

All Algorithms implemented in Rust
MIT License
22.93k stars 2.24k forks source link

Add another implementation of Levenshtein distance #702

Closed sozelfist closed 7 months ago

sozelfist commented 7 months ago

Description

Implement Edit Distance algorithm using Dynamic Programming Strategy

Type of change

Please delete options that are not relevant.

Checklist:

codecov-commenter commented 7 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 94.78%. Comparing base (b20faef) to head (131def3). Report is 1 commits behind head on master.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #702 +/- ## ========================================== - Coverage 94.78% 94.78% -0.01% ========================================== Files 297 297 Lines 22163 22149 -14 ========================================== - Hits 21008 20994 -14 Misses 1155 1155 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

sozelfist commented 7 months ago

Please correct me if I am wrong, but I think this is already implemented in levenshtein_distance.rs.

Also, please notice that Levenshtein distance is an example of edit distance.

Yes, Levenshtein distance is an example of edit distance. Levenshtein distance was implemented in this repo but it did not use dynamic programming to solve the problem. Indeed, my implementation use dynamic programming strategy to solve it. Both implementations solve the same problem but use different approaches.

sozelfist commented 7 months ago

What do you think about this PR @vil02?

vil02 commented 7 months ago

The existing levenshtein_distance.rs used a 1D array to store the partial results. Yours is 2D. I think both implementations have $O(nm)$ time complexity, Regarding space complexity it is $O(n)$ vs. $O(nm)$, where $n$, and $m$ are the lengths of the input strings.

I like the way how your code is written. I can see a potential to have both of implementations exist in parallel (as two different functions in one file).

sozelfist commented 7 months ago

I would suggest that we should keep the DP implementation of Levenshtein distance problem in the dynamic_programming directory for clarity of used strategy (or algorithm). It will be helpful for further updates and help us classify algorithms.

sozelfist commented 7 months ago

There's already a file named levenshtein_distance in the string directory, and that your implementation uses a different strategy. How you would like me to proceed?

sozelfist commented 7 months ago

What do you think about placing two files in the string directory and giving them these names: levenshtein_distance_opt_dp and levenshtein_distance_dp for the optimized version and standard version of the dynamic programming strategy?

It may be a reasonable approach! It effectively distinguishes between the two implementations based on their optimization level while still maintaining clarity about the strategy used (dynamic programming).

sozelfist commented 7 months ago

Have a look on my suggestions, or propose a alternative solution @vil02, please.

vil02 commented 7 months ago

Have a look on my suggestions, or propose a alternative solution @vil02, please.

Which suggestions?

sozelfist commented 7 months ago

What do you think about placing two files in the string directory and giving them these names: levenshtein_distance_opt_dp and levenshtein_distance_dp for the optimized version and standard version of the dynamic programming strategy?

It may be a reasonable approach! It effectively distinguishes between the two implementations based on their optimization level while still maintaining clarity about the strategy used (dynamic programming).

This is my suggestion.

vil02 commented 7 months ago

What do you think about placing two files in the string directory and giving them these names: levenshtein_distance_opt_dp and levenshtein_distance_dp for the optimized version and standard version of the dynamic programming strategy?

It may be a reasonable approach! It effectively distinguishes between the two implementations based on their optimization level while still maintaining clarity about the strategy used (dynamic programming).

This is my suggestion.

Go for it!

sozelfist commented 7 months ago

Could you please review the updates, @vil02?

vil02 commented 7 months ago

@TruongNhanNguyen sorry for the confusion. I browsed though the repository and I think we should follow the way used in factorial.rs. So there should be one file src/string/levenshtein_distance.rs with both implementations. Clearly the tests should be parametrized.

sozelfist commented 7 months ago

Let's have a final look, feel free to give suggestions if needed @vil02.