c9s / CLIFramework

A powerful command line application framework for PHP. It's an extensible, flexible component, You can build your command-based application in seconds!
Other
437 stars 51 forks source link

Corrector: suggests using similar_text instead of levenshtein #50

Closed dh3014 closed 9 years ago

dh3014 commented 9 years ago

I think current Corrector::guess() is disappointing. I add the first command 'scheduled-job' in my app. Since I'm still using bash (no completion), I'm trying with Corrector feature. Here's what happened:

mb sch
Did you mean 'zsh'? [Y/n] 
[ec2-user ~]$ mb sche
Did you mean 'zsh'? [Y/n] n
[ec2-user ~]$ mb sched
Did you mean 'zsh'? [Y/n]
[ec2-user ~]$ mb schedu
Did you mean 'help'? [Y/n]
[ec2-user ~]$ mb schedule
Did you mean 'help'? [Y/n]
[ec2-user ~]$ mb scheduled
Did you mean 'scheduled-job'? [Y/n]

After looking into source in Corrector, it uses levenshtein() to make the best guess. I think this algo computes edit distance between two strings. However I don't think edit-distance is fit into suggesting feature, in fact I guess even LCS or LIS would be better in suggestion.

To make it simple, I think another php built-in function similar_text[http://php.net/manual/en/function.similar-text.php] is much better in this case. It has the same time complexity with levenshtein (but slower a bit, yes).

I did a little experiment:

[ec2-user ~]$ mb sche
Did you mean 'zsh'? [Y/n] 
[ec2-user ~]$ mb sched
Did you mean 'scheduled-job'? [Y/n] 

What would you think about changing the suggestion algorithm?

c9s commented 9 years ago

Sounds good! Can we separate the algorithms by an option and use yours by default ? Thanks!

dh3014 commented 9 years ago

To make the algorithm plug-able, we need to decide the semantics of the interface.

I think it's not worth to encapsulate a Strategy Class here, a simple callback would be fine. But what does the callback means semantically?

For example, levenshtein() outputs DISTANCE of strings, so we can define exact match by DISTANCE=0, and require DISTANCE as small as possible. similar_text() on the other hand outputs SIMILARITY of strings, so greater is better, it can also outputs RELATIVE SIMILARITY, so an exact match means RELATIVE SIMILARITY = 100%

We need to choose one of it, so any person wants to use their own algorithm has to follow up.

c9s commented 9 years ago

OK, I think we can just rename the current Corrector to LevenshteinCorrector and let the similar_text corrector replace the current one.

c9s commented 9 years ago

I will work on the DI container for injecting services in the future, then we can let users to register their own corrector.

dh3014 commented 9 years ago

DI container is great, but there are thousands of DI implementations on [packist.org]. Many of them works in different way (by different purpose, too), would be hard to evaluate :p.

I'll send PR perhaps few days later.

c9s commented 9 years ago

Surveying projects are pretty time consuming, I would prefer to start from a small one. :-p

c9s commented 9 years ago

Cool, that would be great, Thanks for your work!

c9s commented 9 years ago

I am closing this since you've done this task. :)