digininja / CeWL

CeWL is a Custom Word List Generator

Words with accented characters are ignored #42

Closed trou closed 6 years ago

trou commented 6 years ago

Hello, thanks for the tool! I've made a small patch to correctly match accented characters:

diff --git a/cewl.rb b/cewl.rb
index 967b5ed..22ef574 100755
--- a/cewl.rb
+++ b/cewl.rb
@@ -939,9 +939,9 @@ catch :ctrl_c do
                                                if wordlist
                                                        # Remove any symbols
                                                        if words_with_numbers then
-                                                               words.gsub!(/[^a-z0-9]/i, " ")
+                                                               words.gsub!(/[^[[:alnum:]]]/i, " ")
                                                        else
-                                                               words.gsub!(/[^a-z]/i, " ")
+                                                               words.gsub!(/[^[[:alpha:]]]/i, " ")
                                                        end

                                                        # Add to the array

This is needed for languages with non-ASCII characters :)
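To illustrate the difference, here is a minimal standalone sketch, not taken from cewl.rb. The helper name `strip_symbols` and its keyword arguments are hypothetical and exist only for this example. It assumes the input is a UTF-8 string with precomposed accented characters, and it uses the plain POSIX bracket form `/[^[:alpha:]]/` rather than the nested form in the patch; the two should behave the same here.

```ruby
# Hypothetical helper for illustration only; it is not part of cewl.rb.
# Replaces every character that is not part of a word with a space.
def strip_symbols(text, with_numbers: false, unicode: true)
  pattern =
    if unicode
      # POSIX bracket classes are Unicode-aware on UTF-8 strings in Ruby.
      with_numbers ? /[^[:alnum:]]/ : /[^[:alpha:]]/
    else
      # ASCII-only classes, as in the lines removed by the patch.
      with_numbers ? /[^a-z0-9]/i : /[^a-z]/i
    end
  text.gsub(pattern, " ")
end

# "café déjà-vu naïve2", written with escapes so the file encoding does not matter.
sample = "caf\u00e9 d\u00e9j\u00e0-vu na\u00efve2"

# ASCII-only classes split words at every accented character.
p strip_symbols(sample, unicode: false).split

# Unicode-aware classes keep accented words whole.
p strip_symbols(sample, unicode: true).split
p strip_symbols(sample, with_numbers: true, unicode: true).split
```

With the ASCII-only classes, "café" is cut down to "caf" and "déjà" falls apart into "d" and "j", which is the behaviour this issue describes.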

digininja commented 6 years ago

Can you send it as a pull request?