dapphp / securimage

PHP CAPTCHA Script
https://github.com/dapphp/securimage
BSD 2-Clause "Simplified" License

rewrite wordlist random word selection to fix issues with multibyte characters in wordlists #87

Closed Connum closed 4 years ago

Connum commented 5 years ago

I had issues with multibyte characters when using a wordlist, e.g. European umlauts and special characters (äöüßñø). In the old code, the end offset of the substring would sometimes be smaller than the start offset, so the result was a long string containing multiple words. This code fixes these issues and should also be a bit faster.
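To make the class of problem concrete, here is a minimal sketch of how byte-based and character-based string functions disagree on UTF-8 text. It is an illustration only and does not reproduce the old Securimage code; the sample word and variable names are assumptions chosen for the example, and it requires the mbstring extension.

```php
<?php
// Illustration only: byte-based vs. character-based string functions on UTF-8.
// The word is written with \u{...} escapes (PHP 7+) so the example does not
// depend on the encoding of this source file. It spells "Größe".
$word = "Gr\u{00F6}\u{00DF}e"; // 5 characters, 7 bytes ("ö" and "ß" are 2 bytes each)

$byteLength = strlen($word);             // counts bytes
$charLength = mb_strlen($word, 'UTF-8'); // counts characters

// Cutting at a byte offset can split a multibyte character in half.
$byteCut = substr($word, 0, 3);             // "Gr" plus the first byte of "ö"
$charCut = mb_substr($word, 0, 3, 'UTF-8'); // "Grö"

$byteCutIsValid = mb_check_encoding($byteCut, 'UTF-8');
$charCutIsValid = mb_check_encoding($charCut, 'UTF-8');

printf("bytes: %d, characters: %d\n", $byteLength, $charLength);
printf("byte cut is valid UTF-8: %s\n", var_export($byteCutIsValid, true));
printf("char cut is valid UTF-8: %s\n", var_export($charCutIsValid, true));
```

Offsets computed with one family of functions and consumed by the other will drift apart as soon as the text contains multibyte characters, which is one plausible way a start/end mismatch like the one described can arise.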

dapphp commented 4 years ago

Sorry it took forever. I have pushed a fix that should resolve your issues while remaining fast and memory-efficient.

Yours works fine, but I would avoid the file() function because it reads the entire file into memory as a PHP array, which could get fairly large depending on the size of the wordlist. On a busy site, invoking it over and over could be an unnecessary drain on resources.
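For readers curious what a memory-friendly alternative to file() can look like, below is a minimal sketch using reservoir sampling over fgets(). It is not the fix that was pushed to the repository; pickRandomLine() is a hypothetical helper written for this example, and the temporary wordlist is made up.

```php
<?php
// Hypothetical helper, not part of Securimage: pick one random non-empty line
// from a file while holding only one line in memory (reservoir sampling, k = 1).
function pickRandomLine(string $path): ?string
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return null;
    }

    $chosen = null;
    $seen = 0;

    // fgets() returns one whole line at a time, so in ASCII-compatible
    // encodings such as UTF-8 a multibyte character is never split.
    while (($line = fgets($handle)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            continue; // skip blank lines
        }
        $seen++;
        // Replace the current choice with probability 1/$seen, which leaves
        // every line equally likely once the loop finishes.
        if (random_int(1, $seen) === 1) {
            $chosen = $line;
        }
    }

    fclose($handle);
    return $chosen;
}

// Usage with a small temporary wordlist containing multibyte characters.
$words = ["Stra\u{00DF}e", "ma\u{00F1}ana", "K\u{00F8}benhavn"];
$tmp = tempnam(sys_get_temp_dir(), 'wl');
file_put_contents($tmp, implode("\n", $words) . "\n");
$word = pickRandomLine($tmp);
unlink($tmp);
echo $word, "\n";
```

The trade-off is that this reads the whole file once per call, so memory stays constant but time grows with the size of the list. Seeking to a random byte offset is faster on large files, but it then has to realign to the next line boundary to avoid landing in the middle of a word or character.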

Thank you, and I hope my change works for you if you are still using it.

dapphp commented 4 years ago

Also note, depending on the character set of your wordlist, it is very important to specify the wordlist_file_encoding option.

e.g.

<?php

$options = [
    'use_wordlist' => true,
    'wordlist_file' => '/path/to/list.txt',
    'wordlist_file_encoding' => 'WINDOWS-1251', // GB2312, UTF-8, etc.
    // ...other options
];

$img = new Securimage($options);

The options can also go in config.inc.php in the securimage directory, or in a file located elsewhere whose path is passed to the constructor using the config_file option.
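As a rough illustration of why the declared encoding matters, the sketch below interprets the same bytes first as UTF-8 and then as Windows-1251. It uses only standard mbstring functions and says nothing about how Securimage handles the option internally; the sample word and variable names are assumptions made for the example.

```php
<?php
// Illustration only (requires mbstring): the same bytes under two encodings.

// The Russian word "привет" encoded as Windows-1251, written as raw bytes so
// the example does not depend on the encoding of this source file.
$cp1251Bytes = "\xEF\xF0\xE8\xE2\xE5\xF2";

// Treated as UTF-8, these bytes are not a valid string at all.
$isValidUtf8 = mb_check_encoding($cp1251Bytes, 'UTF-8');

// Declaring the real source encoding lets the bytes be converted correctly.
$utf8 = mb_convert_encoding($cp1251Bytes, 'UTF-8', 'Windows-1251');

printf("valid as UTF-8 before conversion: %s\n", var_export($isValidUtf8, true));
printf("characters after conversion: %d\n", mb_strlen($utf8, 'UTF-8'));
printf("bytes after conversion: %d\n", strlen($utf8));
echo $utf8, "\n";
```

A wordlist read with the wrong encoding would hit the first case: the bytes are there, but character-aware functions cannot make sense of them.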