crodas / LanguageDetector

PHP Class to detect languages from any free text

It breaks with some special characters #12

Open hugomosh opened 10 years ago

hugomosh commented 10 years ago

Excellent work! Thank you. Just one observation. If you feed it input such as: x :) es genial¡¡¡¡¡¡

It throws an error like this:

exception 'RuntimeException' with message 'Invalid or missing outlinks' in C:\Users\personal\GoogleDrive\202_Librerias\LanguageDetector-master\lib\LanguageDetector\Sort\PageRank.php:152
Stack trace:
#0 C:\Users\personal\GoogleDrive\202_Librerias\LanguageDetector-master\lib\LanguageDetector\Detect.php(83): LanguageDetector\Sort\PageRank->sort(Array)
#1 C:\Users\personal\GoogleDrive\202_Librerias\LanguageDetector-master\lib\LanguageDetector\Detect.php(122): LanguageDetector\Detect->detectChunk('!')
#2 C:\Users\personal\GoogleDrive\202_Librerias\LanguageDetector-master\example\detectaIdiomaALista.php(24): LanguageDetector\Detect->detect('!')

Just saying, so it can be more robust :+1:
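
Until the library handles this itself, a caller could guard against the exception. The following is only a minimal sketch: `detectOrNull` and the `$detect` callable are hypothetical names made up for illustration, not part of LanguageDetector. The only thing taken from the trace above is that a `RuntimeException` is thrown.

```php
<?php
// Hypothetical wrapper, not part of LanguageDetector.
// $detect stands in for whatever callable actually runs the detection.
function detectOrNull(callable $detect, string $text)
{
    try {
        return $detect($text);
    } catch (RuntimeException $e) {
        // Input such as "!" gave the detector nothing to rank;
        // report "unknown" instead of aborting the whole run.
        return null;
    }
}
```

A `null` result can then be treated as "language unknown" when looping over a list of texts, so one odd string does not stop the batch.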

Maybe it could give "ascii art" as the language :P

crodas commented 10 years ago

Good idea. If it fails to parse UTF-8, it should treat the input as a stream of bytes.
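
A rough sketch of that fallback, under assumptions: `splitIntoUnits` is a hypothetical helper name, not an existing function in the library, and the real tokenizer may work differently. It relies on `preg_split()` returning `false` when the `/u` modifier is used on a subject that is not valid UTF-8.

```php
<?php
// Hypothetical helper, not part of LanguageDetector's API.
// Splits text into UTF-8 characters, or into raw bytes if it is not valid UTF-8.
function splitIntoUnits(string $text): array
{
    if ($text === '') {
        return [];
    }

    // With /u, PCRE validates the subject as UTF-8; on invalid input
    // preg_split() returns false instead of an array.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars !== false) {
        return $chars;
    }

    // Fallback: treat the input as a plain stream of bytes.
    return str_split($text);
}
```

With this, a valid string like `es genial¡` is split per character (the `¡` stays one unit), while a string containing a stray byte such as `"\xA1!"` is split per byte instead of failing.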