crodas / LanguageDetector

PHP Class to detect languages from any free text

swedish sample is completely wrong #8

Closed: bazo closed 10 years ago

bazo commented 11 years ago

I don't know what language it is, but I'm pretty sure it's not Swedish.

redaktor commented 11 years ago

It is Danish. In any case, the fix is very easy: why don't you let the datafile learn Swedish (and more languages)? Have a look at learn.php in the examples...

bazo commented 11 years ago

Well, I can speak both Swedish and Danish, and as far as I know "Katika hatua ya kihistoria" is neither Danish nor Swedish. I know I can teach the lib any language, but my point is: if you provide samples, then provide correct ones.

juriansluiman commented 11 years ago

If you search for the phrase, Google points out it's Swahili. Rename swedish to swahili with a PR and the case is solved :)

redaktor commented 11 years ago

:+1: for the Swahili fix. Just send over an additional commit to the author via mail (Swedish and some more languages [training files + datafile]).

crodas commented 11 years ago

@redaktor Nice to have you helping with the project. For some odd reason I'm not able to receive GitHub emails. @juriansluiman Just add some sample text, save it in example/samples/$language_name, and make a pull request, or email me some sample text at crodas@php.net. I only speak Spanish, English and some broken Portuguese.

Any help is more than welcome :-)

redaktor commented 11 years ago

OK, so I'll do a squash commit next week. What do you think about 'training sources'? Currently I use different texts from the world's Wikipedias, plus a famous poem, plus an old text (if the language has existed for a long time).