Closed fccoelho closed 12 years ago
To install the package:
pip install chromium_compact_language_detector
To use it:
import cld
text = '...'  # must be UTF-8 encoded
result = cld.detect(text)
language, language_code = result[0].lower(), result[1]
It compiled just fine on my box. Maybe we can start using it. We should also use the details the detector offers, such as the percent confidence and the normalized score when there are multiple candidate languages: http://code.google.com/p/chromium-compact-language-detector/source/browse/bindings/python/README
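A rough sketch of how those details could be consumed. It assumes, based on the bindings README, that cld.detect() returns a tuple of (name, code, is_reliable, text_bytes_found, details), where details is a list of (name, code, percent, normalized_score) tuples. The helper name summarize_detection and the sample values are made up for illustration, and the sample is hand-written rather than real detector output.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    code: str
    percent: int
    score: float


def summarize_detection(result):
    """Turn a cld.detect()-style tuple into a plain dict.

    Assumed shape: (name, code, is_reliable, text_bytes_found, details),
    with details a list of (name, code, percent, normalized_score) tuples.
    """
    name, code, is_reliable, _bytes_found, details = result
    candidates = [Candidate(n.lower(), c, p, s) for n, c, p, s in details]
    # Sort by percent, highest first, so the strongest candidate is at index 0.
    candidates.sort(key=lambda cand: cand.percent, reverse=True)
    return {
        "language": name.lower(),
        "code": code,
        "reliable": bool(is_reliable),
        "candidates": candidates,
    }


# Hand-written sample in the assumed shape; not real detector output.
sample = (
    "PORTUGUESE", "pt", True, 120,
    [("SPANISH", "es", 20, 12.5), ("PORTUGUESE", "pt", 80, 45.0)],
)
summary = summarize_detection(sample)
```

With something like this, a worker could store summary["reliable"] and the candidate list alongside the document instead of keeping only the top language.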
We should also make a note to distribute a precompiled binary, so that we don't require a full compilation toolchain on every node.
BTW, in the Dengue literature analysis project we already have multiple languages to handle, so it will be a great test case. We also need to adapt the POS-tagging worker to know which languages we have taggers for.
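One possible way for the tagging worker to know which languages are covered is a simple lookup keyed by the detected language code. This is only a sketch: AVAILABLE_TAGGERS, choose_tagger, and the tagger names are hypothetical placeholders, not existing PyPLN code.

```python
# Hypothetical registry: language code -> tagger identifier.
# The entries are placeholders, not real PyPLN taggers.
AVAILABLE_TAGGERS = {
    "en": "english_tagger",
    "pt": "portuguese_tagger",
}


def choose_tagger(language_code, taggers=AVAILABLE_TAGGERS):
    """Return the tagger registered for a language code, or None if there is none."""
    return taggers.get(language_code)
```

A worker could skip tagging (or log the document) whenever choose_tagger() returns None, rather than running an English tagger on text in another language.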
I think pip always compiles everything. If the packager provided a binary/compiled form of the package, it can be installed using easy_install (I really don't know why pip doesn't have this feature, as it is intended to be a replacement for easy_install).
Fixed in 1f6f1fb93c5ddfbbdc2e0ac4d7533386433681f4 (pull request #49).
Implement language detection for PyPLN. Suggestion: http://code.google.com/p/chromium-compact-language-detector/