vmdhhh opened this issue 3 years ago.
For what it's worth, I hacked around this as follows:
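```python
from langdetect import DetectorFactory, PROFILES_DIRECTORY

# Load the language profiles once and reuse a single detector instance.
factory = DetectorFactory()
factory.load_profile(PROFILES_DIRECTORY)
detector = factory.create()

def detect(text, detector=detector):
    detector.text = ""  # clear the text left over from the previous call
    detector.append(text)
    return detector.detect()
```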
It's obviously not a proper solution, but it might be useful as a temporary speed-up. Hopefully this can be fixed within langdetect itself.
The `detect` function in https://github.com/Mimino666/langdetect/issues/77#issuecomment-880545747 needs to be updated to something like:
```python
def detect(text, detector=detector):
    detector.text = ""
    detector.langprob = None  # drop the probabilities cached by the previous call
    detector.append(text)
    return detector.detect()
```
because the `get_probabilities` method reuses the previously generated `self.langprob` if it is not `None`. This means that if you run the `detect` function on a list of strings in various languages, it will always return the language detected for the first string.
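To make the stale-cache behaviour concrete without needing langdetect installed, here is a self-contained toy sketch. `ToyDetector`, `detect_buggy` and `detect_fixed` are hypothetical names, and the "detection" is a crude Cyrillic-character count rather than langdetect's real algorithm; the only thing it mirrors is the pattern described above, where a result is computed lazily and cached in `langprob` until someone resets it. (In langdetect itself `langprob` holds a list of probabilities, not a single language code.)

```python
class ToyDetector:
    """Hypothetical stand-in that caches its result in `langprob`."""

    def __init__(self):
        self.text = ""
        self.langprob = None  # None means "not computed yet"

    def append(self, text):
        self.text += text

    def get_probabilities(self):
        # Only recompute when nothing is cached, so a stale value
        # survives any later change to self.text.
        if self.langprob is None:
            cyrillic = sum("\u0400" <= ch <= "\u04ff" for ch in self.text)
            self.langprob = "ru" if cyrillic > len(self.text) / 2 else "en"
        return self.langprob

    def detect(self):
        return self.get_probabilities()


detector = ToyDetector()


def detect_buggy(text, detector=detector):
    detector.text = ""        # resets the text buffer...
    detector.append(text)
    return detector.detect()  # ...but reuses the cached langprob


def detect_fixed(text, detector=detector):
    detector.text = ""
    detector.langprob = None  # also clear the cache before detecting
    detector.append(text)
    return detector.detect()


texts = ["hello world", "привет мир"]
print([detect_buggy(t) for t in texts])  # ['en', 'en']
detector.langprob = None
print([detect_fixed(t) for t in texts])  # ['en', 'ru']
```

The buggy version reports `'en'` for the Russian string because the first call's result is still cached; clearing `langprob` on every call restores per-string results.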
Should we move the `init_factory()` call out of `detect()` so that, when this function is applied to DataFrames or used in loops, it doesn't have to load the 55 language profile files over and over again? What do you think, @Mimino666?
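One generic way to get the "load once, reuse everywhere" behaviour being proposed is to memoize the expensive setup with `functools.lru_cache`. The sketch below is hypothetical: `get_factory` and `load_count` are illustrative names, and a placeholder dict stands in for the real work, which would be creating a `DetectorFactory` and calling `load_profile(PROFILES_DIRECTORY)`. The counter only exists to show how often the setup actually runs.

```python
import functools

load_count = 0  # counts how many times the (simulated) profiles are loaded


@functools.lru_cache(maxsize=None)
def get_factory():
    """Build the expensive shared object once; later calls return the cached one.

    Placeholder: a real version would create a DetectorFactory and call
    load_profile(PROFILES_DIRECTORY) here instead of building a dict.
    """
    global load_count
    load_count += 1
    return {"profiles": ["en", "fr", "es"]}  # stands in for the loaded profiles


for text in ["hello", "bonjour", "hola"]:
    factory = get_factory()  # only the first iteration triggers a load
    # ... create a fresh detector from `factory` and detect `text` here ...

print(load_count)  # 1
```

Whatever the detector does per call, creating a fresh detector from the shared factory each time (or resetting `langprob`, as noted above) keeps results from leaking between strings.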