Mimino666 / langdetect

Port of Google's language-detection library to Python.

Can I specify only the languages I want to detect, e.g. only en, ja and zh-cn? #71

Open · maliho0803 opened this issue 4 years ago

maliho0803 commented 4 years ago

Can I specify only the languages I want to detect, e.g. only en, ja and zh-cn?

Zsub commented 4 years ago

You can do this by instantiating the detector yourself and giving it a prior map that lists only the languages you want:

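```python
import csv

import langdetect

# Hypothetical: index of the CSV column that holds the text to classify.
TEXT_COLUMN = 0

with open('rawdata.csv', newline='', encoding="UTF-8") as rawdata:
    rawreader = csv.reader(rawdata, delimiter=',', quotechar='"')

    # Instantiate the DetectorFactory and load the bundled profiles once.
    factory = langdetect.detector_factory.DetectorFactory()
    factory.load_profile(langdetect.detector_factory.PROFILES_DIRECTORY)

    for row in rawreader:
        # Create a fresh detector for every row.
        detector = factory.create()
        # Only languages listed here can be detected; unlisted ones get a
        # prior of 0. Use e.g. {"en": 1, "ja": 1, "zh-cn": 1} and adjust
        # the weights to your expected language distribution.
        detector.set_prior_map({"en": 0.5, "de": 0.5})
        # Give the detector the text to run on.
        detector.append(row[TEXT_COLUMN])
        # Let the detector run! (Raises LangDetectException on empty text.)
        print(detector.detect())
```
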
batara666 commented 3 years ago

@Mimino666 Can we just ignore a specified language? And wouldn't it be nice to have that as a method?