Trying to extract sentences from non chinese or japanese pdf

ghost commented 4 years ago

Hello, I want to extract sentences from a PDF for a language other than Chinese or Japanese. I tried reading the file, but I'm having trouble changing this:

```python
# If your language isn't listed, add it here

languages = {
    'ZS': ['EN', '简', 'PIN'],  # Simplified Chinese
    'ZH': ['EN', '繁', 'PIN'],  # Traditional Chinese
    'ZT': ['EN', '繁', 'PIN'],  # Traditional Chinese (Taiwan)
    'YUE': ['EN', '粵', 'YALE'],  # Cantonese | Change YALE to JYUT for Jyutping
    'JA': ['EN', '日|JA', 'ROM']   # Japanese
}
```

How can I add a new language here?

ghost commented 4 years ago

Never mind, solved it. The problem was that my file name didn't match the expected pattern: I had GLOSSIKA-ENIT-EBK.pdf instead of GLOSSIKA-ENIT-F1-EBK.pdf.
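
For anyone hitting the same thing, here is a rough sketch of a pre-flight check on the file name. The regular expression and the `parse_ebook_name` helper are hypothetical, inferred only from the two file names above; the script's actual matching logic may well differ.

```python
import re

# Hypothetical pattern, guessed from the two file names in this thread:
# "GLOSSIKA-EN" + target language code + "-F<digit>-EBK.pdf".
# The real script may accept other variants.
NAME_PATTERN = re.compile(r"^GLOSSIKA-EN(?P<lang>[A-Z]+)-(?P<book>F\d)-EBK\.pdf$")


def parse_ebook_name(filename):
    """Return (language code, book label) if the name matches, else None."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    return match.group("lang"), match.group("book")


print(parse_ebook_name("GLOSSIKA-ENIT-EBK.pdf"))     # None (missing the F1 part)
print(parse_ebook_name("GLOSSIKA-ENIT-F1-EBK.pdf"))  # ('IT', 'F1')
```

A check like this makes it clear whether a PDF is being skipped because of its name rather than because of the `languages` table.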

emesterhazy commented 4 years ago

@tomas-ampueroc Awesome, glad you figured it out. Hopefully everything else went smoothly.

emesterhazy / glossika-to-anki

Trying to extract sentences from non chinese or japanese pdf #13