Closed by billmdevs 1 year ago
I'm not sure I understand your question. The constructor of the Lang class accepts values of type str as arguments, and raises an error otherwise:
>>> from iso639 import Lang
>>> lang_name = "English"
>>> Lang(lang_name.encode("utf-8"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lbdx/iso639/iso639/iso639.py", line 68, in __new__
    raise InvalidLanguageValue(*args, **kwargs)
iso639.exceptions.InvalidLanguageValue: (*(b'English',), **{}). Only valid ISO 639 language values are supported as arguments.
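If the value reaches you as bytes, decoding it to str before calling the constructor should avoid this error. A minimal sketch using only the standard library; `ensure_text` is a hypothetical helper name and is not part of iso639:

```python
def ensure_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding)
    if isinstance(value, str):
        return value
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


raw = "English".encode("utf-8")  # bytes, as in the failing call above
lang_name = ensure_text(raw)     # back to a plain str

# Lang(lang_name) would now receive a str instead of bytes.
```

The helper assumes the bytes are UTF-8 encoded unless told otherwise; pass a different `encoding` if your source uses one.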
Is that what you wanted to know?
Thank you for your response @LBeaudoux!
To be clearer, the problem I am trying to solve is this:
I want to extract text from a PDF written in French (language code: fre or fra) and in Ghomala (language code: bbj), but I couldn't see how to do it using your library, as the docs don't mention anything on that subject. How do I do that with this iso639 library? As I said in my previous message, I may have missed something.
Maybe there is a misunderstanding. This library is not for extracting text from PDF files, but only helps you to handle the ISO 639 series of international standards for language codes.
I see! There was indeed a misunderstanding. When you say it helps to "handle" the standards, I am not completely sure what that means. For extracting text there are many other libraries I can use, but how can I handle a language code with this library?
I think I don't fully understand the purpose of your library or how to use it.
Basically, this library maps together the different codes and names of the ISO 639 standard. For example, you can get the name of a language from its ISO 639-3 code:
>>> from iso639 import Lang
>>> lg = Lang("eng")
>>> lg
Lang(name='English', pt1='en', pt2b='eng', pt2t='eng', pt3='eng', pt5='')
>>> lg.name
'English'
You can find examples in the README.md file.
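To make "maps together the different codes and names" more concrete, here is a toy sketch of the idea using only the standard library. It is not how iso639 is implemented: `LanguageRecord`, `_RECORDS` and `lookup` are hypothetical names, the table is written by hand, and it only covers the three languages mentioned in this thread (with "Ghomala" spelled as above).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageRecord:
    """One language with its name and its ISO 639 codes ('' if none)."""
    name: str
    pt1: str   # ISO 639-1 (two letters)
    pt2b: str  # ISO 639-2 bibliographic
    pt2t: str  # ISO 639-2 terminologic
    pt3: str   # ISO 639-3


# Hand-written sample data, for illustration only.
_RECORDS = [
    LanguageRecord("English", "en", "eng", "eng", "eng"),
    LanguageRecord("French", "fr", "fre", "fra", "fra"),
    LanguageRecord("Ghomala", "", "", "", "bbj"),  # ISO 639-3 code only
]

# Index every non-empty name or code to its record.
_INDEX = {}
for record in _RECORDS:
    for key in (record.name, record.pt1, record.pt2b, record.pt2t, record.pt3):
        if key:
            _INDEX[key] = record


def lookup(value: str) -> LanguageRecord:
    """Find a language from any of its names or codes."""
    try:
        return _INDEX[value]
    except KeyError:
        raise ValueError(f"unknown language value: {value!r}") from None


french = lookup("fre")
print(french.name, french.pt1, french.pt3)
print(lookup("bbj").name)
```

Whichever code you start from (fre, fra or fr), you end up with the same record, so you can normalise the language tags attached to text that some other tool extracted from your PDF.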
Thank you very much for your help!
Hello @LBeaudoux ,
Thank you for the awesome library! I may have missed it in your description and documentation, but is it possible to use the library to extract text from a PDF written in an ISO 639-3 compatible language and maintain the encoding?