Open piegu opened 9 months ago
In the file lang.py, I see that the library langdetect is used.
In the same file there is a function detect_languages(), but it looks like partition_pdf and partition_via_api do not use it when the input is a PDF.
If that is true, why don't partition_pdf and partition_via_api use it to detect the languages of the PDF automatically?
Because of that, we have to manually pass the list of the PDF's languages in the languages parameter.
Did I miss something?
+1, good question. I ran into the same problem when processing a PDF that contains text in two languages: the English text is processed correctly, but the Chinese text is not.
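Until detection is automatic, one possible workaround is to guess the languages from a text sample and pass the result as the languages parameter yourself. The sketch below is only an illustration under assumptions: guess_ocr_languages, fake_detect and the mapping table are hypothetical names that are not part of the library, the mapping is deliberately partial, and it assumes the languages parameter expects Tesseract-style codes such as eng or chi_sim. How the text sample is obtained from the PDF is left open.

```python
from typing import Callable, Iterable, List, Sequence

# Partial, illustrative mapping from langdetect-style codes to
# Tesseract-style language codes (assumption: extend as needed).
LANGDETECT_TO_TESSERACT = {
    "en": "eng",
    "fr": "fra",
    "de": "deu",
    "pt": "por",
    "zh-cn": "chi_sim",
    "zh-tw": "chi_tra",
}


def guess_ocr_languages(
    sample_text: str,
    detect: Callable[[str], Iterable[str]],
    fallback: Sequence[str] = ("eng",),
) -> List[str]:
    """Hypothetical helper: map detected language codes to OCR codes.

    `detect` is any callable that returns language codes for a text,
    so the detector can be swapped or stubbed out.
    """
    if not sample_text.strip():
        # Nothing to detect on, so fall back to the default.
        return list(fallback)
    codes: List[str] = []
    for code in detect(sample_text):
        mapped = LANGDETECT_TO_TESSERACT.get(code.lower())
        # Skip codes missing from the table and avoid duplicates.
        if mapped and mapped not in codes:
            codes.append(mapped)
    return codes or list(fallback)


# Stand-in detector so the sketch runs without extra dependencies.
# With langdetect installed, one could pass instead:
#   lambda text: [guess.lang for guess in langdetect.detect_langs(text)]
def fake_detect(text: str) -> List[str]:
    return ["en", "zh-cn"]


languages = guess_ocr_languages("Hello 你好", fake_detect)
print(languages)  # ['eng', 'chi_sim']
```

The resulting list could then be supplied as the languages argument of partition_pdf. Keeping the detector injectable means the mapping logic can be checked without OCR or langdetect installed.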