-
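## Checklist
- [x] I have searched the issues (including closed ones)
## Build environment
- Operating system
- [x] Windows 7/8/10
- TeX distribution
- [x] TeX Live 2019 pretest
## Problem description
HTML generated with ctex + lwarp shows text in reversed order (see footnote 11 in the minimal example); changing to art…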
-
I have heard from several sources that this error is associated with Windows 10.
`textract.exceptions.ShellError: The command `pdftotext ../data/input/example_resumes\Brendan_Herger_Resume.pdf -` fail…
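A failure of this kind can be narrowed down by calling `pdftotext` directly, outside textract. The following is a minimal sketch, not textract's own code: the helper names `pdftotext_available` and `extract_text` are hypothetical, and it assumes the Poppler `pdftotext` executable is the tool being invoked. It checks that the executable is on `PATH` and that the input file exists before running the same `pdftotext <file> -` command.

```python
import shutil
import subprocess
from pathlib import Path


def pdftotext_available() -> bool:
    """Return True if a `pdftotext` executable can be found on PATH."""
    return shutil.which("pdftotext") is not None


def extract_text(pdf_path) -> str:
    """Run `pdftotext <file> -` and return its standard output as text.

    Hypothetical helper for diagnosis only; it separates the two most
    basic failure modes (missing tool, missing file) from real
    conversion errors.
    """
    pdf = Path(pdf_path)  # pathlib normalizes mixed / and \ separators on Windows
    if not pdftotext_available():
        raise RuntimeError("pdftotext was not found on PATH")
    if not pdf.is_file():
        raise FileNotFoundError(f"no such PDF: {pdf}")
    result = subprocess.run(
        ["pdftotext", str(pdf), "-"],  # "-" sends the text to stdout
        capture_output=True,
        check=True,  # raises CalledProcessError on a non-zero exit code
    )
    return result.stdout.decode("utf-8", errors="replace")
```

If `pdftotext_available()` returns `False`, the problem is likely the installation or `PATH` rather than the PDF itself.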
-
### What to do
When converting PDF documents to text with either Apache Tika or pdf2text, we have some functionality to split the documents by passages afterwards. It would be beneficial to have per pa…
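As an illustration of splitting converted text while keeping track of where each passage came from, here is a minimal sketch. It is not the project's existing splitter: the function names are hypothetical, passages are assumed to be separated by blank lines, and pages are assumed to be separated by form feed characters (`\f`), which is what `pdftotext` emits by default; other converters may need a different page marker.

```python
import re


def split_passages(text):
    """Split text into passages on blank lines, dropping empty pieces."""
    parts = re.split(r"\n\s*\n", text)
    return [p.strip() for p in parts if p.strip()]


def passages_with_pages(text):
    """Return (page_number, passage) pairs.

    Assumes pages are separated by form feeds, as in pdftotext output.
    Page numbers start at 1.
    """
    result = []
    for page_no, page in enumerate(text.split("\f"), start=1):
        for passage in split_passages(page):
            result.append((page_no, passage))
    return result
```

For example, `passages_with_pages("p1 a\n\np1 b\fp2 a")` yields two passages tagged with page 1 and one tagged with page 2.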
-
The first part of the conversion notebook converts PDF to text.
Is it possible to also provide the students with the PDF corpus?
-
_**Please provide all mandatory information!**_
## Describe the bug (mandatory)
I have a flock of PDFs that have the following attributes:
Producer: GPL Ghostscript 9.15
PDF Version: 1.4
…
-
- It would be nice to have bindings for pdftohtml; some outdated bindings for Python can be found at https://github.com/mgedmin/pdf2html . Packages such as Nokogiri would enable elegant processing
- …
-
Related: #659
Just a proposal to make the preview pane for PDFs more useful.
I know that you can't show the real PDF because there is no mapping between the text you search through and the posit…
-
Tika works fine for most PDFs; however, for some files it simply returns gibberish in the content.
Not sure as to why it is, since the `parser` interface doesn't seem to allow for m…
-
I cannot get any decoded text out of either of these:
[148154.pdf](https://github.com/cpierce/pdf2text/files/7877397/148154.pdf)
[temp.pdf](https://github.com/cpierce/pdf2text/files/7877398/temp.pdf…
-
The CLI developed in #53 will output JSON documents containing the text for each document.
Add the ability to create a Document object from this JSON. This will be useful when using the corpus of ext…
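A rough sketch of what such a constructor could look like is shown below. The JSON schema is an assumption, since the output format of the CLI is not specified here: the sketch assumes each JSON document is an object with a `text` key, and it keeps every other key as metadata. The `Document` class shown is a hypothetical stand-in, not the project's actual class.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for the project's Document class."""

    text: str
    metadata: dict = field(default_factory=dict)

    @classmethod
    def from_json(cls, raw):
        """Build a Document from a JSON string.

        Assumes a top-level object with a "text" key; all remaining
        keys are stored unchanged in `metadata`.
        """
        data = json.loads(raw)
        if "text" not in data:
            raise ValueError('JSON document has no "text" field')
        metadata = {k: v for k, v in data.items() if k != "text"}
        return cls(text=data["text"], metadata=metadata)
```

Usage might look like `Document.from_json('{"text": "hello", "source": "a.pdf"}')`, which would give a document whose `text` is `"hello"` and whose `metadata` holds the `source` entry.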