-
After parsing the .pdf file and converting it to text, the original formatting breaks.
Docwire 04.04.2024 version, MSVC compiler.
For example, the file from the tests was taken [3.pdf](https://githu…
-
### What happened?
The parser incorrectly parses Lv2からチートだった元勇者候補のまったり異世界ライフ 1巻 as just "L - Volume 1". Because this is a PDF, it has to rely on filename parsing.
### What did you expect?
Full series title i…
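
A minimal sketch of the kind of filename parsing described above, in Python for illustration only. It is not the project's actual parser: the function name `parse_series_and_volume` and the rule "a number directly followed by 巻 is the volume" are assumptions made for this example.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical rule: the volume is a number immediately followed by 巻,
# and everything before it (minus trailing whitespace) is the series title.
_VOLUME_RE = re.compile(r"^(?P<series>.+?)\s*(?P<volume>\d+)巻")


def parse_series_and_volume(filename: str) -> Tuple[str, Optional[int]]:
    """Return (series, volume) parsed from a filename; volume is None if absent."""
    stem = Path(filename).stem  # drop the extension, e.g. ".pdf"
    match = _VOLUME_RE.match(stem)
    if match is None:
        # No volume marker found: treat the whole stem as the series title.
        return stem.strip(), None
    return match.group("series"), int(match.group("volume"))


print(parse_series_and_volume("Lv2からチートだった元勇者候補のまったり異世界ライフ 1巻.pdf"))
```

Requiring 巻 right after the digits keeps the `2` in `Lv2` from being read as a volume number, so the title is not cut off after the first letters.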
-
Research and implement the following tasks related to tracking, retrieving, and parsing weekly work hours from Desktime PDFs:
**1.** Develop an endpoint to directly read weekly work hours from Desk…
-
I'm trying to parse a PDF using the example, but parsing a small 209 KB file takes more than 5 seconds.
```
using namespace docwire;
std::stringstream out_stream;
std::filesystem::path("D:\\pdf…
-
Hi all, thanks for the project. Pdfact is a great step forward in extracting text from PDFs. Are you planning to accept contributions, like exposing it over a web-based API?
-
pdf_helper always throws an error when importing a text-based or image-based PDF.
## pdf_helper
## Describe the bug
The server log is polluted with ERROR lines which actual…
-
Hi, it would be useful if some error handling was added in case a PDF fails to parse. I earlier got this error after parsing 1000s of PDFs and had to restart from scratch (not a big deal of course I u…
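
A rough sketch of the error handling requested above, assuming a caller-supplied parser. The names `parse_all`, `parse_one`, and `checkpoint` are hypothetical and not part of any library mentioned here. Each file is parsed in its own `try` block, and progress is saved to a JSON file so a crash does not force a restart from scratch.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, Iterable


def parse_all(
    paths: Iterable[Path],
    parse_one: Callable[[Path], Any],  # hypothetical parser; must return JSON-serializable data
    checkpoint: Path,
) -> Dict[str, Dict[str, Any]]:
    """Parse every path, recording failures instead of aborting the whole run."""
    state: Dict[str, Dict[str, Any]] = {"done": {}, "failed": {}}
    if checkpoint.exists():
        # Resume from an earlier run.
        state = json.loads(checkpoint.read_text(encoding="utf-8"))

    for path in paths:
        key = str(path)
        if key in state["done"]:
            continue  # already parsed in a previous run
        try:
            state["done"][key] = parse_one(path)
            state["failed"].pop(key, None)  # clear a failure from an earlier run
        except Exception as exc:  # one bad PDF must not stop the batch
            state["failed"][key] = repr(exc)
        # Save after every file; simple, though slow for very large batches.
        checkpoint.write_text(json.dumps(state), encoding="utf-8")
    return state
```

Files listed under `failed` are retried on the next run, while files under `done` are skipped.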
-
Should probably do this with GROBID, though that will be slow?
Code here: https://github.com/LukasWallrich/diversity_meta/blob/2536b12166a728140a9f4c7f1013b11bde446e4e/SM1%20-%20Search%20and%20Screening…
-
### Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
### Is there an existing ans…
-
I am running the development server using `docker` on my local machine.
The API URL I'm using is:
```
http://localhost:5010/api/parseDocument?renderFormat=all&applyOcr=yes&useNewIndentParser=ye…