-
The text `5 April 2009.` at the end of a sentence, where `2009.` wraps onto the next line, breaks list detection: the line is marked as item number 10 in the list.
2009 is detected as the next list item, and …
jcsrb updated
3 years ago
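The failure described above can be illustrated with a minimal sketch. It assumes a naive ordered-list heuristic (digits followed by `.` or `)` at the start of a line), which is not necessarily how the project detects lists. The pattern `NAIVE_ITEM`, the `MONTHS` tuple, and both function names are hypothetical and exist only for this illustration. The guard shown handles just the case where the previous line ends in an English month name.

```python
import re

# Hypothetical naive pattern: "<digits>." or "<digits>)" at the start of a line.
NAIVE_ITEM = re.compile(r"^\s*(\d+)[.)](\s|$)")

# English month names, used only by the illustrative guard below.
MONTHS = (
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
)


def looks_like_list_item(line: str) -> bool:
    """Naive check: any line starting with '<digits>.' counts as a list item."""
    return NAIVE_ITEM.match(line) is not None


def looks_like_list_item_guarded(line: str, prev_line: str = "") -> bool:
    """Same check, but skip lines that continue a date from the previous line."""
    if not looks_like_list_item(line):
        return False
    words = prev_line.strip().lower().split()
    # If the previous line ends with a month name, the leading number is
    # most likely the year of a wrapped date, not a list marker.
    if words and words[-1] in MONTHS:
        return False
    return True


if __name__ == "__main__":
    prev, cur = "The report was published on 5 April", "2009."
    print(looks_like_list_item(cur))                # naive heuristic result
    print(looks_like_list_item_guarded(cur, prev))  # guarded heuristic result
```

A guard like this is only one possible mitigation. Checking whether the number continues the numbering of a preceding item would be another.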
-
**Is your feature request related to a problem? Please describe.**
Support German page numbers. Right now, only Spanish and English are supported.
**Describe the solution you'd like**
Add German …
-
**Summary**
When using pdfminer with mupdf to extract images, the source of the image is never found.
**Expected behavior**
I expect to retrieve the source path of images (it works when using p…
-
[version] antlr 4.9.2
[file size] 113 MB
[cpp-runtime spend] 156 s
[java-runtime spend] 39 s
I am currently using antlr 4.9.2 to develop a parser. The runtime language I chose is C++.
At tha…
-
This link https://mp.weixin.qq.com/s?src=11&timestamp=1592836784&ver=2416&signature=twPx*M3463d4-VSqClYYO5XF5gbF6u4xITOzMSde2WxwahX1vHbayJBPRbCdSpXhrhSvuZdZpjNQfDNedeedkK0dLPbSsh8tVxLQDh7jjGMyrhyqqom8…
-
Episode 708 mentions several entities, concepts, etc. that could be tagged for use by visualization.
There are people, events, organizations, devices, places, dates, experiences, topics, and more th…
-
With Annif, it is possible to use several specialised models for prediction in an ensemble. However, all models in an Annif ensemble can only be given one specific single kind of text for prediction,…
-
Find a way to have table detection with Tesseract.
Maybe Tesseract has some options to do it.
Maybe we can find a way to pass the bounding boxes and content to Camelot.
Related links
- https://g…
-
#164 This issue remains unresolved on Win10 with Python 3.7.1.
Has anyone found a solution yet?
-
**Summary**
Parsr is a great tool.
I used it to parse a PDF of about 45 MB, and it took more than 15 minutes to parse the tables.
```
[2021-01-19T13:06:04] INFO (parsr-api/7 on 0452d42e4a31)…