-
Govdocs -
[000899.pdf](https://github.com/trailofbits/polyfile/files/4882183/000899.pdf)
[001940.pdf](https://github.com/trailofbits/polyfile/files/4882184/001940.pdf)
```
Parsing PDF obj 62 …
-
var hummus = require('hummus');
var pdfReader = hummus.createReader(sourcePath);
var pageNumber = pdfReader.getPagesCount();
-
```
What steps will reproduce the problem?
1. In the applet, call the appendPDF method with a URL that does not exist
2. jzebraDoneAppending() will fire, but there is no exception in
applet.getException() (you …
-
Tried parsing a pdf policy doc:
https://cloud.llamaindex.ai/project/e23dcff4-03d1-441f-a709-d1222b98f3f2/extraction/f7959f6b-1a7f-4fa1-9de3-87af4ccde45f
https://uvapolicy.virginia.edu/print/pdf/node…
-
### Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
### Branch name
main
### Commit ID
bef1bbdf3e16e5163bc563407bd7fd8f7da97d7a
### Other environment infor…
-
Are there any alternatives to GROBID and would there be any major advantages in using them?
### Alternatives (feel free to add new entries)
- https://github.com/pdfminer/pdfminer.six
- https://gi…
-
The pdf parsing of https://homepages.cwi.nl/~lex/files/dict.pdf doesn't look very appealing.
Things I already noticed
- No TOC display (and strange header size detection, see #21)
- Characters a…
-
I've had a blast (re)creating my CV with react-pdf and wanted to create a template for applications (written in markdown) as well. My problem is that it seems that most of the markdown libraries out t…
-
Use pdf.js to manually parse an arbitrary pdf currently open
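
One way this could look, as a rough sketch rather than a tested recipe: load the document with pdf.js and walk its pages, collecting the text items of each. The helper names `joinTextItems` and `extractPageTexts` are made up for illustration, `pdfjsLib` is passed in as a parameter so the snippet makes no assumption about how pdf.js is loaded, and `source` is assumed to be a URL or typed array for the document that is currently open.

```javascript
// Hypothetical helper: join the text items of one page into a single string.
// Items without a usable `str` (or with whitespace only) are skipped.
function joinTextItems(items) {
  return items
    .map(function (item) { return item.str; })
    .filter(function (s) { return s && s.trim().length > 0; })
    .join(' ');
}

// Hypothetical helper: load the document and collect the text of every page.
// `pdfjsLib` is the pdf.js library object; `source` is assumed to be a URL
// or typed array pointing at the currently open PDF.
async function extractPageTexts(pdfjsLib, source) {
  var pdf = await pdfjsLib.getDocument(source).promise;
  var pages = [];
  // pdf.js page numbers start at 1.
  for (var n = 1; n <= pdf.numPages; n++) {
    var page = await pdf.getPage(n);
    var content = await page.getTextContent();
    pages.push(joinTextItems(content.items));
  }
  return pages;
}
```

Joining items with a single space is a simplification; reconstructing layout would require the position data that each text item also carries.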
-
The code can parse text and images from a PDF file, but there are a few issues.
1. The gem used to extract the images contained in the PDF file is `pdf-reader`. It extracts the images in `TIFF` format which is […