Open defencedog opened 2 years ago
Not sure about this one. Looks like you're missing the bc program?
line 239: bc: command not found
OK, bc is installed. Now there is a further error:
pdf2searchablepdf in.pdf
pdf2searchablepdf ('pdf2searchablepdf') version 0.5.0
Author = Gabriel Staples
See 'pdf2searchablepdf -h' for more info.
Language = eng
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Converting input PDF (in.pdf) into a searchable PDF
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Creating temporary working directory: "pdf2searchablepdf_temp_20220302-195105.945720973"
Converting input PDF to a bunch of output TIF images inside temporary working directory.
- THIS COULD TAKE A LONG TIME (up to 45 sec or so per page)! Manually watch the temporary
working directory to see the pages created one-by-one to roughly monitor progress.
- NB: each TIF file created is ~25MB, so ensure you have enough disk space for this
operation to complete successfully.
fdsan: attempted to close file descriptor 4, expected to be unowned, actually owned by FILE* 0x7756d13018
/data/data/com.termux/files/home/bin/pdf2searchablepdf: line 243: 25589 Aborted pdftoppm $user_password -tiff -r 300 "$pdf_in" "$temp_dir/pg"
ERROR: 'pdftoppm' failed. ret_code = 134
Removing temporary working directory at "pdf2searchablepdf_temp_20220302-195105.945720973".
Done!
Total script run-time: 1 sec (0.017 min).
real 0m0.371s
user 0m0.165s
sys 0m0.102s
Note to self: I don't know what "Termux" is; it looks like this is it: https://github.com/termux/termux-app
The parts that stick out to me are:
fdsan: attempted to close file descriptor 4, expected to be unowned, actually owned by FILE* 0x7cd2d13018
and
pdftoppm $user_password -tiff -r 300 "$pdf_in" "$temp_dir/pg"
ERROR: 'pdftoppm' failed. ret_code = 134
Can you run this part by itself?:
pdftoppm $user_password -tiff -r 300 "$pdf_in" "$temp_dir/pg"
# example
mkdir -p temp && pdftoppm -tiff -r 300 "in.pdf" "temp/images_pg"
That last cmd will create a temp dir and place a bunch of images into it with names which begin with images_pg. I'd like to know if that works for you.
Also, I have no idea what error code 134 is. man pdftoppm shows only these exit codes:
EXIT CODES
The Xpdf tools use the following exit codes:
0 No error.
1 Error opening a PDF file.
2 Error opening an output file.
3 Error related to PDF permissions.
99 Other error.
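A note on the 134: it is probably not one of pdftoppm's own exit codes. When a command is killed by a signal, bash reports its status as 128 plus the signal number, and 134 - 128 = 6, which is SIGABRT on Linux and Android. That matches the "Aborted" in the log above. A minimal sketch of decoding such a return code (the `ret_code`, `sig_num`, and `sig_name` variable names are only for illustration):

```shell
# Decode a shell return code that may encode a fatal signal.
ret_code=134

if [ "$ret_code" -gt 128 ]; then
    # Status above 128: the process was terminated by signal (status - 128).
    sig_num=$((ret_code - 128))
    sig_name="$(kill -l "$sig_num")"
    echo "Killed by signal $sig_num (SIG$sig_name)"
else
    echo "Exited normally with code $ret_code"
fi
```

So the abort most likely comes from the fdsan check killing the process, not from pdftoppm deciding to return an error.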
Note to self: here is, apparently, the pdftoppm source code: https://gitlab.freedesktop.org/poppler/poppler/-/blob/master/utils/pdftoppm.cc#L214
The fdsan errors appear to be related to termux bugs out of my control. See this Google search for "fdsan: attempted to close file descriptor 4, expected to be unowned, actually owned by FILE"
A couple links with similar errors:
https://github.com/termux/termux-packages/issues/5980 This one says here: Fixed in 1.9.4-11.
https://github.com/termux/termux-packages/issues/6592 This one says here: Fixed in 27.2-1.
I don't know how to check this (as I don't know a thing about Termux), but what version of Termux do you have?
You should probably open up a bug report here (I think): https://github.com/termux/termux-packages for the termux "poppler" package which contains pdftoppm.
This is out of my realm, but the bug doesn't appear to be related directly to my pdf2searchablepdf program, nor within my control.
As a temporary work-around you could try modifying the pdf2searchablepdf script to use some sort of alternative to pdftoppm which can do the same thing: convert a PDF to a bunch of images. This is all we are trying to do there. You'd have to find something that works with termux (whatever that is :)).
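A rough sketch of that work-around, assuming Ghostscript (`gs`) is installed and actually runs under Termux, which has not been verified here. The `pdf_to_tiffs` function name is hypothetical and not part of pdf2searchablepdf. Ghostscript numbers its output files differently from pdftoppm, so the rest of the script might need adjusting:

```shell
# Hypothetical helper: convert a PDF to 300 DPI TIFF images, one per page.
# Tries pdftoppm first and falls back to Ghostscript if pdftoppm fails.
pdf_to_tiffs() {
    pdf_in="$1"
    out_dir="$2"

    mkdir -p "$out_dir" || return 1

    # Same pdftoppm options the script already uses (minus the password).
    if pdftoppm -tiff -r 300 "$pdf_in" "$out_dir/pg"; then
        return 0
    fi

    echo "pdftoppm failed; trying Ghostscript instead." >&2
    # tiff24nc = 24-bit RGB TIFF output device; %03d is the page number.
    gs -q -o "$out_dir/pg-%03d.tif" -sDEVICE=tiff24nc -r300 "$pdf_in"
}
```

Usage would be something like `pdf_to_tiffs in.pdf temp`; the function returns non-zero if both converters fail.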
For now, ocrmypdf works great.
@defencedog , how can I test termux and get it running myself? This is a linux distro or emulator for a phone, no?
@ElectricRCAircraftGuy you can install it on any Android device for free: https://play.google.com/store/apps/details?id=com.termux&hl=en&gl=US
It's a full-fledged Linux within Android; with pip installed I can even run Jupyter notebooks.
Thanks. I just installed Termux.
Running pkg install bc now, to install the missing dependency.
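A missing program like bc could be caught before any conversion starts. Below is a minimal sketch of such a pre-flight check; the `check_deps` function is hypothetical and is not something pdf2searchablepdf currently does, and the tool list (bc, pdftoppm, tesseract) is taken from the programs mentioned in this thread:

```shell
# Hypothetical pre-flight check: report every listed program that is not
# found on the PATH. Returns 0 if all are present, 1 otherwise.
check_deps() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "Missing dependency: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# pdftoppm is provided by the poppler package.
check_deps bc pdftoppm tesseract \
    || echo "Install the missing programs first, e.g. with 'pkg install <name>' in Termux."
```

This only detects programs that are absent. A program that is installed but crashes on start-up would still pass the check.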
Ok, here are my own attempted steps so far, and where I am stuck, myself:
First, install Termux on Android.
Then, inside the Termux app on Android, run:
pkg install bc
pkg install git
pkg install poppler
# note: packages can be removed with `pkg remove pkg_name`
git clone https://github.com/ElectricRCAircraftGuy/PDF2SearchablePDF.git
PDF2SearchablePDF/install.sh
mv ~/bin/pdf2searchablepdf ~/../usr/bin
pdf2searchablepdf -h
# scroll up/down with the up/down arrow keys above your keyboard in Termux
# Press Q to quit.
# -L follows the redirect GitHub issues for /raw/ URLs
curl -L https://github.com/ElectricRCAircraftGuy/PDF2SearchablePDF/raw/master/tests/pdfs/test1.pdf > test1.pdf
pdf2searchablepdf test1.pdf # Fails!
Here is my failed output:
$ pdf2searchablepdf test1.pdf
pdf2searchablepdf ('pdf2searchablepdf') version 0.5.0
Author = Gabriel Staples
See 'pdf2searchablepdf -h' for more info.
Language = eng
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Converting input PDF (test1.pdf) into a searchable PDF
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Creating temporary working directory: "pdf2searchablepdf_temp_20220502-224832.105535441"
Converting input PDF to a bunch of output TIF images inside temporary working directory.
- THIS COULD TAKE A LONG TIME (up to 45 sec or so per page)! Manually watch the temporary
working directory to see the pages created one-by-one to roughly monitor progress.
- NB: each TIF file created is ~25MB, so ensure you have enough disk space for this
operation to complete successfully.
CANNOT LINK EXECUTABLE "pdftoppm": cannot locate symbol "__emutls_get_address" referenced by "/data/data/com.termux/files/usr/lib/libpoppler.so"...
ERROR: 'pdftoppm' failed. ret_code = 1
Removing temporary working directory at "pdf2searchablepdf_temp_20220502-224832.105535441".
Done!
Total script run-time: 0 sec (0.000 min).
real 0m0.071s
user 0m0.043s
sys 0m0.051s
pdftoppm, which is part of the poppler package, is broken. Even pdftoppm -h fails!:
$ pdftoppm -h
CANNOT LINK EXECUTABLE "pdftoppm": cannot locate symbol "__emutls_get_address" referenced by "/data/data/com.termux/files/usr/lib/libpoppler.so"...
(notes to self)
New attempt:
Termux is deprecated from the Google Play store. Do NOT install it from the Google Play store!
See the Termux installation instructions here: https://github.com/termux/termux-app#Installation, and a note directly from the maintainer here: https://github.com/termux/termux-packages/issues/10470#issuecomment-1115767330.
Rather, install the open-source Android app store called F-Droid, here: https://f-droid.org/, and then search for and install Termux from within the F-Droid app (link here: https://f-droid.org/en/packages/com.termux/).
The ~100 MB download of Termux in the F-Droid store is suuuuuper slow. Expect to wait 15~20 minutes. Go fry some eggs and bacon while you wait, or some Impossible burgers and Beyond Meat burgers (my preference), and eat some Stryve biltong (~$29 for one 10 oz pack from Amazon, or ~$11 from your local Walmart), and drink a relaxing cup of coffee-free Inka fake coffee from Poland (good stuff!) or Postum (this stuff is awesome!), and enjoy the wait. If making home-made Ramen, don't forget to add some MSG to reduce the sodium while giving it some amazing savory umami flavor!
To be continued...
@ElectricRCAircraftGuy I have installed it via F-Droid
@defencedog , that kind of information is important to share with me up front next time please, especially when you're asking for my help to solve the problem. I spent a bunch of time figuring that out that I could have put towards solving the problem instead. I need to be able to duplicate what you have done so I can see the problem and error you are seeing so I can have a chance at fixing it.
Think of posting an issue here as being similar to posting a question on Stack Overflow: I need to have a fully reproducible set of instructions I can follow to get your same error.
That's my mistake too. I should have started out with my very first response to this issue as:
I don't know what Termux is. Please provide me with an exact set of detailed instructions I can follow from scratch to get Termux up and running on my own system so I can see the exact same error you are seeing.
That's where I'm at now. I spent 2 hrs last night setting up Termux for my first time only to get stuck because I followed the link to the Google Play store that you sent instead of to the F-Droid download location which you didn't send.
I'm still trying to get to where I see the exact same error you see.
I already have poppler installed, and pdftoppm exists in my bin dir. I installed the program successfully, but this error occurs at runtime.
I have a working ocrmypdf, which also depends on tesseract: https://www.reddit.com/r/termux/comments/t3cvgz/help_install_ocrmypdf_through_pip/hysq36s/?context=3