ElectricRCAircraftGuy / PDF2SearchablePDF

`pdf2searchablepdf input.pdf` = voila! "input_searchable.pdf" is created & now has searchable text!
MIT License

Android Termux ARM64 Runtime Error #22

defencedog opened this issue 2 years ago

defencedog commented 2 years ago

I already have poppler installed, and pdftoppm exists in my bin dir.

I installed the program successfully, but I get this error at runtime:

~/download $ pdf2searchablepdf in.pdf
pdf2searchablepdf ('pdf2searchablepdf') version 0.5.0
Author = Gabriel Staples
See 'pdf2searchablepdf -h' for more info.

Language = eng
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Converting input PDF (in.pdf) into a searchable PDF
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Creating temporary working directory: "pdf2searchablepdf_temp_20220302-022520.445752464"
Converting input PDF to a bunch of output TIF images inside temporary working directory.
- THIS COULD TAKE A LONG TIME (up to 45 sec or so per page)! Manually watch the temporary
  working directory to see the pages created one-by-one to roughly monitor progress.
- NB: each TIF file created is ~25MB, so ensure you have enough disk space for this
  operation to complete successfully.
fdsan: attempted to close file descriptor 4, expected to be unowned, actually owned by FILE* 0x7cd2d13018
/data/data/com.termux/files/home/bin/pdf2searchablepdf: line 243: 20786 Aborted                 pdftoppm $user_password -tiff -r 300 "$pdf_in" "$temp_dir/pg"
ERROR: 'pdftoppm' failed. ret_code = 134
Removing temporary working directory at "pdf2searchablepdf_temp_20220302-022520.445752464".
Done!
/data/data/com.termux/files/home/bin/pdf2searchablepdf: line 239: bc: command not found

Total script run-time: 0 sec (0.000 min).

real    0m0.542s
user    0m0.255s
sys     0m0.074s

I have a working ocrmypdf, which also depends on tesseract: https://www.reddit.com/r/termux/comments/t3cvgz/help_install_ocrmypdf_through_pip/hysq36s/?context=3

ElectricRCAircraftGuy commented 2 years ago

Not sure about this one. Looks like you're missing the bc program?

line 239: bc: command not found
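A minimal preflight sketch for catching this kind of missing dependency before the conversion starts. check_deps is a hypothetical helper (not part of pdf2searchablepdf), and the command list below is an assumption based on the tools mentioned in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the given commands are missing from PATH.
# Returns 0 if all are present, 1 if any are missing.
check_deps() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Assumed list of commands the script relies on.
check_deps bc pdftoppm tesseract \
    || echo "Install the packages that provide the missing commands first."
```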
defencedog commented 2 years ago

OK, bc is installed... Now there's a further error:

pdf2searchablepdf in.pdf
pdf2searchablepdf ('pdf2searchablepdf') version 0.5.0
Author = Gabriel Staples
See 'pdf2searchablepdf -h' for more info.

Language = eng
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Converting input PDF (in.pdf) into a searchable PDF
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Creating temporary working directory: "pdf2searchablepdf_temp_20220302-195105.945720973"
Converting input PDF to a bunch of output TIF images inside temporary working directory.
- THIS COULD TAKE A LONG TIME (up to 45 sec or so per page)! Manually watch the temporary
  working directory to see the pages created one-by-one to roughly monitor progress.
- NB: each TIF file created is ~25MB, so ensure you have enough disk space for this
  operation to complete successfully.
fdsan: attempted to close file descriptor 4, expected to be unowned, actually owned by FILE* 0x7756d13018
/data/data/com.termux/files/home/bin/pdf2searchablepdf: line 243: 25589 Aborted                 pdftoppm $user_password -tiff -r 300 "$pdf_in" "$temp_dir/pg"
ERROR: 'pdftoppm' failed. ret_code = 134
Removing temporary working directory at "pdf2searchablepdf_temp_20220302-195105.945720973".
Done!

Total script run-time: 1 sec (0.017 min).

real    0m0.371s
user    0m0.165s
sys     0m0.102s
ElectricRCAircraftGuy commented 2 years ago

Note to self: I don't know what "Termux" is; it looks like this is it: https://github.com/termux/termux-app

ElectricRCAircraftGuy commented 2 years ago

The parts that stick out to me are:

fdsan: attempted to close file descriptor 4, expected to be unowned, actually owned by FILE* 0x7cd2d13018

and

pdftoppm $user_password -tiff -r 300 "$pdf_in" "$temp_dir/pg"
ERROR: 'pdftoppm' failed. ret_code = 134

Can you run this part by itself?

pdftoppm $user_password -tiff -r 300 "$pdf_in" "$temp_dir/pg"

# example
mkdir -p temp && pdftoppm -tiff -r 300 "in.pdf" "temp/images_pg"

That last cmd will create a temp dir and place a bunch of images into it with names which begin with images_pg. I'd like to know if that works for you.

Also, I have no idea what error code 134 is. man pdftoppm shows only these exit codes:

EXIT CODES
       The Xpdf tools use the following exit codes:

       0      No error.

       1      Error opening a PDF file.

       2      Error opening an output file.

       3      Error related to PDF permissions.

       99     Other error.

Note to self: here is, apparently, the pdftoppm source code: https://gitlab.freedesktop.org/poppler/poppler/-/blob/master/utils/pdftoppm.cc#L214

ElectricRCAircraftGuy commented 2 years ago

The fdsan errors appear to be related to termux bugs out of my control. See this Google search for "fdsan: attempted to close file descriptor 4, expected to be unowned, actually owned by FILE"

A couple links with similar errors:

  1. https://github.com/termux/termux-packages/issues/5980 This one says here:

    Fixed in 1.9.4-11.

  2. https://github.com/termux/termux-packages/issues/6592 This one says here:

    Fixed in 27.2-1.

I don't know how to check this (as I don't know a thing about Termux), but what version of Termux do you have?
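Those "Fixed in" numbers look like package versions, so one way to answer this is to compare the installed version of the relevant package against the fixed version. A minimal sketch, assuming Termux's apt/dpkg tooling (dpkg-query) is available; both helper names are hypothetical, and sort -V only approximates dpkg's ordering (dpkg --compare-versions is the exact check).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking an installed package against a
# "Fixed in X" version from an issue tracker.

# Print the installed version of a dpkg package (empty if unknown).
installed_version() {
    dpkg-query -W -f='${Version}' "$1" 2>/dev/null || true
}

# Succeed if version $1 >= version $2, using GNU `sort -V` ordering.
# For exact dpkg semantics: dpkg --compare-versions "$1" ge "$2"
version_at_least() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: show which poppler version is installed.
v="$(installed_version poppler)"
echo "poppler: ${v:-unknown}"
```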

ElectricRCAircraftGuy commented 2 years ago

You should probably open a bug report here (I think), against the Termux "poppler" package, which contains pdftoppm: https://github.com/termux/termux-packages

This is out of my realm, but the bug doesn't appear to be directly related to my pdf2searchablepdf program, nor is it within my control.

As a temporary work-around, you could try modifying the pdf2searchablepdf script to use some alternative to pdftoppm that can do the same thing: convert a PDF into a bunch of images. That's all we are trying to do there. You'd have to find something that works with Termux (whatever that is :)).
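A minimal sketch of that work-around, assuming Ghostscript (gs) can be installed in Termux (not verified here). pdf_to_tiffs is a hypothetical helper, not part of pdf2searchablepdf; it omits the script's optional $user_password argument, and Ghostscript's page file names may not match what the rest of the script expects from pdftoppm.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a PDF into 300 DPI TIFF pages,
# trying pdftoppm first and falling back to Ghostscript (gs).
pdf_to_tiffs() {
    local pdf_in="$1"
    local out_prefix="$2"   # e.g. "$temp_dir/pg"

    # Same conversion the script already does with pdftoppm.
    if command -v pdftoppm >/dev/null 2>&1 \
        && pdftoppm -tiff -r 300 "$pdf_in" "$out_prefix"; then
        return 0
    fi

    # Fallback: Ghostscript's 24-bit RGB TIFF device, one file per page,
    # named like pg-001.tif, pg-002.tif, ...
    if command -v gs >/dev/null 2>&1; then
        echo "pdftoppm missing or failed; falling back to gs" >&2
        gs -q -dNOPAUSE -dBATCH -dSAFER -sDEVICE=tiff24nc -r300 \
            -sOutputFile="${out_prefix}-%03d.tif" "$pdf_in"
        return $?
    fi

    echo "ERROR: could not convert '$pdf_in' with pdftoppm or gs" >&2
    return 1
}
```

Usage would look like `mkdir -p temp && pdf_to_tiffs in.pdf temp/pg`.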

defencedog commented 2 years ago

For now, ocrmypdf works great:

https://www.reddit.com/r/termux/comments/t3cvgz/comment/hysxlec/?utm_source=share&utm_medium=web2x&context=3

ElectricRCAircraftGuy commented 2 years ago

@defencedog , how can I test Termux and get it running myself? This is a Linux distro or emulator for a phone, no?

defencedog commented 2 years ago

@ElectricRCAircraftGuy you can install it on any Android device for free: https://play.google.com/store/apps/details?id=com.termux&hl=en&gl=US

It's a full-fledged Linux within Android; with pip installed I can even run Jupyter notebooks.

ElectricRCAircraftGuy commented 2 years ago

Thanks. I just installed Termux.

Running pkg install bc now, to install the missing dependency.

ElectricRCAircraftGuy commented 2 years ago

Ok, here are my own attempted steps so far, and where I am stuck, myself:

First, install Termux on Android.

Then, inside the Termux app on Android, run:

pkg install bc
pkg install git 
pkg install poppler

# note: packages can be removed with `pkg remove pkg_name`

git clone https://github.com/ElectricRCAircraftGuy/PDF2SearchablePDF.git
PDF2SearchablePDF/install.sh
mv ~/bin/pdf2searchablepdf ~/../usr/bin
pdf2searchablepdf -h
# scroll up/down with the up/down arrow keys above your keyboard in Termux
# Press Q to quit.

# -L makes curl follow GitHub's redirect to the raw file
curl -L https://github.com/ElectricRCAircraftGuy/PDF2SearchablePDF/raw/master/tests/pdfs/test1.pdf > test1.pdf

pdf2searchablepdf test1.pdf   # Fails!

Here is my failed output:

$ pdf2searchablepdf test1.pdf
pdf2searchablepdf ('pdf2searchablepdf') version 0.5.0
Author = Gabriel Staples
See 'pdf2searchablepdf -h' for more info.

Language = eng
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Converting input PDF (test1.pdf) into a searchable PDF
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Creating temporary working directory: "pdf2searchablepdf_temp_20220502-224832.105535441"
Converting input PDF to a bunch of output TIF images inside temporary working directory.
- THIS COULD TAKE A LONG TIME (up to 45 sec or so per page)! Manually watch the temporary
  working directory to see the pages created one-by-one to roughly monitor progress.
- NB: each TIF file created is ~25MB, so ensure you have enough disk space for this
  operation to complete successfully.
CANNOT LINK EXECUTABLE "pdftoppm": cannot locate symbol "__emutls_get_address" referenced by "/data/data/com.termux/files/usr/lib/libpoppler.so"...
ERROR: 'pdftoppm' failed. ret_code = 1
Removing temporary working directory at "pdf2searchablepdf_temp_20220502-224832.105535441".
Done!

Total script run-time: 0 sec (0.000 min).

real    0m0.071s
user    0m0.043s
sys     0m0.051s

pdftoppm, which is part of the poppler package, is broken. Even pdftoppm -h fails!

$ pdftoppm -h
CANNOT LINK EXECUTABLE "pdftoppm": cannot locate symbol "__emutls_get_address" referenced by "/data/data/com.termux/files/usr/lib/libpoppler.so"...
ElectricRCAircraftGuy commented 2 years ago

(notes to self)

New attempt:

The Google Play store version of Termux is deprecated. Do NOT install it from the Google Play store!

See the Termux installation instructions here: https://github.com/termux/termux-app#Installation, and a note directly from the maintainer here: https://github.com/termux/termux-packages/issues/10470#issuecomment-1115767330.

Rather, install the open-source Android app store called F-Droid, here: https://f-droid.org/, and then search for and install Termux from within the F-Droid app (link here: https://f-droid.org/en/packages/com.termux/).

The ~100 MB download of Termux in the F-Droid store is suuuuuper slow. Expect to wait 15~20 minutes. Go fry some eggs and bacon while you wait, or some Impossible burgers and Beyond Meat burgers (my preference), and eat some Stryve biltong (~$29 for one 10 oz pack from Amazon, or ~$11 from your local Walmart), and drink a relaxing cup of coffee-free Inka fake coffee from Poland (good stuff!) or Postum (this stuff is awesome!), and enjoy the wait. If making home-made Ramen, don't forget to add some MSG to reduce the sodium while giving it some amazing savory umami flavor!

To be continued...

defencedog commented 2 years ago

@ElectricRCAircraftGuy I have installed it via F-Droid

ElectricRCAircraftGuy commented 2 years ago

@defencedog , that kind of information is important to share with me up front next time, please, especially when you're asking for my help to solve the problem. I spent a bunch of time figuring that out, time I could have put towards solving the problem instead. I need to be able to duplicate what you have done so I can see the problem and error you are seeing, so I have a chance at fixing it.

Think of posting an issue here as being similar to posting a question on Stack Overflow: I need a fully reproducible set of instructions I can follow to get your same error.

ElectricRCAircraftGuy commented 2 years ago

That's my mistake too. I should have started out with my very first response to this issue as:

I don't know what Termux is. Please provide me with an exact set of detailed instructions I can follow from scratch to get Termux up and running on my own system so I can see the exact same error you are seeing.

That's where I'm at now. I spent 2 hrs last night setting up Termux for the first time, only to get stuck because I followed the link to the Google Play store that you sent instead of the F-Droid download location, which you didn't send.

I'm still trying to get to where I see the exact same error you see.