bellecp / fast-p

MIT License

~/.cache/pdftotext/*: No such file or directory #2

Closed cristian-frincu closed 6 years ago

cristian-frincu commented 6 years ago

For some reason the text is not being generated. I installed all the dependencies you mentioned. Any idea how it might be fixed?

qoqosz commented 6 years ago

Possibly my error reported here could be a duplicate of this one.

bellecp commented 6 years ago

@cristian-frincu , are you also using MacOS?

Does $ touch ~/.cache/pdftotext/temporary fix the issue?

cristian-frincu commented 6 years ago

I am using macOS 10.13.6. I played around with it a bit more, and I believe the issue is that pdftotext is not installed when installing texlive, so I am not sure how to install pdftotext.

I tried brew cask install pdftotext, but there were some errors and it could not be installed.

bellecp commented 6 years ago

I could reproduce the issue when the directory ~/.cache/pdftotext/ is empty, although the script behaves normally after printing the ~/.cache/pdftotext/*: No such file or directory error. Simply doing $ touch ~/.cache/pdftotext/temporary removes the error. Can you confirm?
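A minimal sketch of that workaround, assuming the error comes from the `~/.cache/pdftotext/*` glob matching nothing while the directory is empty (an unmatched glob is passed through literally by default). The helper name `ensure_cache_placeholder` and the throwaway demo directory are hypothetical and not part of fast-p:

```shell
# Hypothetical helper: make sure the cache directory exists and is not empty,
# so that a glob such as ~/.cache/pdftotext/* always matches at least one file.
ensure_cache_placeholder() {
    dir="$1"
    mkdir -p "$dir" || return 1
    # ls -A also lists hidden entries, but not . and ..
    if [ -z "$(ls -A "$dir")" ]; then
        touch "$dir/temporary"
    fi
}

# Demo on a throwaway directory.
demo_dir="$(mktemp -d)"
ensure_cache_placeholder "$demo_dir/pdftotext"
ls "$demo_dir/pdftotext"
```

For real use, the call would be `ensure_cache_placeholder "$HOME/.cache/pdftotext"`. The placeholder is only created when the directory has no entries, so an already populated cache is left alone.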

cristian-frincu commented 6 years ago

Yes, the error went away, but when I run 'p', no files show up (where am I supposed to store my PDFs?), and when I close p, I see: 'xargs: unterminated quote'

bellecp commented 6 years ago

The script starts with ag -U -g ".pdf$" which lists all pdfs found in the current directory (and its subdirectories).
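To see which files would be picked up from a given folder, a rough stand-in using `find` can help on machines where `ag` is not yet working. This is only an approximation and an assumption on my part: `ag` skips hidden files by default, so the two listings may differ. The function name `list_pdfs` and the demo files are hypothetical:

```shell
# List PDF files in the current directory and its subdirectories.
# Rough equivalent of: ag -U -g ".pdf$"
list_pdfs() {
    find . -type f -name '*.pdf'
}

# Demo in a throwaway directory with two PDFs and one unrelated file.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/sub"
touch "$demo_dir/a.pdf" "$demo_dir/sub/b.pdf" "$demo_dir/notes.txt"
(cd "$demo_dir" && list_pdfs)
```

If this prints nothing in the folder where `p` is run, then `p` has nothing to index there, and the first step is to `cd` into the folder that holds the PDFs.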

cristian-frincu commented 6 years ago

OK, I wasn't aware I needed to navigate to the folder. It seems the issue is still largely with pdftotext. When I run it within the folder, it does perform some indexing and creates files in ~/.cache/pdftotext, but they are all empty.

Are you using this script on OSX or linux? How did you get pdftotext installed?

bellecp commented 6 years ago

The script has been tested on ubuntu 17.10 and 18.04.

Can you try installing pdftotext via homebrew with the poppler formula? And then try pdftotext -f 1 -l 2 some_of_your_pdf.pdf.

What is the output of the commands

cristian-frincu commented 6 years ago

So the pdftotext install worked now and transformed to text file.

$ bash --version
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin17)
Copyright (C) 2007 Free Software Foundation, Inc.

$ awk --version
awk version 20070501

$ grep --version
grep (BSD grep) 2.5.1-FreeBSD

$ ag --version
ag version 2.1.0

Features: +jit +lzma +zlib

$ xxh64sum --version
xxh64sum 0.6.5 (64-bits little endian), by Yann Collet

$ pdftotext -v
pdftotext version 0.67.0
Copyright 2005-2018 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC

xargs does not indicate version

When I open p, it shows two lines saying 'grep: empty (sub)expression'
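The checks above can be collected in one pass. The following is a small sketch that only reports whether each tool named in this thread is on the PATH; it does not check versions, and the function name `check_tools` is made up for illustration:

```shell
# Report whether each named tool can be found on the PATH.
check_tools() {
    for tool in "$@"; do
        if path="$(command -v "$tool" 2>/dev/null)"; then
            printf 'found    %s -> %s\n' "$tool" "$path"
        else
            printf 'MISSING  %s\n' "$tool"
        fi
    done
}

# Tools mentioned in this thread.
check_tools bash awk grep ag xxh64sum pdftotext xargs
```

A tool reported as found can still be the BSD variant shipped with macOS, so this only rules out the "not installed" case.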

bellecp commented 6 years ago

Can you install gnu-grep instead via homebrew and try again?
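One way to tell which grep is first on the PATH is to look at the first line of its `--version` output, as in the reports in this thread. The sketch below assumes that a GNU build mentions "GNU" in that line; the function name `is_gnu_banner` is hypothetical:

```shell
# Succeeds when a --version banner looks like it comes from a GNU tool.
is_gnu_banner() {
    case "$1" in
        *GNU*) return 0 ;;
        *)     return 1 ;;
    esac
}

# First line of the version output; empty if grep rejects --version.
banner="$(grep --version 2>/dev/null | head -n 1)"
if is_gnu_banner "$banner"; then
    echo "GNU grep: $banner"
else
    echo "not GNU grep: $banner"
fi
```

With the banners quoted in this thread, `grep (BSD grep) 2.5.1-FreeBSD` is classified as not GNU and `grep (GNU grep) 3.1` as GNU.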

cristian-frincu commented 6 years ago

I installed the gnu grep:

$ grep --version
grep (GNU grep) 3.1
Packaged by Homebrew
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others, see http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS.

Now there are no errors, but nothing happens when I run the program. It just shows the number of PDFs in the folder, and nothing happens when I start typing.

bellecp commented 6 years ago

Start afresh: rm ~/.cache/pdftotext/* && touch ~/.cache/pdftotext/NOOP. Run the command p again. How many files are in ~/.cache/pdftotext/ and what is their content?

cristian-frincu commented 6 years ago

There are many files, including the NOOP one. They are named {hash}+{file name}.pdf and are all empty.

bellecp commented 6 years ago

Can you confirm that this is the same problem as in https://github.com/bellecp/fast-p/issues/3 ?

bellecp commented 6 years ago

The files that hold the cache should be named like this:

.cache/pdftotext/fb1640a5a92ad0bd
.cache/pdftotext/fba8164792621b9a
.cache/pdftotext/fd8493dd0a53a30a
.cache/pdftotext/fffdcfcb4f2dd810
.cache/pdftotext/NOOP
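A quick diagnostic along these lines can flag both symptoms reported in this thread: cache entries with unexpected names and cache entries that are empty. It assumes, based only on the listing above, that valid names are 16 lowercase hex digits or NOOP. The function name `check_cache` and the demo files are hypothetical:

```shell
# Hypothetical diagnostic: flag cache entries that are oddly named or empty.
# Assumption: valid names are 16 lowercase hex digits, or the NOOP placeholder.
check_cache() {
    dir="$1"
    for f in "$dir"/*; do
        # Skip the literal pattern when the directory is empty.
        [ -e "$f" ] || continue
        name="$(basename "$f")"
        if [ "$name" = "NOOP" ]; then
            continue
        fi
        if ! printf '%s\n' "$name" | grep -Eq '^[0-9a-f]{16}$'; then
            echo "unexpected name: $name"
        elif [ ! -s "$f" ]; then
            echo "empty: $name"
        fi
    done
}

# Demo on a throwaway directory.
demo_dir="$(mktemp -d)"
touch "$demo_dir/NOOP" "$demo_dir/fb1640a5a92ad0bd" "$demo_dir/abc+paper.pdf"
echo "some extracted text" > "$demo_dir/fba8164792621b9a"
check_cache "$demo_dir"
```

For the real cache, the call would be `check_cache "$HOME/.cache/pdftotext"`. In the demo, the `abc+paper.pdf` entry is reported as an unexpected name and the zero-byte `fb1640a5a92ad0bd` entry as empty.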

kxk commented 6 years ago

Hey, so I followed the instructions too:

  1. Installed gnu-grep
  2. rm ~/.cache/pdftotext/* && touch ~/.cache/pdftotext/NOOP

Now when I run it, it prints xargs: unterminated quote in the terminal and only shows 1/1, without any name or title and without the rest of the PDFs.
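One common cause of this message, which is not confirmed as the cause in this thread, is a file name containing an apostrophe or double quote: by default xargs treats quotes in its input as syntax. The sketch below uses a made-up file name to show the difference between the default mode and NUL-delimited input with `-0`:

```shell
# Made-up file name containing a single apostrophe.
name="it's a paper.pdf"

# Default mode: xargs parses quotes, so the lone apostrophe is rejected
# (BSD xargs says "unterminated quote"; GNU xargs words it differently).
printf '%s\n' "$name" | xargs -n 1 printf '%s\n' 2>/dev/null \
    || echo "default xargs rejected the name"

# NUL-delimited mode: everything up to the NUL byte is taken literally.
printf '%s\0' "$name" | xargs -0 -n 1 printf '%s\n'
```

Whether the fast-p script can be fed NUL-delimited names depends on the rest of its pipeline, so this is only a way to reproduce and recognize the error.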

bellecp commented 6 years ago

@kxk : what are the filenames in ~/.cache/pdftotext/ and what do these files contain?

kxk commented 6 years ago

Right now, the only thing I have is

~/Downloads$ ls ~/.cache/pdftotext/
NOOP

But I tried the command pdftotext -f 1 -l 2 123.pdf and that produced a 123.txt with the correct info.

bellecp commented 6 years ago

Can you try again with the GNU xargs from http://brewformulas.org/Findutils ?

bellecp commented 6 years ago

Also, what do head /tmp/fewijbbioasBBBB and head /tmp/fewijbbioasAAAA output?

kxk commented 6 years ago

I installed http://brewformulas.org/Findutils, what do you want me to do with it? The two commands you wrote return nothing.

kxk commented 6 years ago

I'd love to get this working on a Mac. Please let me know what else I can test to help debug this. Thanks a lot, this seems tremendously useful!

bellecp commented 6 years ago

What's the output of gawk --version, awk --version and the top line of man awk?

kxk commented 6 years ago

Hey, thanks for keeping up with me. So,

~$ gawk --version
bash: gawk: command not found

~$ awk --version
awk version 20070501

and

Not sure if you mean literally the first line, but it has:

AWK(1) AWK(1)

Thanks!

bellecp commented 6 years ago

That may be the culprit. Please install gawk (GNU awk) with homebrew and make sure gawk --version is found. Then check out the branch gawk from the repository, do rm .cache/pdftotext/* and try the command p again.

kxk commented 6 years ago

Did all the above.

When I run it in a folder with pdfs, it seems like it counts them right (in fact 1 higher than the actual number of files in there), displays in terminal that there are 18/18 files there, but it doesn't show title, or any preview. And if I type anything even the word "the" it says 0/18 files include it. Does that help?

bellecp commented 6 years ago

What's the output of pdftotext -f 1 -l 2 some_of_your_pdf.pdf -? If that looks OK (content of first two pages of your pdf file), try brew install bash and make sure that bash --version outputs version at least v4. Make sure that you are in the homebrew bash and not the OSX bash by typing bash.
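The version requirement can be checked from a script by parsing the first line of `bash --version`. This is a sketch that assumes the banner format shown in this thread ("GNU bash, version X.Y.Z..."); the function name `bash_major` is hypothetical:

```shell
# Extract the major version number from a `bash --version` banner line.
bash_major() {
    printf '%s\n' "$1" | sed -n 's/.*version \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Inspect whichever bash is first on the PATH.
banner="$(bash --version 2>/dev/null | head -n 1)"
major="$(bash_major "$banner")"
if [ "${major:-0}" -ge 4 ]; then
    echo "bash $major is new enough"
else
    echo "bash on PATH is older than 4 (or not found): $banner"
fi
```

This inspects the bash found first on the PATH, which is not necessarily the shell currently running; inside a running bash session, `echo "${BASH_VERSINFO[0]}"` prints the major version of that session.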

bellecp commented 6 years ago

@kxk does https://github.com/bellecp/fast-p/issues/3#issuecomment-408237740 work fine?

kxk commented 6 years ago

It looks OK: the PDF content is displayed correctly, and bash --version reports GNU bash, version 4.4.23(1)-release (x86_64-apple-darwin17.5.0).

kxk commented 6 years ago

> @kxk does #3 (comment) work fine?

Oh, that script works. Though it has about 10 copies of each pdf in the list.

bellecp commented 6 years ago

For speed and interoperability, I am considering using a Go binary instead of the bash scripts, which generate many errors on OSX. @kxk @cristian-frincu , could you let me know if https://github.com/bellecp/fast-p/blob/master/README-OSX.md works?

bellecp commented 6 years ago

According to #4 the go binary version works. If it does not for you, feel free to open a new issue.