Closed cristian-frincu closed 6 years ago
@cristian-frincu , are you also using MacOS?
Does $ touch ~/.cache/pdftotext/temporary fix the issue?
I am using MacOS 10.13.6. I played around with it a bit more and I believe the issue is that pdftotext is not installed when installing texlive, so I am not quite sure how to install pdftotext.
I tried brew cask install pdftotext, but it reported some errors and could not be installed.
I could reproduce the issue when the directory ~/.cache/pdftotext/ is empty, although the script behaves normally after printing the ~/.cache/pdftotext/*: No such file or directory error. Simply doing $ touch ~/.cache/pdftotext/temporary removes the error. Can you confirm?
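The message above is what a shell typically produces when a glob such as ~/.cache/pdftotext/* matches nothing: the pattern is passed through literally and the receiving command cannot find a file with that name. A minimal sketch of a guard follows; list_cache_entries is a hypothetical helper for illustration, not code from the fast-p script.

```shell
# Hypothetical helper (not from the fast-p script): print each entry of a
# cache directory, printing nothing when the directory is empty.
list_cache_entries() {
  for entry in "$1"/*; do
    # An unmatched glob is passed through as the literal pattern; skip it.
    [ -e "$entry" ] || continue
    printf '%s\n' "$entry"
  done
}
```

Calling list_cache_entries ~/.cache/pdftotext on an empty directory prints nothing instead of raising the error, which has the same effect as the placeholder file created with touch.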
Yes, the error went away, but when I run 'p', no files show up (where am I supposed to store my pdf's?) and when I close p, I see: 'xargs: unterminated quote'
The script starts with ag -U -g ".pdf$" which lists all pdfs found in the current directory (and its subdirectories).
Ok, I wasn't aware I needed to navigate to the folder. It seems the issue is still largely with pdftotext. When I run it within the folder, it does perform some indexing and creates files in ~/.cache/pdftotext, but they are all empty.
Are you using this script on OSX or linux? How did you get pdftotext installed?
The script has been tested on ubuntu 17.10 and 18.04.
Can you try installing pdftotext via homebrew with the poppler formula? And then try pdftotext -f 1 -l 2 some_of_your_pdf.pdf.
What is the output of the commands
bash --version
awk --version
grep --version
ag --version
xargs --version
xxh64sum --version
pdftotext -v
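The checks above can be collected in one pass. This is only a sketch: report_tool is a hypothetical helper, and some BSD tools shipped with macOS do not accept --version, in which case the first line of their error message is shown instead.

```shell
# Hypothetical helper: print the first line of a tool's version output,
# or "not found" when the tool is not on PATH.
report_tool() {
  tool="$1"
  flag="$2"
  if command -v "$tool" >/dev/null 2>&1; then
    # Some tools print the version to stderr, so capture both streams.
    printf '%s: %s\n' "$tool" "$("$tool" "$flag" 2>&1 | head -n 1)"
  else
    printf '%s: not found\n' "$tool"
  fi
}

for tool in bash awk grep ag xargs xxh64sum; do
  report_tool "$tool" --version
done
# pdftotext uses -v rather than --version.
report_tool pdftotext -v
```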
So the pdftotext install worked now and it converted the pdf to a text file.
bash --version GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin17) Copyright (C) 2007 Free Software Foundation, Inc.
awk --version awk version 20070501
grep --version grep (BSD grep) 2.5.1-FreeBSD
ag --version ag version 2.1.0
Features: +jit +lzma +zlib
xxh64sum --version xxh64sum 0.6.5 (64-bits little endian), by Yann Collet
pdftotext -v pdftotext version 0.67.0 Copyright 2005-2018 The Poppler Developers - http://poppler.freedesktop.org Copyright 1996-2011 Glyph & Cog, LLC
xargs does not indicate version
When I open p, it shows two lines saying 'grep: empty (sub)expression'
Can you install gnu-grep instead via homebrew and try again?
I installed the gnu grep:
grep --version grep (GNU grep) 3.1 Packaged by Homebrew Copyright (C) 2017 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html. This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and others, see http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS.
Now there are no errors, but nothing happens when I run the program. It just shows the number of pdfs in the folder, but nothing happens if I start typing.
Start afresh: rm ~/.cache/pdftotext/* && touch ~/.cache/pdftotext/NOOP. Run the command p again. How many files are in ~/.cache/pdftotext/ and what is their content?
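One way to answer both questions at once is to count the cache files and how many of them are empty. The sketch below assumes a hypothetical helper named cache_summary; it is not part of the script.

```shell
# Hypothetical helper: print "<total> <empty>" for the regular files
# found under the given directory.
cache_summary() {
  # tr strips the padding that some wc implementations add.
  total=$(find "$1" -type f | wc -l | tr -d ' ')
  empty=$(find "$1" -type f -size 0 | wc -l | tr -d ' ')
  printf '%s %s\n' "$total" "$empty"
}
```

Running cache_summary ~/.cache/pdftotext prints two numbers. If they are equal, every cache file is empty, including the NOOP placeholder.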
There are many files, including the NOOP one, they have a {hash}+{file name}.pdf and are all empty
Can you confirm that this is the same problem as in https://github.com/bellecp/fast-p/issues/3 ?
The correct behavior of the filenames that contain the cache are:
.cache/pdftotext/fb1640a5a92ad0bd .cache/pdftotext/fba8164792621b9a .cache/pdftotext/fd8493dd0a53a30a .cache/pdftotext/fffdcfcb4f2dd810 .cache/pdftotext/NOOP
Hey, so I followed the instructions too:
gnu-grep
rm ~/.cache/pdftotext/* && touch ~/.cache/pdftotext/NOOP
Now when I run it, it gives me an xargs: unterminated quote note on the terminal and only shows 1/1 without any name / title or the rest of the pdfs.
@kxk : what are the filenames in ~/.cache/pdftotext/ and what do these files contain?
Right now, the only thing I have is
~/Downloads$ ls ~/.cache/pdftotext/
NOOP
But I tried the command pdftotext -f 1 -l 2 123.pdf and that produced a 123.txt with the correct info.
Can you try again with the GNU xargs from http://brewformulas.org/Findutils ?
Also, what do head /tmp/fewijbbioasBBBB and head /tmp/fewijbbioasAAAA output?
I installed http://brewformulas.org/Findutils, what do you want me to do with it? The two commands you wrote return nothing.
I'd love to get this working on a Mac. Please let me know what else I can test to help debug this. Thanks a lot, this seems tremendously useful!
What's the output of gawk --version, awk --version and the top line of man awk?
Hey, thanks for keeping up with me. So,
~$ gawk --version bash: gawk: command not found
~$ awk --version awk version 20070501
and
Not sure if you mean literally the first line, but it has:
AWK(1) AWK(1)
Thanks!
That may be the culprit. Please install gawk (GNU awk) with homebrew. Make sure gawk --version is found. Then check out the branch gawk from the repository, do rm ~/.cache/pdftotext/* and try the command p again.
Did all the above.
When I run it in a folder with pdfs, it seems like it counts them right (in fact 1 higher than the actual number of files in there), displays in terminal that there are 18/18 files there, but it doesn't show title, or any preview. And if I type anything even the word "the" it says 0/18 files include it. Does that help?
What does pdftotext -f 1 -l 2 some_of_your_pdf.pdf - output? If that looks OK (content of the first two pages of your pdf file), try brew install bash and make sure that bash --version outputs a version of at least 4. Make sure that you are in the homebrew bash and not the OSX bash by typing bash.
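To check which bash is actually running, the major version can be read from the BASH_VERSION variable. A minimal sketch, assuming a hypothetical helper named is_bash4_or_newer:

```shell
# Hypothetical helper: succeed when a version string such as
# "4.4.23(1)-release" has a major version of 4 or higher.
is_bash4_or_newer() {
  major=${1%%.*}   # keep everything before the first dot
  [ "$major" -ge 4 ]
}

# BASH_VERSION is set by bash itself; default to 0 in other shells.
if is_bash4_or_newer "${BASH_VERSION:-0}"; then
  echo "running bash 4 or newer"
else
  echo "not running bash 4 or newer"
fi
```

If this reports the older version right after brew install bash, the shell in use is probably still the system one, so start the homebrew bash first.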
@kxk does https://github.com/bellecp/fast-p/issues/3#issuecomment-408237740 work fine?
It looks OK, correct display of the pdf, and my bash --version is GNU bash, version 4.4.23(1)-release (x86_64-apple-darwin17.5.0).
@kxk does #3 (comment) work fine?
Oh, that script works. Though it has about 10 copies of each pdf in the list.
For speed and interoperability, I am considering using a Go binary instead of the bash scripts that generate many errors on OSX. @kxk @cristian-frincu , could you let me know if https://github.com/bellecp/fast-p/blob/master/README-OSX.md works?
According to #4 the go binary version works. If it does not for you, feel free to open a new issue.
For some reason the text is not being generated. I installed all the dependencies you mentioned. Any idea how it might be fixed?