Genivia / ugrep

NEW ugrep 6.5: a more powerful, ultra fast, user-friendly, compatible grep. Includes a TUI, Google-like Boolean search with AND/OR/NOT, fuzzy search, hexdumps, searches (nested) archives (zip, 7z, tar, pax, cpio), compressed files (gz, Z, bz2, lzma, xz, lz4, zstd, brotli), pdfs, docs, and more
https://ugrep.com
BSD 3-Clause "New" or "Revised" License

ug+ : many warning messages to stderr for problems with PDFs #377

Closed: ashmanskas closed this issue 4 months ago

ashmanskas commented 4 months ago

I see hundreds of warning messages on stderr when searching a large directory that contains many PDFs. The messages are copied below, with duplicates removed.

Neither -s nor --no-messages suppresses these messages. One way I can easily suppress them (in bash) is with

alias ug+='ug+ 2>/dev/null'

but is there some way to get ugrep itself to suppress them, analogous to the existing --no-messages option?

ugrep is a great utility, by the way -- many thanks for your work on it!

===

    Internal Error: xref num 580 not found but needed, try to reconstruct<0a>
    Syntax Error (571978): No font in show
    Syntax Error: Expected the default config, but wasn't able to find it, or it isn't a Dictionary
    Syntax Error: Illegal file spec
    Syntax Error: No font in show
    Syntax Error: Unknown font tag 'Helvetica'
    Syntax Error: XObject 'GTHR02P' is unknown
    Syntax Warning (1039693): Badly formatted number
    Syntax Warning: Illegal annotation destination
    Syntax Warning: Invalid Font Weight
    Syntax Warning: Invalid least number of objects reading page offset hints table
    Syntax Warning: Mismatch between font type and embedded font file
    Unknown input format rtf

genivia-inc commented 4 months ago

These messages are generated by the pdftotext tool, which ug+ runs as a filter with --filter=pdf:pdftotext. pdftotext has a -q switch to turn its error messages off, but ug+ -q does not pass that switch on to pdftotext, because ug+ is a script.

Using ug instead of ug+, with a filter that passes -q to pdftotext, should work for you:

$ ug --filter='pdf:pdftotext -q % -' PATTERN [FILES...]

You could also add -tpdf to search only PDF files.
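To make the filter syntax in the ug command above more concrete, here is a small C++ sketch of how a single "exts:command [option ...]" entry such as pdf:pdftotext -q % - might be interpreted. It is a hypothetical illustration, not ugrep's own parser: the names FilterSpec, parse_filter(), applies_to() and expand_args() are made up, and the sketch handles only one entry. It assumes, following ugrep's documentation of --filter, that a % option expands into the pathname being searched; the trailing - is passed to pdftotext verbatim and tells it to write the extracted text to stdout.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical illustration only, NOT ugrep's actual --filter parser.
// Handles a single "exts:command [option ...]" entry such as
// "pdf:pdftotext -q % -", where exts is a comma-separated list.
struct FilterSpec {
  std::vector<std::string> exts;  // e.g. {"pdf"}
  std::vector<std::string> argv;  // e.g. {"pdftotext", "-q", "%", "-"}
};

// Split the spec at the first ':' into extensions and command words.
bool parse_filter(const std::string& spec, FilterSpec& out) {
  std::string::size_type colon = spec.find(':');
  if (colon == std::string::npos)
    return false;
  out.exts.clear();
  out.argv.clear();
  std::stringstream ext_stream(spec.substr(0, colon));
  std::string item;
  while (std::getline(ext_stream, item, ','))
    if (!item.empty())
      out.exts.push_back(item);
  std::istringstream cmd_stream(spec.substr(colon + 1));
  while (cmd_stream >> item)  // options are separated by spacing
    out.argv.push_back(item);
  return !out.exts.empty() && !out.argv.empty();
}

// Does the filter apply to this pathname? An extension of "*" matches all files.
bool applies_to(const FilterSpec& f, const std::string& path) {
  std::string::size_type dot = path.rfind('.');
  std::string::size_type slash = path.rfind('/');
  std::string ext;
  // only a dot in the last path component starts an extension
  if (dot != std::string::npos && (slash == std::string::npos || dot > slash))
    ext = path.substr(dot + 1);
  for (const auto& e : f.exts)
    if (e == "*" || e == ext)
      return true;
  return false;
}

// Replace each "%" option with the pathname (a caller would pass "-" for stdin).
std::vector<std::string> expand_args(const FilterSpec& f, const std::string& path) {
  std::vector<std::string> args;
  for (const auto& a : f.argv)
    args.push_back(a == "%" ? path : a);
  return args;
}
```

With this spec and the path docs/report.pdf, expand_args() yields the argument list pdftotext -q docs/report.pdf -, which a caller would then run with the command's stdout connected to the search.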

Or, on any *nix system, send the pdftotext error messages to /dev/null by adding 2>/dev/null to the command line:

$ ug+ PATTERN [FILES...] 2>/dev/null

Error messages can be useful when PDFs have problems, but we could perhaps suppress them by passing -q to pdftotext in the ug+ script itself. A more refined approach is to fork pdftotext as usual but redirect its stderr to /dev/null when ugrep option -q or -s is specified. I'm not sure which is best or what users prefer.

genivia-inc commented 4 months ago

Adding this to the filter() method at ugrep.cpp:3795 suppresses error messages sent by the filter to stderr when ugrep option -q or -s is used:

            // -q or -s: suppress error messages sent to stderr by the filter command
            if (flag_quiet || flag_no_messages)
            {
              int dev_null = open("/dev/null", O_WRONLY);
              if (dev_null >= 0)
              {
                dup2(dev_null, STDERR_FILENO);
                close(dev_null);
              }
            }

That needs to be added right after this part:

            // dup the writing end of the pipe to stdout
            dup2(fd[1], STDOUT_FILENO);
            close(fd[1]);
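
The two snippets above can be combined into a self-contained POSIX sketch of the whole pattern. This is an illustration, not ugrep's actual filter() code: run_filter() and its quiet and status parameters are hypothetical names, with quiet standing in for ugrep's -q/-s flags. The child dups the writing end of a pipe to its stdout and, when quiet is true, points its stderr at /dev/null before calling execvp(); the parent reads the child's stdout until EOF and then reaps it with waitpid().

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <string>
#include <vector>

// Hypothetical sketch, not ugrep's filter(): run argv as a filter command,
// capture its stdout, and discard its stderr when quiet is true.
// Sets status to the child's exit status, or -1 on failure.
std::string run_filter(const std::vector<std::string>& argv, bool quiet, int& status) {
  status = -1;
  std::string output;
  if (argv.empty())
    return output;
  // build the exec argument list before forking
  std::vector<char*> args;
  for (const auto& a : argv)
    args.push_back(const_cast<char*>(a.c_str()));
  args.push_back(nullptr);
  int fd[2];
  if (pipe(fd) != 0)
    return output;
  pid_t pid = fork();
  if (pid < 0) {
    close(fd[0]);
    close(fd[1]);
    return output;
  }
  if (pid == 0) {
    // child: dup the writing end of the pipe to stdout
    close(fd[0]);
    dup2(fd[1], STDOUT_FILENO);
    close(fd[1]);
    // -q or -s: suppress error messages sent to stderr by the filter command
    if (quiet) {
      int dev_null = open("/dev/null", O_WRONLY);
      if (dev_null >= 0) {
        dup2(dev_null, STDERR_FILENO);
        close(dev_null);
      }
    }
    execvp(args[0], args.data());
    _exit(127);  // only reached if exec failed
  }
  // parent: read everything the child writes to its stdout
  close(fd[1]);
  char buf[4096];
  ssize_t n;
  while ((n = read(fd[0], buf, sizeof buf)) > 0)
    output.append(buf, static_cast<std::string::size_type>(n));
  close(fd[0]);
  int wstatus = 0;
  if (waitpid(pid, &wstatus, 0) == pid && WIFEXITED(wstatus))
    status = WEXITSTATUS(wstatus);
  return output;
}
```

For example, run_filter({"pdftotext", "-q", "file.pdf", "-"}, false, status) relies on pdftotext's own -q switch, while run_filter({"pdftotext", "file.pdf", "-"}, true, status) silences the command from the outside, which is the approach that works for any filter command, not just pdftotext.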

genivia-inc commented 4 months ago

Fixed in v5.1.2