bmc / paragrep

Paragraph grep utility
http://software.clapper.org/paragrep/

User guide #3

Status: open. saravananpsg opened this issue 5 years ago.

saravananpsg commented 5 years ago

Is it possible to share a few examples of parsing paragraphs with the paragrep utility?

bmc commented 5 years ago

I should update the docs to do that.

Here's one simple example. Suppose you have a text file with the following:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.

Love is merely a madness, and, I tell you, deserves as well a dark house and a
whip as madmen do; and the reason why they are not so punished and cured is,
that the lunacy is so ordinary that the whippers are in love too. Yet I profess
curing it by counsel. (from Shakespeare, "As You Like It")

The mercy that was quick in us but late, 
By your own counsel is suppress'd and kill'd:
You must not dare, for shame, to talk of mercy;
For your own reasons turn into your bosoms,
As dogs upon their masters, worrying you.
See you, my princes, and my noble peers, 
These English monsters!
(from Shakespeare, "Henry V")

And the Lord spake, saying, "First shalt thou take out the Holy Pin. Then shalt
thou count to three, no more, no less. Three shall be the number thou shalt
count, and the number of the counting shall be three. Four shalt thou not
count, neither count thou two, excepting that thou then proceed to three. Five
is right out! Once the number three, being the third number, be reached, then
lobbest thou thy Holy Hand Grenade of Antioch towards thy foe, who, being
naughty in my sight, shall snuff it. (from "Monty Python and the Holy Grail")

Let's assume this is stored in /tmp/foo.txt.

Note that this document has four paragraphs, delimited by empty lines. What if you want to print each paragraph that contains the string "dog"? Use this:

paragrep dog /tmp/foo.txt

That command will print the entire second-to-last paragraph, and only that paragraph. (Compare that to grep, which will print just the line containing the string "dogs".)
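To make the paragraph-versus-line distinction concrete, here is a minimal Python sketch of the idea. It is not paragrep's source. It assumes paragraphs are separated by one or more blank (empty or whitespace-only) lines, and it uses a trimmed-down, hypothetical stand-in for /tmp/foo.txt. The helper name grep_paragraphs is made up for illustration.

```python
import re


def grep_paragraphs(pattern, text, flags=0):
    """Return the paragraphs of `text` that contain a match for `pattern`.

    Assumption: paragraphs are separated by one or more blank lines
    (empty or whitespace-only), as in the /tmp/foo.txt example.
    """
    regex = re.compile(pattern, flags)
    # r"\n\s*\n" consumes the whole run of blank lines between two paragraphs.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return [p for p in paragraphs if regex.search(p)]


# Hypothetical, shortened stand-in for /tmp/foo.txt.
sample = """You must not dare, for shame, to talk of mercy;
As dogs upon their masters, worrying you.
These English monsters!

Three shall be the number thou shalt count,
and the number of the counting shall be three.
"""

# Paragraph-level search: the whole three-line paragraph comes back.
for para in grep_paragraphs("dog", sample):
    print(para)

# Line-level search, like plain grep: only the single matching line.
print([line for line in sample.splitlines() if re.search("dog", line)])
```

The only real difference from a line-oriented grep is the unit that gets tested against the pattern: a whole block of lines instead of one line.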

What about this?

paragrep -i exc /tmp/foo.txt

That will print the entire first paragraph (because it contains the Latin word "Excepteur") and the entire last paragraph (because it contains the word "excepting"). The -i option tells paragrep to do a case-blind comparison.
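Without -i, the first paragraph would not match, because "Excepteur" starts with a capital E. The following Python sketch shows that effect. It assumes a case-blind search is equivalent to compiling the pattern with re.IGNORECASE; the ignore_case parameter and the matching_indexes helper are hypothetical stand-ins for paragrep's -i, not its actual code.

```python
import re

# Three short stand-in paragraphs, trimmed from the /tmp/foo.txt example.
paragraphs = [
    "Excepteur sint occaecat cupidatat non proident.",
    "Love is merely a madness, and, I tell you, deserves as well a dark house.",
    "Four shalt thou not count, neither count thou two, excepting that thou then proceed to three.",
]


def matching_indexes(pattern, paras, ignore_case=False):
    """Return the indexes of the paragraphs that contain `pattern`.

    `ignore_case` plays the role of paragrep's -i here (an assumption:
    a case-blind search is modeled with re.IGNORECASE).
    """
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [i for i, p in enumerate(paras) if regex.search(p)]


print(matching_indexes("exc", paragraphs))                    # [2]
print(matching_indexes("exc", paragraphs, ignore_case=True))  # [0, 2]
```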

Here's a final example, using a regular expression:

paragrep 'mo[nr]' /tmp/foo.txt

This will print the third paragraph (the quote from "Henry V"), because mo[nr] will match the word "monsters". It will also print the fourth paragraph, because mo[nr] matches "more".
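The bracket expression [nr] matches exactly one character, either "n" or "r", so mo[nr] means "mo" followed by one of those two letters. This small Python check, using words picked from /tmp/foo.txt, shows which of them qualify. Because the search is case-sensitive, "Monty" does not match; the Monty Python paragraph is selected through "more" instead. The first paragraph's "mollit" and "commodo" contain "mo" but are followed by other letters.

```python
import re

# "mo" followed by exactly one character from the class [nr].
pattern = re.compile(r"mo[nr]")

# Words taken from the /tmp/foo.txt example.
words = ["monsters", "more", "mollit", "commodo", "Monty"]

for word in words:
    print(f"{word:10} {bool(pattern.search(word))}")
```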

I originally wrote this tool back in the 1980s (in C, at the time), because I needed to find entire quotes in a large file full of quotes, where each quote occupied multiple lines. Over the years, I've rewritten the tool, first in Perl, then in Python. I now use it to search Markdown documents, as well as a "fortune" file full of quotes. The latter file has paragraphs delimited by "%", not blank lines, but paragrep has an option that lets me specify a paragraph delimiter other than an empty line.
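One way to support both blank-line and "%"-delimited files is to read line by line and close the current paragraph whenever a line matches an end-of-paragraph regular expression. The Python sketch below does that. It is an illustration under assumptions, not a description of how paragrep implements its delimiter option: the iter_paragraphs function and its eop parameter are hypothetical names, and the sample fortune file is made up from lines in this thread.

```python
import re


def iter_paragraphs(lines, eop=r"^\s*$"):
    """Yield paragraphs from an iterable of lines.

    A line matching the end-of-paragraph regex `eop` closes the current
    paragraph and is not included in it. The default treats blank lines
    as separators; eop=r"^%$" handles a fortune-style file.
    """
    eop_re = re.compile(eop)
    current = []
    for line in lines:
        line = line.rstrip("\r\n")
        if eop_re.match(line):
            # Consecutive delimiter lines do not produce empty paragraphs.
            if current:
                yield "\n".join(current)
                current = []
        else:
            current.append(line)
    # The last paragraph may not be followed by a delimiter.
    if current:
        yield "\n".join(current)


# Hypothetical fortune-style file: entries separated by lines holding "%".
fortunes = """Five is right out!
%
These English monsters!
(from Shakespeare, "Henry V")
%
""".splitlines(keepends=True)

for entry in iter_paragraphs(fortunes, eop=r"^%$"):
    if re.search("monster", entry):
        print(entry)
```

Reading line by line also means a real file object can be passed straight in, since iterating over an open file yields its lines.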

Does that help?

saravananpsg commented 5 years ago

Thank you, Brian. It is really helpful. This is super cool. I will save this utility for future reference. I am glad I reached out to you about a utility that was written in the '80s.

I was just looking for a utility that can grep paragraphs, lists, and headings.