Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.93k stars 552 forks source link

Spurious uninitialized variable warning #2089

Closed p5pRT closed 20 years ago

p5pRT commented 24 years ago

Migrated from rt.perl.org#3378 (status was 'resolved')

Searchable as RT3378$

p5pRT commented 24 years ago

From david@coppit.org

Created by dwc3q@altair.cs.virginia.edu

This is a bug report for perl from dwc3q@​altair.cs.virginia.edu\, generated with the help of perlbug 1.28 running under perl v5.6.0.

-----------------------------------------------------------------

When I run my grepmail script\, I get an uninitialized variable warning\, even though every variable I can think of is initialized. Here's the relevant code​:

# Stop on warning. local $SIG{__WARN__} = sub { print $_[0];exit };

  print "a\n";   if ($header =~ /^(Subject​:.*)$/im) # This line causes the warning.   {   print "b\n";   dprint " $1";   }

[$header *IS* defined.]

Unfortunately\, I can't seem to trigger the bug with a simpler version of my script\, or with a simpler version of the input file. It seems that if I try to remove a line of input\, the warning mysteriously disappears. The above code corresponds to line 902 of the code\, which is attached with the input file.

You can reproduce the spurious warning by running​:   perl grepmail -D dummypattern input

Perl Info ``` Flags: category=core severity=low Site configuration information for perl v5.6.0: Configured by dwc3q at Tue Mar 28 09:40:34 EST 2000. Summary of my perl5 (revision 5.0 version 6 subversion 0) configuration: Platform: osname=solaris, osvers=2.6, archname=sun4-solaris uname='sunos altair.cs.virginia.edu 5.6 generic_105181-05 sun4u sparc sunw,ultra-1 ' config_args='' hint=recommended, useposix=true, d_sigaction=define usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef useperlio=undef d_sfio=undef uselargefiles=define use64bitint=undef use64bitall=undef uselongdouble=undef usesocks=undef Compiler: cc='cc', optimize='-O', gccversion= cppflags='-I/usr/local/include' ccflags ='-I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64' stdchar='unsigned char', d_stdstdio=define, usevfork=false intsize=4, longsize=4, ptrsize=4, doublesize=8 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=8, usemymalloc=y, prototype=define Linker and Libraries: ld='cc', ldflags =' -L/usr/local/lib -L/opt/gnu/lib ' libpth=/usr/local/lib /opt/gnu/lib /lib /usr/lib /usr/ccs/lib libs=-lsocket -lnsl -ldb -ldl -lm -lc -lcrypt -lsec libc=/lib/libc.so, so=so, useshrplib=false, libperl=libperl.a Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' ' cccdlflags='-KPIC', lddlflags='-G -L/usr/local/lib -L/opt/gnu/lib' Locally applied patches: @INC for perl v5.6.0: /users/dwc3q/perl/lib/site_perl/5.005/sun4-solaris /users/dwc3q/perl/lib/site_perl/5.005 /af4/dwc3q/perl5.6/lib/5.6.0/sun4-solaris /af4/dwc3q/perl5.6/lib/5.6.0 /af4/dwc3q/perl5.6/lib/site_perl/5.6.0/sun4-solaris /af4/dwc3q/perl5.6/lib/site_perl/5.6.0 /af4/dwc3q/perl5.6/lib/site_perl . Environment for perl v5.6.0: HOME=/af4/dwc3q LANG (unset) LANGUAGE (unset) LD_LIBRARY_PATH=/home/dwc3q/lib:/usr/lib:/usr/openwin/lib:/usr/dt/lib:/X11.6/lib:/X11.5/lib:/usr/cs/lib:/uva/lib:/gnu/lib:/usr/cs/contrib/lib LOGDIR (unset) PATH=/users/dwc3q/bin:/users/dwc3q/perl/bin:/usr/cs/contrib/bin:/contrib/bin:/usr/dt/bin:/usr/openwin/bin:/X11.6/bin:/X11.5/bin:/usr/cs/bin:/opt/SUNWspro/bin:/usr/ccs/bin:/usr/cs/teTeX/bin:/uva/bin:/gnu/bin:/uva/mh/bin:/quadralay/bin:/usr/ucb:/usr/bin:/usr/sbin:/sbin:. PERL5LIB=/users/dwc3q/perl/lib/site_perl/5.005 PERL_BADLANG (unset) SHELL=/usr/cs/bin/bash _________________________________________________________________________ David Coppit - Graduate Student david@coppit.org The University of Virginia http://coppit.org/ "Yes," said Piglet, "Rabbit has Brain." There was a long silence. "I suppose," said Pooh, "that that's why he never understands anything." ```
p5pRT commented 24 years ago

From david@coppit.org

#!/usr/bin/perl -w

# grepmail

$VERSION = '4.41';

# Grepmail searches a normal\, gzip'd\, tzip'd\, or bzip2'd mailbox for a given # regular expression and returns those emails that match the query. It also # supports piped compressed or ascii input\, and searches constrained by date # and size.

# Visit the grepmail project homepage at http​://grepmail.sourceforge.net/ # There you can join the announcements mailing list to be notified of updates\, # grab the development environment via CVS\, participate in chats and mailing # lists\, report bugs\, submit patches\, etc.

# Do a pod2text on this file to get full documentation\, or pod2man to get # man pages.

# Written by David Coppit (david@​coppit.org\, http​://coppit.org/) with lots of # debugging and patching by others -- see the CHANGES file for a complete # list.

# This code is distributed under the GNU General Public License (GPL). See # http​://www.opensource.org/gpl-license.html and http​://www.opensource.org/.

# Notes​: # It turns out that -h\, -b\, -d\, -s and -v have some nasty feature interaction. # Here's a table of how matching should occur for each combination of flags​: # # % B\, H\, S\, D\,!V # Match if body\, header\, size\, and date match # %!B\, H\, S\, D\,!V # Match if header\, size\, and date match -- don't care about body # % B\,!H\, S\, D\,!V # Match if body\, size\, and date match -- don't care about header # % B\, H\,!S\, D\,!V # Match if body\, header\, and date match -- don't care about size # % B\, H\, S\,!D\,!V # Match if body\, header\, and size match -- don't care about date # %!B\,!H\, S\, D\,!V # Match if size and date and (body or header) matches #*%!B\, H\,!S\, D\,!V # Match if header and date matches -- don't care about body or size # %!B\, H\, S\,!D\,!V # Match if header and size matches -- don't care about body or date # % B\,!H\,!S\, D\,!V # Match if body and date matches -- don't care about header or size # % B\,!H\, S\,!D\,!V # Match if body and size matches -- don't care about header or date # % B\, H\,!S\,!D\,!V # Match if body and header matches -- don't care about size or date #*%!B\,!H\,!S\, D\,!V # Match if date and (body or header) matches -- don't care about size # %!B\,!H\, S\,!D\,!V # Match if size and (body or header) matches -- don't care about date #*%!B\, H\,!S\,!D\,!V # Match if header matches -- don't care about body\, size\, or date # B\,!H\,!S\,!D\,!V # Match if body matches -- don't care about header\, size\, or date #* B\, H\, S\, D\, V # Match if body\, header\, size\, or date doesn't match #* !B\, H\, S\, D\, V # Match if header\, size\, or date doesn't match -- don't care about body #* B\,!H\, S\, D\, V # Match if body\, size\, or date doesn't match -- don't care about header #* B\, H\,!S\, D\, V # Match if body\, header\, or date doesn't match -- don't care about size #* B\, H\, S\,!D\, V # Match if body\, header\, or size doesn't match -- don't care about date #* !B\,!H\, S\, D\, V # Match if size or date doesn't match -- don't care about body or header #*%!B\, H\,!S\, D\, V # Match if header or date doesn't match -- don't care about body or size #* !B\, H\, S\,!D\, V # Match if header or size doesn't match -- don't care about body or date #* B\,!H\,!S\, D\, V # Match if body or date doesn't match -- don't care about header or size #* B\,!H\, S\,!D\, V # Match if body or size doesn't match -- don't care about header or date #* B\, H\,!S\,!D\, V # Match if body or header doesn't match -- don't care about size or date #*%!B\,!H\,!S\, D\, V # Match if date or (body and header) don't match -- don't care about size #* !B\,!H\, S\,!D\, V # Match if size or (body and header) don't match -- don't care about date #*%!B\, H\,!S\,!D\, V # Match if header doesn't match -- don't care about body\, size\, or date # B\,!H\,!S\,!D\, V # Match if body doesn't match -- don't care about header\, size\, or date # #* Indicates early match candidate based on the header only #% Indicates early nonmatch candidate based on the header only

require 5.00396;

use vars qw(%opts $pattern $commandLine $VERSION $DEBUG %message_ids_seen );

use Getopt​::Std; use Date​::Parse;

use strict; use FileHandle; use Carp;

#-------------------------------------------------------------------------------

# Outputs debug messages with the -D flag. Be sure to return 1 so code like # 'dprint "blah\n" and exit' works.

sub dprint {   return 1 unless $DEBUG;

  my $message = join ''\,@​_;

  my @​lines = split /\n/\, $message;   foreach my $line (@​lines)   {   print "DEBUG​: $line\n";   }

  return 1; }

#-------------------------------------------------------------------------------

# Perform a clean exit with CTRL-C is caught\, a pipe is empty\, a pipe is # killed\, etc.

sub cleanExit {   my $message;

  $message = shift || "Cancelled";   print STDERR "grepmail​: $message.\n";

  exit 1; }

#-------------------------------------------------------------------------------

# Print usage error if no arguments given print "No arguments given. grepmail --help for help.\n" and exit if (!@​ARGV);

# Check for --help\, the standard usage command print usage() and exit if $ARGV[0] eq '--help';

# Save the arguments because getopt changes them. my @​args = @​ARGV;

# So we don't have to test whether they are defined later. $opts{'D'} = $opts{'d'} = $opts{'e'} = $opts{'i'} = $opts{'q'} = 0; $opts{'h'} = $opts{'b'} = $opts{'v'} = $opts{'l'} = $opts{'r'} = 0; $opts{'M'} = $opts{'m'} = $opts{'s'} = $opts{'u'} = 0;

# Initialize seen messages data structure to empty. %message_ids_seen = ();

getopt("eds"\,\%opts);

$DEBUG = $opts{'D'} || 0;

# Need to quote arguments with spaces grep { $_ = "'$_'" if $_ =~ / /; $_ } @​args;

# Save the command line for later when we try to decompress standard input $commandLine = "$0 @​args";

dprint "Command line was (special characters not escaped)​:"; dprint " $commandLine";

{   dprint "Date​::Parse VERSION​: $Date​::Parse​::VERSION";   # To prevent warning about variable being used only once;   my $dummy = $Date​::Parse​::VERSION; }

# Here we have to deal with the possibility that the user specified the search # pattern without the -e flag.

# getopts stops as soon as it sees a non-flag\, so $ARGV[0] may contain the # pattern with more flags after it. if (!$opts{'e'}) {   my $missing_flags = '';

  foreach my $flag (keys %opts)   {   $missing_flags .= $flag unless $opts{$flag};   }

  $missing_flags = "[$missing_flags]";

  # If it looks like more flags are following\, then grab the pattern and   # process them.   if ( $#ARGV > 0 && $ARGV[1] =~ /^-$missing_flags$/ )   {   $pattern = shift @​ARGV;   getopt("ds"\,\%opts);   }   # If we've seen a -d\, -s\, or -u flag\, and it doesn't look like there are   # flags following $ARGV[0]\, then look at the value in $ARGV[0]   elsif ( ( $opts{'d'} || $opts{'s'} || $opts{'u'} ) &&   ( $#ARGV \<= 0 ||   ( $#ARGV > 0 && $ARGV[1] !~ /^-$missing_flags$/ )   )   )   {   # If $ARGV[0] looks like a file we assume there was no pattern and   # set a default pattern of "." to match everything.   if ($#ARGV != -1 && -f $ARGV[0])   {   $pattern = ".";   }   # Otherwise we take the pattern and move on   else   {   $pattern = shift @​ARGV;   }   }   # If we still don't have a pattern or any -d\, -s\, or -u flag\, we assume   # that $ARGV[0] is the pattern   elsif (!$opts{'d'} && !$opts{'s'} && !$opts{'u'})   {   $pattern = shift @​ARGV;   } }

if ($DEBUG) {   dprint "Options are​:";   foreach my $i (sort keys %opts)   {   dprint " $i​: $opts{$i}";   }

  dprint "INC is​:";   foreach my $i (@​INC)   {   dprint " $i";   } }

if ($opts{'e'}) {   print "You specified two search patterns.\n" and exit if defined $pattern;   $pattern = $opts{'e'}; } elsif (!defined $pattern) {   # The only times you don't have to specify the pattern is when -d\, -s\, or -u   # is being used. This should catch people who do "grepmail -h" thinking   # it's help.   print usage() and exit unless $opts{'d'} || $opts{'s'} || $opts{'u'};

  $pattern = "."; }

################################ MAIN PROGRAM #################################

sub ProcessDate($); sub GetFiles(@​);

# Make the pattern insensitive if we need to $pattern = "(?i)$pattern" if ($opts{'i'});

my ($dateRestriction\, $date1\, $date2);

if ($opts{'d'}) {   ($dateRestriction\,$date1\,$date2) = ProcessDate($opts{'d'}); } else {   $dateRestriction = "none"; }

dprint "PATTERN​: $pattern\n"; dprint "FILES​: @​ARGV\n"; dprint "DATE RESTRICTION​: $dateRestriction\n"; dprint "SIZE RESTRICTION​: $opts{'s'}\n";

# Catch everything I can... We have to localize these to prevent odd bugs from # cropping up (see changelog). local $SIG{PIPE} = \&cleanExit; local $SIG{HUP} = \&cleanExit; local $SIG{INT} = \&cleanExit; local $SIG{QUIT} = \&cleanExit; local $SIG{TERM} = \&cleanExit;

my @​files = GetFiles(@​ARGV);

# If the user provided input files... if (@​files) {   HandleInputFiles(@​files); } # Using STDIN else {   HandleStandardInput(); }

#-------------------------------------------------------------------------------

# Get a list of files\, taking recursion into account if necessary.

sub GetFiles(@​) {   my @​args = @​_;

  # We just return what we were given unless we need to recurse subdirectories.   return @​args unless defined $opts{'R'};

  my @​files;

  foreach my $arg (@​args)   {   if (-f $arg)   {   push @​files\, $arg;   }   elsif( -d $arg)   {   dprint "Recursing directory $arg looking for files...";

  unless (eval "require File​::Find;")   {   print "You specified -R\, but do not have File​::Find. ".   "Get it from CPAN.\n";   exit;   }

  import File​::Find;

  # Gets all plain files in directory and descendents. Puts them in @​files   $File​::Find​::name = '';   find(sub {push @​files\,"$File​::Find​::name" if -f $_}\, $arg);   }   else   {   # Ignore unknown file types   }   }

  return @​files; }

#-------------------------------------------------------------------------------

sub HandleInputFiles {   my @​files = @​_;

  # For each input file...   foreach my $file (@​files)   {   dprint '#'x70;   dprint "Processing file $file";

  # First of all\, silently ignore empty files...   next if -z $file;

  # ...and also ignore directories.   if (-d $file)   {   warn "** Skipping directory​: '$file' **\n" unless $opts{'q'};   next;   }

  my $fileHandle = new FileHandle;   my ($filter\,$filterError);

  # If it's not a compressed file   if ($file !~ /\.(gz|Z|bz2|tz)$/)   {   if (-B $file)   {   warn "** Skipping binary file​: '$file' **\n" unless $opts{'q'};   next;   }

  unless ($fileHandle->open($file))   {   warn "** Can't open $file​: $!\, skipping **\n" unless $opts{'q'};   next;   }   }   # If it is a tzipped file   elsif ($file =~ /\.tz$/)   {   dprint "Calling tzip to decompress file.";

  $filter = 'tzip';

  use vars qw(*OLDSTDERR);   open OLDSTDERR\,">&STDERR" or die "Can't save STDERR​: $!\n";   open STDERR\,">/dev/null"   or die "Can't redirect STDERR to /dev/null​: $!\n";

  unless ($fileHandle->open("tzip -cd '$file'|"))   {   $filterError = $!;   }

  open STDERR\,">&OLDSTDERR" or die "Can't restore STDERR​: $!\n";   }   # If it is a gzipped file   elsif ($file =~ /\.(gz|Z)$/)   {   dprint "Calling gunzip to decompress file.";

  $filter = 'gunzip';

  use vars qw(*OLDSTDERR);   open OLDSTDERR\,">&STDERR" or die "Can't save STDERR​: $!\n";   open STDERR\,">/dev/null"   or die "Can't redirect STDERR to /dev/null​: $!\n";

  unless ($fileHandle->open("gunzip -c '$file'|"))   {   $filterError = $!;   }

  open STDERR\,">&OLDSTDERR" or die "Can't restore STDERR​: $!\n";   }   # If it is a bzipped file   elsif ($file =~ /\.bz2$/)   {   dprint "Calling bzip2 to decompress file.";

  $filter = 'bzip2';

  use vars qw(*OLDSTDERR);   open OLDSTDERR\,">&STDERR" or die "Can't save STDERR​: $!\n";   open STDERR\,">/dev/null"   or die "Can't redirect STDERR to /dev/null​: $!\n";

  unless ($fileHandle->open("bzip2 -dc '$file'|"))   {   $filterError = $!;   }

  open STDERR\,">&OLDSTDERR" or die "Can't restore STDERR​: $!\n";   }

  if ($filterError)   {   warn "** Can't execute \"$filter\" for file \"$file\"​: $filterError\, ".   "skipping **\n" unless $opts{'q'};   next;   }

  unless (DataOnFileHandle($fileHandle))   {   unless ($fileHandle->close())   {   warn "** Can't execute \"$filter\" for file \"$file\"​: ".   "skipping **\n" unless $opts{'q'};   }   next;   }

  if (!IsMailbox($fileHandle))   {   warn "** Skipping non-mailbox ASCII file​: '$file' **\n" unless $opts{'q'};   next;   }

  ProcessMailFile($fileHandle\,$file);

  $fileHandle->close();   } }

#-------------------------------------------------------------------------------

sub HandleStandardInput {   dprint "Handling STDIN";

  # We have to implement our own -B and -s\, because STDIN gets eaten by them   binmode STDIN;

  my ($testChars\,$isEmpty\,$isBinary);

  my $fileHandle = new FileHandle;   $fileHandle->open('-');

  $isEmpty = 0;   $isBinary = 0;

  my $readResult = read($fileHandle\,$testChars\,200);

  cleanExit "Can't read from standard input" unless defined $readResult;

  $isEmpty = 1 if $readResult == 0;

  cleanExit "No data on standard input" if $isEmpty;

  # This isn't the "real" way to do -B\, but it should work okay.   $isBinary = 1 if !$isEmpty &&   ($testChars =~ /\000/ || $testChars =~ /[\200-\377]/);

  PutBackString($fileHandle\,$testChars);

  # If it looks binary and is non-empty\, try to uncompress it. Here we're   # calling another copy of grepmail through the open command.   if ($isBinary)   {   my $filter;

  # This seems to work. I'm not sure what the "proper" way to distinguish   # between gzip'd and bzip2'd and tzip'd files is.   if ($testChars =~ /^TZ/)   {   dprint "Trying to decompress using tzip.";   $filter = "tzip -dc";   }   elsif ($testChars =~ /^BZ/)   {   dprint "Trying to decompress using bzip2.";   $filter = "bzip2 -d";   }   else   {   dprint "Trying to decompress using gunzip.";   $filter = "gunzip -c";   }

  # Here we invoke another copy of grepmail with a filter in front.   use vars qw(*OLDSTDERR);   open OLDSTDERR\,">&STDERR" or die "Can't save STDERR​: $!\n";   open STDERR\,">/dev/null"   or die "Can't redirect STDERR to /dev/null​: $!\n";

  my $newGrepmail = new FileHandle;   $newGrepmail->open("|$filter|$commandLine")   or warn "** Can't execute \"$filter\" on STDIN​: $! **\n"   unless $opts{'q'};

  open STDERR\,">&OLDSTDERR" or die "Can't restore STDERR​: $!\n";

  while (!eof $fileHandle)   {   my $temp = \<$fileHandle>;   print $newGrepmail $temp;   }

  $newGrepmail->close()   or warn "** Can't execute \"$filter\" on STDIN​: $! **\n"   unless $opts{'q'};   }   # Otherwise process it directly   else   {   if (!IsMailbox($fileHandle))   {   warn "** Skipping non-mailbox standard input **\n" unless $opts{'q'};   return;   }

  ProcessMailFile($fileHandle\,"Standard input");   } }

#-------------------------------------------------------------------------------

# Checks to see if there is data on a filehandle\, without reading that data.

sub DataOnFileHandle {   my $fileHandle = shift;

  my $buffer = \<$fileHandle>;

  return 0 unless defined $buffer;

  PutBackString($fileHandle\,$buffer);

  return $buffer ? 1 : 0; }

#-------------------------------------------------------------------------------

# Puts a string back on a file handle

sub PutBackString {   my $fileHandle = shift;   my $string = shift;

  while (defined $string && $string ne '')   {   my $char = chop $string;   $fileHandle->ungetc(ord($char));   } }

#-------------------------------------------------------------------------------

# Detects whether an ASCII file is a mailbox\, based on whether it has a 'From' # and 'Date​:' prefix on two lines of the first paragraph.

sub IsMailbox {   my $fileHandle = shift @​_;

  # Read whole paragraphs   local $/ = "\n\n";

  # Read a paragraph to get the header.   my $buffer = \<$fileHandle>;

  my $returnVal;

  # X-From-Line is used by Gnus\, and From is used by normal Unix format   if ($buffer =~ /^(X-From-Line​:|From) /im && $buffer =~ /^Date​: /im)   {   $returnVal = 1;   }   else   {   $returnVal = 0;   }

  PutBackString($fileHandle\,$buffer);

  return $returnVal; }

#-------------------------------------------------------------------------------

sub ProcessMailFile ($$) {   my $fileHandle = shift @​_;   my $fileName = shift @​_;

  # $header_buffer stores the header for the current email. $body_buffer   # stores the body for the current email. $next_header stores the header for   # the next email\, in case we encounter it while looking for the end of the   # current email.

  # I'd really like to call PushBackString instead of storing $next_header\,   # but that way is much slower\, and fails unpredictably. :( The next best   # solution is to write a wrapper class for FileHandle that allows pushbacks\,   # and stores them in an internal buffer. Too much work for now though...

  my ($numberOfMatches\,$header_buffer\,$body_buffer\,$next_header);

  $next_header = undef;   $numberOfMatches = 0;

  # Read whole paragraphs   local $/ = "\n\n";

  # This is the main loop. It's executed once for each email   while (!eof($fileHandle))   {   $header_buffer = '';   $body_buffer = '';

  if (!defined $next_header)   {   dprint "Getting header for first email.";

  $header_buffer = \<$fileHandle>;   }   else   {   dprint "Processing buffered header.";   $header_buffer = $next_header;

  undef $next_header;   }

  PrintEmailStatistics($header_buffer) if $DEBUG;

  #----------------------------------------------------------------

  dprint "Checking for early match or abort based on header information.";   my ($result\,$matchesHeader) =   AnalyzeHeader(\$header_buffer\,$fileHandle\,\$next_header\,$pattern);

  if ($result eq 'skip')   {   dprint "Doing an early abort based on header.";   SkipToNextEmail($fileHandle\,\$next_header\,$fileName);   next;   }   elsif ($result eq 'print')   {   dprint "Doing an early printout based on header.";  
  if ($opts{'l'})   {   print "$fileName\n";  
  # We can return since we found at least one email that matches.   return 'done';   }   elsif ($opts{'r'})   {   $numberOfMatches++;   SkipToNextEmail($fileHandle\,\$next_header\,$fileName);   }   else   {   GetRestOfBody($fileHandle\,\$body_buffer\,\$next_header\,$fileName);

  PrintEmail($fileName\,$header_buffer\,$body_buffer)   if $opts{'u'} && NotADuplicate($header_buffer) || !$opts{'u'};

  next;   }   }

  dprint "Couldn't do an early printout or abort based on header.";

  #----------------------------------------------------------------

  dprint "Reading body.";

  GetRestOfBody($fileHandle\,\$body_buffer\,\$next_header\,$fileName);

  my $matchesBody;

  # Ignore the MIME attachments if -M was specified   if ($opts{'M'} &&   (($header_buffer =~ /\nContent-Type​:.*?boundary="([^"]*)"/is) ||   ($header_buffer =~ /\nContent-Type​:.*?boundary=([^\n]*)/is)))   {   my $boundary = $1;

  my $tempBody = $body_buffer;

  # Strip out any attachments that aren't textual   $tempBody =~   s/\Q$boundary\E\nContent-Type​: (?!text).*?(?=\Q$boundary\E)//igs;

  $matchesBody = ($tempBody =~ /$pattern/om) || 0;   }   else   {   $matchesBody = ($body_buffer =~ /$pattern/om) || 0;   }

  #----------------------------------------------------------------

  dprint "Total email size is​: "\,length ($header_buffer.$body_buffer)\,"\n";   my $matchesSize = length $header_buffer.$body_buffer \< $opts{'s'} ? 1 : 0;

  dprint "Checking for early match or abort based on header+body information.";

  my $isMatch = 1;  
  $isMatch = 0 if $opts{'s'} && !$matchesSize;   $isMatch = 0 if $opts{'b'} && !$matchesBody;   $isMatch = 0 if $opts{'h'} && !$matchesHeader;   $isMatch = 0 if !$opts{'b'} && !$opts{'h'} && !($matchesBody || $matchesHeader);

  if ($isMatch == 0 && !$opts{'v'})   {   dprint "Doing an early abort based on header+body.";   SkipToNextEmail($fileHandle\,\$next_header\,$fileName);   next;   }   elsif ($isMatch == 0 && $opts{'v'})   {   dprint "Doing an early printout based on header+body.";  
  if ($opts{'l'})   {   print "$fileName\n";  
  # We can return since we found at least one email that matches.   return 'done';   }   elsif ($opts{'r'})   {   $numberOfMatches++;   SkipToNextEmail($fileHandle\,\$next_header\,$fileName);   }   else   {   GetRestOfBody($fileHandle\,\$body_buffer\,\$next_header\,$fileName);

  PrintEmail($fileName\,$header_buffer\,$body_buffer)   if $opts{'u'} && NotADuplicate($header_buffer) || !$opts{'u'};

  next;   }   }

  dprint "Couldn't do an early printout or abort based on header+body.";

  $isMatch = 1;  
  {   my $matchesDate = CheckDate(\$header_buffer);   $isMatch = 0 if $opts{'d'} && !$matchesDate;

  dprint "Email matches date constraint\n"   if $opts{'d'} && $matchesDate;   dprint "Email doesn't match date constraint\n"   if $opts{'d'} && !$matchesDate;   }

  $isMatch = !$isMatch if $opts{'v'};

  # If the match occurred in the right place...   if ($isMatch)   {   dprint "Email matches all patterns and constraints.";

  if ($opts{'l'})   {   print "$fileName\n";

  # We can return since we found at least one email that matches.   return 'done';   }   elsif ($opts{'r'})   {   $numberOfMatches++;   }   else   {   PrintEmail($fileName\,$header_buffer\,$body_buffer)   if $opts{'u'} && NotADuplicate($header_buffer) || !$opts{'u'};   }   }   else   {   dprint "Email did not match all patterns and constraints.";   }   }

  print "$fileName​: $numberOfMatches\n" if ($opts{'r'}); }

#-------------------------------------------------------------------------------

# Checks that an email is not a duplicate of one already printed. This should # only be called when $opts{'u'} is true. Also\, as a side-effect\, it updates # the %message_ids_seen when it sees an email that hasn't been printed yet.

sub NotADuplicate {   my $header_buffer = shift;

  my $message_id = $header_buffer;   $message_id =~ m/^Message-Id​:\<.*>$/om;

  my $result;

  if ($message_ids_seen{$message_id})   {   $result = 0;   dprint "Found duplicate $message_id";   }   else   {   $result = 1;   dprint "Found non-duplicate $message_id";   }

  $message_ids_seen{$message_id} = 1;

  return $result; }

#-------------------------------------------------------------------------------

# Print the email author and subject.

sub PrintEmailStatistics {   my $header = shift;

  dprint '-'x70;   dprint "Processing email​:";

  if ($header =~ /^(From​:.*)$/im)   {   dprint " $1";   }   elsif ($header =~ /^(From.*)$/im)   {   dprint " $1";   }   else   {   dprint " [No from line found]";   }

# Stop on warning. local $SIG{__WARN__} = sub { print $_[0];exit };

  print "a\n";   if ($header =~ /^(Subject​:.*)$/im)   {   print "b\n";   dprint " $1";   }   else   {   dprint " [No subject line found]";   } }

#-------------------------------------------------------------------------------

# Returns​: # A result​: # - 'print' if the email is a match and we need to print it # - 'skip' if we should skip the current email and go on to the next one # - 'continue' if we need to keep processing the email. # A boolean for whether the header matches the pattern. # A boolean for whether the header has the correct date.

sub AnalyzeHeader(\$$\$$) {   my $header_buffer = ${shift @​_};   my $fileHandle = shift;   my $next_header = ${shift @​_};   my $pattern = shift;

  # See if the header matches the pattern   my $matchesHeader = ($header_buffer =~ /$pattern/om) || 0;

  # See if the email failed the size restriction. If it passes here\, we still   # don't know if the entire length of the email is too long.   my $matchesSize = length $header_buffer \< $opts{'s'} ? 1 : 0;

  # At this point\, we might know enough to print the email\, or call for an   # early abort. See the documentation at the top for information. These   # conditions have been simplified to save space at the cost of clarity.   # Also\, all date related conditions have been taken out\, and will be handled   # after the pattern is searched for in the body of the email. This is   # because searching for the pattern is a lot faster than handling the date\,   # and we don't have to handle the date if the pattern isn't found.  
  # First handle the situations where a date constraint isn't a factor.   return ('print'\,1) if (   (!$opts{'b'} && $opts{'h'} && !$opts{'s'} && !$opts{'d'} &&   !$opts{'v'} && $matchesHeader) ||   ( $opts{'h'} && $opts{'s'} && !$opts{'d'} &&   $opts{'v'} && (!$matchesHeader || !$matchesSize)) ||   ( !$opts{'h'} && $opts{'s'} && !$opts{'d'} &&   $opts{'v'} && (!$matchesSize)) ||   ( $opts{'h'} && !$opts{'s'} && !$opts{'d'} &&   $opts{'v'} && (!$matchesHeader)) ||

  ( $opts{'h'} && $opts{'s'} && $opts{'d'} &&   $opts{'v'} && (!$matchesHeader || !$matchesSize)) ||   ( !$opts{'h'} && $opts{'s'} && $opts{'d'} &&   $opts{'v'} && (!$matchesSize)) ||   ( $opts{'h'} && !$opts{'s'} && $opts{'d'} &&   $opts{'v'} && (!$matchesHeader))   );

  return ('skip'\,0) if (   ( $opts{'h'} && $opts{'s'} && !$opts{'d'} &&   !$opts{'v'} && (!$matchesHeader || !$matchesSize)) ||   ( !$opts{'h'} && $opts{'s'} && !$opts{'d'} &&   !$opts{'v'} && (!$matchesSize)) ||   ( $opts{'h'} && !$opts{'s'} && !$opts{'d'} &&   !$opts{'v'} && (!$matchesHeader)) ||   (!$opts{'b'} && $opts{'h'} && !$opts{'s'} && !$opts{'d'} &&   $opts{'v'} && ($matchesHeader)) ||

  ( $opts{'h'} && $opts{'s'} && $opts{'d'} &&   !$opts{'v'} && (!$matchesHeader || !$matchesSize)) ||   ( !$opts{'h'} && $opts{'s'} && $opts{'d'} &&   !$opts{'v'} && (!$matchesSize)) ||   ( $opts{'h'} && !$opts{'s'} && $opts{'d'} &&   !$opts{'v'} && (!$matchesHeader))   );

  return ('continue'\,$matchesHeader); }

#-------------------------------------------------------------------------------

sub SkipToNextEmail($\$$) {   my $fileHandle = shift;   my $next_header = shift;   my $fileName = shift;   my $paragraph;

  dprint "Skipping to next email.";

  # If we have something buffered\, it's the beginning of the next email   # address\, so we don't need to do anything. Joy.   return if defined $$next_header;

  do   {   $paragraph = \<$fileHandle>;   }   while (!eof($fileHandle) && ($paragraph !~ /^\n?From .*\d​:\d+​:\d.* \d{4}/i)) ;

  # Buffer if we went too far. Zap the starting newline while we're at it.   ($$next_header) = $paragraph =~ /^\n?(.*)/s if (!eof($fileHandle)); }

#-------------------------------------------------------------------------------

sub GetRestOfBody($\$\$$) {   my $fileHandle = shift;   my $body_buffer = shift;   my $next_header = shift;   my $fileName = shift;

  return if defined $$next_header;

  # Read the entire email body into the buffer   my $doneLooking = 0;   do   {   my $paragraph = \<$fileHandle>;

  if (defined $paragraph)   {   if ($paragraph =~ /^(\n?)(From .*\d​:\d+​:\d.* \d{4}.*)/is)   {   dprint "Found next email's header\, buffering.";   $$body_buffer .= $1;   $$next_header = $2;   $doneLooking = 1;   }   else   {   $$body_buffer .= $paragraph;   }   }

  if (eof($fileHandle))   {   dprint "Found EOF.";   $doneLooking = 1;   }   }   while (!$doneLooking); }

#-------------------------------------------------------------------------------

sub PrintEmail($$$\$) {   my $fileName = shift;   my $header = shift;   my $body = shift;

  dprint "Printing email.";

  # Add the mailfolder to the headers if -m was given   if ($opts{'m'})   {   $header =~ s/\n+$/\n/s;   $header .= "X-Mailfolder​: $fileName\n\n";   }   print $header;

  # Print whatever body we've read already.   print $body; }

#-------------------------------------------------------------------------------

sub CheckDate($) {   my $header = ${shift @​_};   my ($emailDate\, $isInDate);   $emailDate = "";   $isInDate = 0;

  # RFC 822 allows header lines to be continued on the next line\, in which case   # they must be preceded by whitespace. Let's remove the continuations.   $header =~ s/\n\s+/ /gs;

  if ($opts{'d'})   {   # The email might not have a date. In this case\, print it out anyway.   if ($header =~ /^Date​:\s*(.*)$/im)   {   dprint "Date in email is​: $1.";

  $emailDate = str2time($1);

  if (defined $emailDate)   {   $isInDate = IsInDate($emailDate\,$dateRestriction\,$date1\,$date2);   }   else   {   warn "** Couldn't parse date \"$1\". Assuming it doesn't match the " .   "date constraint **\n" if $opts{'d'};   $isInDate = 0;   }   }   else   {   dprint "No date found in email.";

  $isInDate = 1;   }   }   else   {   $isInDate = 1;   }

  return $isInDate; }

#-------------------------------------------------------------------------------

# This function tries to parse a date first with Date​::Parse. If Date​::Parse # can't parse the date\, then the function tries to use Date​::Manip to parse # it. Returns '' if the date can't be parsed.

{ my $loaded_date_manip = undef;

sub ParseDate {   my $date = shift;

  my $parsedDate;

  # First try to parse the date with Date​::Parse;   $parsedDate = str2time($date);   return $parsedDate if defined $parsedDate;

  # Try to load Date​::Manip if we haven't already   unless (defined $loaded_date_manip)   {   if (eval "require Date​::Manip")   {   $loaded_date_manip = 1;

  dprint "Date​::Manip VERSION​: $Date​::Manip​::VERSION";   # To prevent warning about variable being used only once   my $dummy = $Date​::Manip​::VERSION;   }   else   {   $loaded_date_manip = 0;   }   }

  return '' unless $loaded_date_manip;

  my $temp_date = Date​::Manip​::UnixDate(Date​::Manip​::ParseDate($date)\,'%s');

  return '' unless defined $temp_date;   return $temp_date; } }

#-------------------------------------------------------------------------------

# Figure out what kind of date restriction they want\, and what the dates in # question are. sub ProcessDate($) {   my $datestring = shift;

  if(!defined($datestring))   {   return ("none"\,""\,"");   }

  if ($datestring =~ /^before (.*)/i)   {   $dateRestriction = "before";   $date1 = ParseDate($1);   $date2 = "";

  cleanExit "\"$1\" is not a valid date" if (!$date1);   }   elsif ($datestring =~ /^(after |since )(.*)/i)   {   $dateRestriction = "after";   $date1 = ParseDate($2);   $date2 = "";

  cleanExit "\"$2\" is not a valid date" if (!$date1);   }   elsif ($datestring =~ /^between (.*) and (.*)/i)   {   $dateRestriction = "between";   $date1 = ParseDate($1);   $date2 = ParseDate($2);

  cleanExit "\"$1\" is not a valid date" if (!$date1);   cleanExit "\"$2\" is not a valid date" if (!$date2);

  # Swap the dates if the user gave them backwards.   if ($date1 > $date2)   {   my $temp;   $temp = $date1;   $date1 = $date2;   $date2 = $temp;   }

  }   elsif (ParseDate($datestring) ne '')   {   $dateRestriction = "on";   $date1 = ParseDate($datestring);   }   else   {   cleanExit "Invalid date specification. Use \"$0 -h\" for help";   }

  return ($dateRestriction\,$date1\,$date2); }

#-------------------------------------------------------------------------------

sub IsInDate($$$$) {   my ($emailDate\,$dateRestriction\,$date1\,$date2);   $emailDate = shift @​_;   $dateRestriction = shift @​_;   $date1 = shift @​_;   $date2 = shift @​_;

  # Here we do the date checking.   if ($dateRestriction eq "none")   {   return 1;   }   else   {   if ($dateRestriction eq "before")   {   if ($emailDate \< $date1)   {   return 1;   }   else   {   return 0;   }   }   elsif ($dateRestriction eq "after")   {   if ($emailDate > $date1)   {   return 1;   }   else   {   return 0;   }   }   elsif ($dateRestriction eq "on")   {   # Since these values are in seconds\, we have to make sure that $emailDate   # is within 24 hours after $date1   if (($emailDate > $date1) && ($emailDate-$date1 \< 24*60*60))   {   return 1;   }   else   {   return 0;   }   }   elsif ($dateRestriction eq "between")   {   if (($emailDate > $date1) && ($emailDate \< $date2))   {   return 1;   }   else   {   return 0;   }   }   } }

#-------------------------------------------------------------------------------

sub usage { \<\<EOF; grepmail $VERSION

usage​: grepmail [-bDhilmrRuv] [-s size] [-d "datespec"] [[-e] \] \<files...>

At least one of -s\, -d\, -u\, and -e must be specified\, and can appear in any relative order following the other flags. The -e flag is optional if expr appears immediately before -s or -d. Files can be plain ASCII or ASCII files compressed with gzip\, tzip\, or bzip2. If no file is provided\, normal or compressed ASCII input is taken from STDIN.

-b Search must match body -d Specify a date range (see below) -D Debug mode -e Explicitely name expr (when searching for strings beginning with "-") -h Search must match header -i Ignore case in the search expression -l Output the names of files having an email matching the expression -M Do not search non-text mime attachments -m Append "X-Mailfolder​: \" to all headers to indicate in which folder   the match occurred -q Quiet mode -- don't output warnings -r Output the names of the files and the number of emails matching the   expression -R Recurse directories -s Restrict results to emails less than a certain size (in bytes) -u Ensure that no duplicate emails are output -v Output emails that don't match the expression

Date specifications must be of the form of​: a date like "today"\, "1st thursday in June 1992" (requires Date​::Manip)\,   "05/18/93"\, "12​:30 Dec 12th 1880"\, "8​:00pm december tenth"\, OR "before"\, "after"\, or "since"\, followed by a date as defined above\, OR "between \ and \"\, where \ is defined as above. EOF }

#-------------------------------------------------------------------------------

=head1 NAME

grepmail - search mailboxes for mail matching a regular expression

=head1 SYNOPSIS

  grepmail [-bDhilmrRuv] [-s size] [-d "datespec"] [[-e] \] \<files...>

=head1 DESCRIPTION

=over 2

I\ looks for mail messages containing a pattern\, and prints the resulting messages on standard out.

By default I\ looks in both header and body for the specified pattern.

When redirected to a file\, the result is another mailbox\, which can\, in turn\, be handled by standard User Agents\, such as I\\, or even used as input for another instance of I\.

At least one of B\<-e>\, B\<-d>\, B\<-s>\, or B\<-u> must be specified. The pattern is optional if B\<-d>\, B\<-s>\, and/or B\<-u> is used. The B\<-e> flag is optional if there is no file whose name is the pattern.

=back

=head1 OPTIONS AND ARGUMENTS

Many of the options and arguments are analogous to those of grep.

=over 8

=item B\

The pattern to search for in the mail message. May be any Perl regular expression\, but should be quoted on the command line to protect against globbing (shell expansion). To search for more than one pattern\, use the form "(pattern1|pattern2|...)".

=item B\

Mailboxes must be traditional\, UNIX C\</bin/mail> mailbox format. The mailboxes may be compressed by gzip\, tzip\, or bzip2\, in which case gunzip\, tzip\, or bzip2 must be installed on the system.

If no mailbox is specified\, takes input from stdin\, which can be compressed or not. grepmail's behavior is undefined when ASCII and binary data is piped together as input.

=item B\<-b>

Asserts that the pattern must match in the body of the email.

=item B\<-D>

Enable debug mode\, which prints diagnostic messages.

=item B\<-d>

Date specifications must be of the form of​:   - a date like "today"\, "yesterday"\, "5/18/93"\, "5 days ago"\, "5 weeks ago"\,   - OR "before"\, "after"\, or "since"\, followed by a date as defined above\,   - OR "between \ and \"\, where \ is defined as above.

Simple date expressions will first be parsed by Date​::Parse. If this fails\, grepmail will attempt to parse the date with Date​::Manip\, if the module is installed on the system.

=item B\<-e>

Explicitely specify the search pattern. This is useful for specifying patterns that begin with "-"\, which would otherwise be interpreted as a flag.

=item B\<-h>

Asserts that the pattern must match in the header of the email.

=item B\<-i>

Make the search case-insensitive (by analogy to I\<grep -i>).

=item B\<-l>

Output the names of files having an email matching the expression\, (by analogy to I\<grep -l>).

=item B\<-M>

Causes grepmail to ignore non-text MIME attachments. This removes false positives resulting from binaries encoded as ASCII attachments.

=item B\<-m>

Append "X-Mailfolder​: \" to all email headers\, indicating which folder contained the matched email.

=item B\<-q>

Quiet mode. Suppress the output of warning messages about non-mailbox files\, directories\, etc.

=item B\<-r>

Generate a report of the names of the files containing emails matching the expression\, along with a count of the number of matching emails.

=item B\<-R>

Causes grepmail to recurse any directories encountered.

=item B\<-s>

Return emails smaller than the size (in bytes) specified with this flag.

=item B\<-u>

Output only unique emails\, by analogy to I\<sort -u>. Grepmail determines email uniqueness by the Message-ID header.

=item B\<-v>

Invert the sense of the search\, by analogy to I\<grep -v>. This results in the set of emails printed being the complement of those that would be printed without the B\<-v> switch.

=back

=head1 EXAMPLES

Count the number of emails. ("." matches every email.)

  grepmail -r . sent-mail

Get all email larger than 2000 bytes about books

  grepmail books -s 2000 sent-mail

Get all email that you mailed yesterday

  grepmail -d yesterday sent-mail

Get all email that you mailed before the first thursday in June 1998 that pertains to research (requires Date​::Manip)​:

  grepmail research -d "before 1st thursday in June 1992" sent-mail

Get all email that you mailed before the first of June 1998 that pertains to research​:

  grepmail research -d "before 6/1/92" sent-mail

Get all email you received since 8/20/98 that wasn't about research or your job\, ignoring case​:

  grepmail -iv "(research|job)" -d "since 8/20/98" saved-mail

Get all email about mime but not about Netscape. Constrain the search to match the body\, since most headers contain the text "mime"​:

  grepmail -b mime saved-mail | grepmail Netscape -v

Print a list of all mailboxes containing a message from Rodney. Constrain the search to the headers\, since quoted emails may match the pattern​:

  grepmail -hl "^From.*Rodney" saved-mail*

Find all emails with the text "Pilot" in both the header and the body​:

  grepmail -hb "Pilot" saved-mail*

Print a count of the number of messages about grepmail in all saved-mail mailboxes​:

  grepmail -br grepmail saved-mail*

Remove any duplicates from a mailbox​:

  grepmail -u saved-mail

=head1 FILES

grepmail will I\ create temporary files while decompressing compressed archives. The last version to do this was 3.5. While the new design uses more memory\, the code is much simpler\, and there is less chance that email can be read by malicious third parties. Memory usage is determined by the size of the largest email message in the mailbox.

=head1 BUGS

=over 8

=item Test case 1 fails on some platforms

Bug not squashed yet. Any info would be appreciated.

=item File names with special characters cause problems.

grepmail uses the shell to invoke the decompression filters. If the filename contains single quotes\, ampersands\, backslashes\, etc\, this can cause problems. This bug is not a high priority -- please send email if you really need it fixed.

The fix is to fork the process\, opening up a pipe to the child\, which starts the filter process using the multiple argument form of exec (which doesn't invoke the shell).

=item File names that look like flags cause problems.

In some special circumstances\, grepmail will be confused by files whose names look like flags. In such cases\, use the B\<-e> flag to specify the search pattern.

=back

=head1 AUTHOR

  David Coppit\, \david@&#8203;coppit\.org\, http​://coppit.org/

=head1 SEE ALSO

elm(1)\, mail(1)\, grep(1)\, perl(1)\, printmail(1)\, Mail​::Internet(3) Crocker\, D. H.\, Standard for the Format of Arpa Internet Text Messages\, RFC822.

=cut

p5pRT commented 24 years ago

From david@coppit.org

From dwc3q@​mamba.cs.Virginia.EDU Tue Jun 1 15​:14​:37 1999 -0400 Received​: from cms1.mail.virginia.edu (cms1.mail.Virginia.EDU [128.143.2.21])   by ares.cs.Virginia.EDU (8.9.2/8.9.2/UVACS-1999030200) with ESMTP id PAA10414   for \ftree@&#8203;cs\.Virginia\.EDU; Tue\, 1 Jun 1999 15​:14​:37 -0400 (EDT) Received​: from localhost (cdc6d@​localhost)   by cms1.mail.virginia.edu (8.8.7/8.8.7) with ESMTP id PAA10414   for \ftree@&#8203;cs\.virginia\.edu; Tue\, 1 Jun 1999 15​:14​:36 -0400 (EDT) Date​: Tue\, 1 Jun 1999 15​:14​:36 -0400 (EDT) From​: XXX \XXX@&#8203;XXX\.com X-Sender​: cdc6d@​cms1.mail.virginia.edu To​: YYY@​YYY.com Subject​: YYY1 Message-ID​: \Pine\.GSO\.4\.05\.9906011457100\.3506\-100000@&#8203;cms1\.mail\.virginia\.edu MIME-Version​: 1.0 Content-Type​: TEXT/PLAIN; charset=US-ASCII Status​: RO

body1

From dwc3q@​mamba.cs.Virginia.EDU Tue Jun 1 16​:36​:39 1999 -0400 Received​: from nak.dreamhost.com (root@​[207.155.127.175])   by ares.cs.Virginia.EDU (8.9.2/8.9.2/UVACS-1999030200) with ESMTP id QAA12615   for \dwc3q@&#8203;cs\.virginia\.edu; Tue\, 1 Jun 1999 16​:36​:38 -0400 (EDT) Received​: from mail.binaryresearch.net ([207.250.187.200])   by nak.dreamhost.com (8.9.3/8.9.3/Debian/GNU) with ESMTP id NAA13058   for \david@&#8203;coppit\.org; Tue\, 1 Jun 1999 13​:36​:31 -0700 Received​: from SMTP ([207.250.187.200]) by mail.binaryresearch.net   (Post.Office MTA v3.1.2 release (PO205-101c)   ID# 0-51766U100L100S0) with SMTP id AAA328;   Tue\, 1 Jun 1999 15​:36​:24 -0500 Received​: from bradmin8.binaryresearch.net ([207.250.187.61]) by 207.250.187.200   (Norton AntiVirus for Internet Email Gateways 1.0) ;   Tue\, 01 Jun 1999 20​:36​:2