JohnBartlett / google-email-uploader-mac

Automatically exported from code.google.com/p/google-email-uploader-mac

-- Invalid RFC 822 Message: Date header "Fri May 26 10:01:06 PDT 2000" is invalid. -- #5

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
  1. Upload an OLD (1992!) Eudora mbox.

What is the expected output? What do you see instead?
  Expected a smooth import.
  Instead, several messages failed with: "The operation couldn’t be completed. (Invalid RFC 822 Message: Date header "Fri May 26 10:01:06 PDT 2000" is invalid.)" Many other messages in the file were imported successfully.

What version of the product are you using? On what operating system?
  1.0 / Mac OS X 10.6.2

Please provide any additional information below.
   I have kept the log and the import file if you need them, but I'm not going to post them here!

Original issue reported on code.google.com by gavineadie on 30 Jan 2010 at 2:30

GoogleCodeExporter commented 8 years ago
I did some more digging, and the mbox I'm importing is more complicated than I thought. Also, looking at the source code, I see the error presented is coming back from the server. I'll look at the exact headers sent by gmum to see how they are invalid (I suspect gmum is having to construct a "Date:" header from insufficient or confusing information).

FWIW: Here's a truncated version of a failing message. Note that the original headers have been moved to a 'quotation' in the message body. I believe this was done by Apple's AOCE/PowerTalk mailer. It may turn out that I have to pre-process these mboxes before feeding them to gmum ... more later.

From PowerTalk 01 Jan 97 00:00:00 GMT
Date:    Wed, 20 Mar 1996, 14:3:58
Subject: ACAP protocol outline
From:    jgm+@CMU.EDU (John Gardiner Myers)
To:      ietf-acap+@andrew.cmu.edu (IETF ACAP Mailing List)

ABNF extension:

    a ::+ b
    a ::+ c
    a ::+ d

    is equivalent to

    a ::= (b) / (c) / (d)

Protocol overview

  atom, quoted-string, literal

  no-wait literal -- client doesn't need to wait for ready response
    "{" number "+}" CRLF *OCTET
    ; NUL only allowed in value of attributes whose names end in
    ; ".bin"

Initial greeting

    initial_greeting ::= "*" SP "OK" *(SP capability) CRLF

    capability ::= atom [ "=" astring ]

------------------ RFC822 Header Follows ------------------
 Received: from judgmentday.rs.itd.umich.edu by dagora.rs.itd.umich.edu (8.7.4/2.2)
    id OAA02513; Wed, 20 Mar 1996 14:14:17 -0500 (EST)
 Received: by judgmentday.rs.itd.umich.edu (8.7.4/2.2)
    with X.500 id OAA27307; Wed, 20 Mar 1996 14:14:14 -0500 (EST)
 Received: from po10.andrew.cmu.edu by judgmentday.rs.itd.umich.edu (8.7.4/2.2)
    with ESMTP id OAA27302; Wed, 20 Mar 1996 14:14:13 -0500 (EST)
 Received: (from postman@localhost) by po10.andrew.cmu.edu (8.7.5/8.7.1) id OAA00723; Wed, 20 Mar 1996 14:04:19 -0500
 Received: via switchmail; Wed, 20 Mar 1996 14:04:18 -0500 (EST)
 Received: from hogtown.andrew.cmu.edu via qmail
           ID </afs/andrew.cmu.edu/service/mailqs/testq0/QF.klI5OYq00WBwB:xE53>;
           Wed, 20 Mar 1996 14:04:05 -0500 (EST)
 Received: from hogtown.andrew.cmu.edu via qmail
           ID </afs/andrew.cmu.edu/usr7/jgm/.Outgoing/QF.clI5OSG00WBwA12PtP>;
           Wed, 20 Mar 1996 14:03:58 -0500 (EST)
 Received: from BatMail.robin.v2.14.CUILIB.3.45.SNAP.NOT.LINKED.hogtown.andrew.cmu.edu.sun4c.411
           via MS.5.6.hogtown.andrew.cmu.edu.sun4c_411;
           Wed, 20 Mar 1996 14:03:58 -0500 (EST)
 Message-ID: <klI5OSC00WBwI12Pl0@andrew.cmu.edu>
 Date: Wed, 20 Mar 1996 14:03:58 -0500 (EST)
 X-UIDL: 827350024.003
 From: John Gardiner Myers <jgm+@CMU.EDU>
 To: IETF ACAP Mailing List <ietf-acap+@andrew.cmu.edu>
 Subject: ACAP protocol outline
 Status: U

From PowerTalk 01 Jan 97 00:00:00 GMT
Date:    Wed, 20 Mar 1996, 13:59:41

Original comment by gavineadie on 30 Jan 2010 at 4:11

GoogleCodeExporter commented 8 years ago
Here's a patch to EmUpUtilities.m to fix my problem:

199a200,229
>     // ------------------------------------------------------------------------------
>     // Apple's old AOCE/PowerTalk mailer used to move the 'extra' headers to the end
>     // of the body of the message. If this is the case, we need to find them and add
>     // them into the header collection (overwriting any duplicates, because this IS
>     // the original header set).
>     
>     NSString *    body = [message substringFromIndex:[headers length]];
>     NSScanner *   bodyScanner = [NSScanner scannerWithString:body];
>     [bodyScanner setCharactersToBeSkipped:[NSCharacterSet whitespaceCharacterSet]];
>     NSString *    rfc822Divider = [NSString stringWithFormat:@"%@%@%@",
>                 endOfLine, endOfLine, @"------------------ RFC822 Header Follows ------------------"];
>     
>     if ([bodyScanner scanUpToString:rfc822Divider intoString:nil]) {
>         [bodyScanner scanUpToString:@"" intoString:&headers];
>         
>         // advance past the divider line and then outdent <moreHeader> one space
>         headers = [headers substringFromIndex:[rfc822Divider length]];
>         if ([headers length] > 0) {
>             NSMutableString *   mutable = [NSMutableString stringWithString:headers];
>             [mutable replaceOccurrencesOfString:@"\n "
>                                      withString:@"\n"
>                                         options:0
>                                           range:NSMakeRange(0, [mutable length])];
> 
>             message = [[mutable substringFromIndex:1] stringByAppendingString:body];
>         }
>     }
> 
>     // ------------------------------------------------------------------------------
>         
219a250,251
>       
>       

Original comment by gavineadie on 30 Jan 2010 at 8:48

GoogleCodeExporter commented 8 years ago
Actually, I think I missed an encoding issue. The above works with a section of the original file copied and pasted into a test file, but not with the original file. Thinking cap is back on!

Original comment by ramsayc...@gmail.com on 31 Jan 2010 at 2:43

GoogleCodeExporter commented 8 years ago
Found it: I misunderstood how the scanner works. Here's the replacement:

199a200,231
>     // ------------------------------------------------------------------------------
>     // Apple's old AOCE/PowerTalk mailer used to move the 'extra' headers to the end
>     // of the body of the message. If this is the case, we need to find them and add
>     // them into the header collection (overwriting any duplicates, because this IS
>     // the original header set).
>     
>     NSString *    body = [message substringFromIndex:[headers length]];
>     NSScanner *   bodyScanner = [NSScanner scannerWithString:body];
>     [bodyScanner setCharactersToBeSkipped:[NSCharacterSet whitespaceCharacterSet]];
>     NSString *    rfc822Divider = [NSString stringWithFormat:@"%@%@%@",
>                 endOfLine, endOfLine, @"------------------ RFC822 Header Follows ------------------"];
>     
>     if ([body rangeOfString:rfc822Divider].length > 0) {
>         [bodyScanner scanUpToString:rfc822Divider intoString:nil];
>         [bodyScanner scanUpToString:@"" intoString:&headers];
>         
>         // advance past the divider line and then outdent <moreHeader> one space
>         headers = [headers substringFromIndex:[rfc822Divider length]];
>         if ([headers length] > 0) {
>             NSMutableString *   mutable = [NSMutableString stringWithString:headers];
>             [mutable replaceOccurrencesOfString:[endOfLine stringByAppendingString:@" "]
>                                      withString:endOfLine
>                                         options:0
>                                           range:NSMakeRange(0, [mutable length])];
> 
>             message = [[mutable substringFromIndex:1] stringByAppendingString:body];
>         }
>     }
> 
>     NSLog(@"MESSAGE:\n%@", message);
>     // ------------------------------------------------------------------------------
>         
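For anyone pre-processing mbox files outside the uploader, the extract-and-outdent step of the patch above can be sketched in plain C. This is a minimal sketch, not the uploader's code: `extract_tail_headers` is a hypothetical name, it assumes `\n` line endings throughout, and it only returns the tail headers with one leading space removed per line (the patch's outdent step); merging them with the top headers is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Divider text as it appears in the AOCE/PowerTalk sample in this issue. */
static const char *kRfc822Divider =
    "------------------ RFC822 Header Follows ------------------";

/*
 * Hypothetical helper: returns a newly allocated copy of the header block
 * that follows the divider, with one leading space removed from each line,
 * or NULL if the divider is absent. Assumes '\n' line endings; mixed
 * '\r'/'\n' input should be normalized first. Caller frees the result.
 */
char *extract_tail_headers(const char *body) {
    assert(body != NULL);
    const char *divider = strstr(body, kRfc822Divider);
    if (divider == NULL) {
        return NULL;
    }
    /* Skip the divider text and the rest of its line. */
    const char *p = divider + strlen(kRfc822Divider);
    const char *eol = strchr(p, '\n');
    p = (eol != NULL) ? eol + 1 : p + strlen(p);

    /* The outdented copy is never longer than the input tail. */
    char *out = malloc(strlen(p) + 1);
    if (out == NULL) {
        return NULL;
    }
    char *w = out;
    int at_line_start = 1;
    for (; *p != '\0'; p++) {
        /* Drop exactly one space at the start of each line (the outdent). */
        if (at_line_start && *p == ' ') {
            at_line_start = 0;
            continue;
        }
        at_line_start = (*p == '\n');
        *w++ = *p;
    }
    *w = '\0';
    return out;
}
```

It returns NULL when the divider is missing, so messages without the PowerTalk layout can be passed through untouched.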

Original comment by ramsayc...@gmail.com on 31 Jan 2010 at 3:55

GoogleCodeExporter commented 8 years ago
I'm seeing the same thing with old email that I've imported from Pine to Eudora to Mail.app. I am importing from Mail.app's .emlx files, though.

Original comment by dfbi...@gmail.com on 3 Feb 2010 at 1:53


GoogleCodeExporter commented 8 years ago
The more I've looked at importing email to Google Apps, the more apparent it is that the Google RFC822 importer is overly strict about malformed headers. In this game the maxim is, "be generous accepting mistakes on import, and don't make any in what you export."

For example, the "Date:" header above:

      Date: Wed, 20 Mar 1996 14:03:58 -0500 (EST)

is illegal because of the "(EST)". You either use "-0500" OR "EST", but not both, and that's why Google is rejecting that message. Another example:

      Date: Wed, 20 Mar 1996, 14:03:58 -0500

That second comma is illegal and causes Google to reject messages that have that error.

This isn't something the email uploader can influence unless it's going to parse all the headers and correct the erroneous ones, which is not something for the faint-hearted -- I've done that once in my career, and once was enough! Just about every email program under the sun makes some kind of error in its generated headers ... ugh.
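Correcting just the two Date shapes quoted above does not require a full header parser. The sketch below assumes, as reported in this thread, that the importer rejects the trailing "(EST)" comment and the comma after the year; `normalize_date_value` is a hypothetical helper that works on the header value only (without the "Date:" name) and is not a general RFC 2822 date parser.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical pre-processing helper for the two Date shapes quoted above.
 * Writes a cleaned copy of `in` to `out`:
 *   - drops a comma that directly follows a four-digit year ("1996," -> "1996")
 *   - drops a trailing parenthesized comment such as " (EST)"
 * Everything else is copied unchanged.
 */
void normalize_date_value(const char *in, char *out, size_t out_size) {
    assert(in != NULL && out != NULL && out_size > 0);
    size_t n = 0;
    for (size_t i = 0; in[i] != '\0' && n + 1 < out_size; i++) {
        int after_year = i >= 4 &&
            isdigit((unsigned char)in[i - 1]) && isdigit((unsigned char)in[i - 2]) &&
            isdigit((unsigned char)in[i - 3]) && isdigit((unsigned char)in[i - 4]);
        if (in[i] == ',' && after_year) {
            continue; /* stray comma between year and time */
        }
        out[n++] = in[i];
    }
    out[n] = '\0';

    /* Trim trailing whitespace, then a trailing "(...)" comment if present. */
    while (n > 0 && isspace((unsigned char)out[n - 1])) {
        out[--n] = '\0';
    }
    if (n > 0 && out[n - 1] == ')') {
        char *open = strrchr(out, '(');
        if (open != NULL) {
            *open = '\0';
            n = (size_t)(open - out);
            while (n > 0 && isspace((unsigned char)out[n - 1])) {
                out[--n] = '\0';
            }
        }
    }
}
```

The comma check keys on the four digits before it, so the legitimate comma after the day name ("Wed,") is kept.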

Original comment by gavineadie on 3 Feb 2010 at 3:15

GoogleCodeExporter commented 8 years ago
So does this mean we've reached a dead-end?

Original comment by dfbi...@gmail.com on 3 Feb 2010 at 2:04

GoogleCodeExporter commented 8 years ago
I went in and hand-edited all the offending headers until the file was imported successfully -- BBEdit and a few canned scripts made it fairly easy. FWIW, I've imported 106,000 messages this week!
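A canned script of that kind could also handle the asctime-style value from the original report ("Fri May 26 10:01:06 PDT 2000"). The plain-C sketch below is only an illustration, not the scripts used here: `asctime_to_rfc2822` is a hypothetical name, it just reorders fields into RFC 2822 order, and it assumes the input has either a zone before the year or no zone at all; in the latter case it writes "-0000", which RFC 2822 uses to mean the local zone is unknown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical converter for asctime / Unix `date`-style values such as
 *   "Fri May 26 10:01:06 PDT 2000"   (zone before the year)
 *   "Thu Jan 05 16:15:01 2006"       (no zone at all)
 * into the RFC 2822 field order "Day, DD Mon YYYY HH:MM:SS ZONE".
 * Returns 1 on success, 0 if the value does not look like either shape.
 */
int asctime_to_rfc2822(const char *in, char *out, size_t out_size) {
    char wday[4], mon[4], hms[9], fifth[16], sixth[16];
    int mday;
    assert(in != NULL && out != NULL && out_size > 0);

    int fields = sscanf(in, "%3s %3s %d %8s %15s %15s",
                        wday, mon, &mday, hms, fifth, sixth);
    /* Need day, month, day-of-month, time and year; the time field must
       contain ':' (this rejects values already in RFC order). */
    if (fields < 5 || strchr(hms, ':') == NULL) {
        return 0;
    }
    const char *zone = (fields == 6) ? fifth : "-0000";
    const char *year = (fields == 6) ? sixth : fifth;

    int written = snprintf(out, out_size, "%s, %02d %s %s %s %s",
                           wday, mday, mon, year, hms, zone);
    return written > 0 && (size_t)written < out_size;
}
```

Because it only reorders tokens, spot-checking a few converted headers before rewriting a whole mbox is still a good idea.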

Original comment by gavineadie on 3 Feb 2010 at 2:54

GoogleCodeExporter commented 8 years ago
I've filed a bug internally on the Email Migration API (2417545) with your comments about the parser strictness. You can also bring it up on the Email Migration API discussion group at http://tinyurl.com/yatasrx

Did you find the "RFC822 Header Follows" handling to be an unqualified win? I'd worry about message replies having extra copies of those headers, so they'd end up overwriting other headers inappropriately. While I find some examples of "RFC822 Header Follows" in my old mail archives, too, they do not appear to be indented a space.

Original comment by gregrobbins on 3 Feb 2010 at 10:52

GoogleCodeExporter commented 8 years ago
Thanks for taking my comments over there, Greg.

I worried about having extra copies of the headers too and, in fact, the code snippet above doesn't do what the comment says (oops): it doesn't use the 'top headers' at all; it ignores them and only uses the 'tail headers.' And, yes, I also had many messages that did not have the indented headers. In some cases the 'top headers' were totally useless, so losing them was a benefit.

I've been importing batches of messages dated 1994-2000, during which time I used multiple versions of multiple mail clients and my messages passed through a few different gateways ... the variations on the 'standard' are frightening!

I wrote my first SMTP gateway in 1982 (in a language called Plus, a derivative of Sue) and had to incorporate RFC822 parsing and generation, in addition to the expected RFC821 work, because the mainframe host system (MTS) maintained messages in a database and didn't use RFC822 formats. I've been reliving those days this last week and I have mixed emotions -- ~30-year reminiscences are rosy until you remember the details!!

Original comment by gavineadie on 3 Feb 2010 at 11:43

GoogleCodeExporter commented 8 years ago
The latest glitch is finding both \r AND \n used in converted AOCE messages. The \n is in the headers and so is assigned to <endOfLine>; the \r is the line separator in the body and so defeats the check for "RFC822 Header Follows". The temporary solution is to look at the character after that string and use it in the unindenting <replaceOccurrencesOfString> instead of <endOfLine>.

The permanent solution is to ask the author of Emailchemy (the converter I use) to make sure all the end-of-line strings are the same throughout the file.
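Until the converter does that, the end-of-line strings can be made consistent in a separate pass before the divider search. A minimal sketch, assuming converting everything to `\n` is acceptable for the upload: `normalize_line_endings` (a hypothetical helper) rewrites CRLF and lone CR to LF in place, so a body separated by `\r` will then match a divider built from two `\n` end-of-line strings.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: rewrites `text` in place so every line ending is a
 * single '\n'. "\r\n" (CRLF) and a lone '\r' (classic Mac OS) both become
 * '\n'. Returns the new length. The result is never longer than the input,
 * so no extra buffer is needed.
 */
size_t normalize_line_endings(char *text) {
    assert(text != NULL);
    size_t r = 0, w = 0;
    while (text[r] != '\0') {
        if (text[r] == '\r') {
            text[w++] = '\n';
            r++;
            if (text[r] == '\n') {
                r++; /* CRLF collapses to one '\n' */
            }
        } else {
            text[w++] = text[r++];
        }
    }
    text[w] = '\0';
    return w;
}
```

Note that "\r\r" becomes "\n\n", which is exactly the blank-line pattern the divider check expects in front of "RFC822 Header Follows".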

Original comment by gavineadie on 4 Feb 2010 at 3:11

GoogleCodeExporter commented 8 years ago
This is a big issue for me as well (Mac OS X 10.6.2 and Apple Mail 4.2 (1077)). I can't upload email for 40 users that I want to move to Google Apps. I'm trying to do this through IMAP instead, but it's sloooooooow :-)

If there's any info I can provide to help resolve this, I'm more than happy to help.

Original comment by nathan.m...@gmail.com on 31 Mar 2010 at 4:29

GoogleCodeExporter commented 8 years ago
I am also having issues with this. I am not sure how to handle it, since I have a user with about 10,000 emails we are trying to upload. Is there something I can do to get this to work, or a workaround? I would appreciate any suggestions, since I do not know how to go forward.

Original comment by rwburke%...@gtempaccount.com on 19 Apr 2010 at 3:55

GoogleCodeExporter commented 8 years ago
The server team says there will be some improvements in date header parsing rolling out, hopefully this week.

Original comment by gregrobbins on 19 Apr 2010 at 7:44

GoogleCodeExporter commented 8 years ago
Any news? I continue to migrate thousands of emails via IMAP drag 'n' drop because the uploader is unhappy with extra quotes in the date header. Let me know if I can provide any help.

Original comment by nathan.m...@gmail.com on 1 Jun 2010 at 4:20

GoogleCodeExporter commented 8 years ago
The server team has already made some changes. If you still see upload errors due to headers that look like they should be legal (or close enough), please paste examples here.

Original comment by gregrobbins on 1 Jun 2010 at 11:18

GoogleCodeExporter commented 8 years ago
   The operation couldn’t be completed. (Date header "Thu Jan 05 16:15:01 2006" is invalid.)

Original comment by b.n.will...@gmail.com on 2 Oct 2013 at 4:43