andyedinborough / aenetmail

C# POP/IMAP Mail Client

From is null or has character set issues #62

Open 537mfb opened 12 years ago

537mfb commented 12 years ago

Hi

First of all, thanks for sharing this.

I have found two issues with the From object as follows:

If the From header has no display name (i.e. it is in the form some@address.com), the message's From object is NULL. I get around this with the following piece of code:

name = "";
addr = "";
if (msg.Value.From != null)
{
    name = msg.Value.From.DisplayName;
    addr = msg.Value.From.Address;
}
else
{
    string[] tok = msg.Value.Headers["From"].RawValue.Split(new string[] { "<", ">" }, StringSplitOptions.RemoveEmptyEntries);
    addr = tok[0];
    if (tok.Length == 2)
        name = tok[1];
}
if (name.CompareTo(string.Empty) == 0)
{
    string[] tokens = addr.Split('@');
    name = tokens[0].Replace('.', ' ').Replace('-', ' ').Replace('_', ' ');
}

This has worked well so far.

Another issue I have found is the character set. I didn't even know this was possible, but apparently some mailboxes do allow accented characters in mail addresses - which causes issues in your library, since the address Traduções@sutherland.theukhost.net comes back as Tradu??es@sutherland.theukhost.net.

Your library doesn't handle special characters in the address well.

Regards, Luís Rodrigues

537mfb commented 12 years ago

Just noticed that the mail address coming back with strange characters also has no display name (From is null for lack of a display name, and I am getting the address from the header) - so I am not sure whether that is an issue or not.

I mean - maybe you are already addressing the character set issue, just not the NULL From issue, and I am getting weird characters because I go directly to Headers["From"].

piher commented 12 years ago

Hi, you should take a look at issues #61, #54 and #48.

537mfb commented 12 years ago

@piher - None of those accounts for the ?? characters in the address (they only mention subject and body), and I seem to be the only one pointing out that the cause of From being NULL is that there is no display name in Headers["From"], so the string is not in the expected format (name, address).

It's actually in the form (address).

piher commented 12 years ago

Have you tried using https://github.com/andyedinborough/aenetmail/issues/54#issuecomment-4861497 ? There are still bugs, but the code is supposed to handle international headers.

537mfb commented 12 years ago

I must be missing something, because if I replace my GetMessages with that one in ImapClient.cs, I get a lot of error messages:

1 - On the line StringBuilder body = new StringBuilder(); I get "A local variable named 'body' is already defined in this scope" - that's an easy fix though: change the name from body to something else.

2 - There are 4 different lines using Utilities.LastIndexOfArray and Utilities.IndexOfArray, and on those lines I get "AE.Net.Mail.Utilities does not contain a definition for 'MethodNameHere'" - I get this for both methods, and I downloaded AE.Net.Mail last Wednesday, so I am pretty sure it's the latest version so far.

piher commented 12 years ago

You must replace the whole GetMessages method in ImapClient. The indexof and lastindexof methods (Utilities.IndexOfArray and Utilities.LastIndexOfArray) are just methods that I created for this specific purpose; you'll find them if you read a few messages up in the thread.

537mfb commented 12 years ago

I did replace the whole GetMessages method.

I will look closer at that thread for the other methods - thanks.

piher commented 12 years ago

Okay, I may have left some old variables then. Anyways, the method needs some clean up to be done, there are unused variables from my previous tests...

537mfb commented 12 years ago

I redid the GetMessages replacement and added those 2 methods to the Utilities class. My first replacement of GetMessages must have had some oddities, because now the body issue is gone.

Now, instead of a ? inside a black lozenge, I get a plain ? in the address I mentioned.

Some change, but not there yet.
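One plausible explanation for the two symptoms (an assumption, not something confirmed in this thread) is that the header bytes are ISO-8859-1 and are being decoded with the wrong charset: decoding them as UTF-8 produces U+FFFD, which is usually rendered as a ? inside a black lozenge, while decoding them as ASCII produces a plain ?. The class and method names below are hypothetical and are not part of AE.Net.Mail.

```csharp
using System;
using System.Text;

static class MojibakeDemo
{
    // Encodes the text as ISO-8859-1 (the assumed wire format) and
    // decodes the same bytes three different ways.
    public static string[] DecodeThreeWays(string original)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        byte[] wire = latin1.GetBytes(original);

        return new[]
        {
            Encoding.ASCII.GetString(wire), // bytes above 0x7F become '?'
            Encoding.UTF8.GetString(wire),  // invalid sequences become U+FFFD
            latin1.GetString(wire)          // round-trips the original text
        };
    }
}
```

Calling MojibakeDemo.DecodeThreeWays("Traduções") gives "Tradu??es" for the ASCII decode and the original text for the ISO-8859-1 decode; the UTF-8 decode contains U+FFFD replacement characters.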

piher commented 12 years ago

Could you show me the raw "from:....." header in the email and some of the text of the headers that surround it ? We need to know if the regex matches it.

537mfb commented 12 years ago

Here is the value in raw - notice the ?? making it tradu??es instead of traduções and tradu??o instead of tradução - including in the subject and body

Delivered-To: tt.tradutores@gmail.com
Received: by 10.182.236.42 with SMTP id ur10csp209607obc; Fri, 30 Mar 2012 02:55:19 -0700 (PDT)
Received: by 10.180.95.74 with SMTP id di10mr8521215wib.1.1333101317951; Fri, 30 Mar 2012 02:55:17 -0700 (PDT)
Return-Path: fheleno1@sutherland.theukhost.net
Received: from master.multilingues.eu ([213.175.194.88]) by mx.google.com with ESMTPS id s1si1947410wiy.19.2012.03.30.02.55.17 (version=TLSv1/SSLv3 cipher=OTHER); Fri, 30 Mar 2012 02:55:17 -0700 (PDT)
Received-SPF: neutral (google.com: 213.175.194.88 is neither permitted nor denied by best guess record for domain of fheleno1@sutherland.theukhost.net) client-ip=213.175.194.88;
Authentication-Results: mx.google.com; spf=neutral (google.com: 213.175.194.88 is neither permitted nor denied by best guess record for domain of fheleno1@sutherland.theukhost.net) smtp.mail=fheleno1@sutherland.theukhost.net
Received: from 91.186.0.106 by master.multilingues.eu with esmtps (TLSv1:AES256-SHA:256) (Exim 4.69) (envelope-from fheleno1@sutherland.theukhost.net) id 1SDYY6-0002i1-W2 for ttm@multilingues.eu; Fri, 30 Mar 2012 10:55:14 +0100
Received: from fheleno1 by sutherland.theukhost.net with local (Exim 4.69) (envelope-from fheleno1@sutherland.theukhost.net) id 1SDYY4-0001oP-Qk; Fri, 30 Mar 2012 10:55:12 +0100
To: ttm@multilingues.eu ,tt.tradutores@gmail.com, ttm@netcabo.pt
Subject: Formula Tradu??o
X-PHP-Script: www.tra-tec.com/EN/enviar_n.php for 66.249.72.16
From: Tradu??es@sutherland.theukhost.net
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==Multipart_Boundary_x9b9cd318be197de0875d871ebf3e046fx"
Message-Id: E1SDYY4-0001oP-Qk@sutherland.theukhost.net
Sender: fheleno1@sutherland.theukhost.net
Date: Fri, 30 Mar 2012 10:55:12 +0100
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - sutherland.theukhost.net
X-AntiAbuse: Original Domain - multilingues.eu
X-AntiAbuse: Originator/Caller UID/GID - [33579 33580] / [47 12]
X-AntiAbuse: Sender Address Domain - sutherland.theukhost.net
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - master.multilingues.eu
X-AntiAbuse: Original Domain - multilingues.eu
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - sutherland.theukhost.net

Formula : tradu??o Origem:  Destino:
Dias :  Tipo de Tradu??o: Prazo:
Nome:
Empresa:
Morada:
Email:
Telefone:
This is a multi-part message in MIME format.

--==Multipart_Boundary_x9b9cd318be197de0875d871ebf3e046fx
Content-Type:text/html; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

Formula : tradu??o Origem:  Destino:
Dias :  Tipo de Tradu??o: Prazo:
Nome:
Empresa:
Morada:
Email:
Telefone:

--==Multipart_Boundary_x9b9cd318be197de0875d871ebf3e046fx
Content-Type: application/octet-stream; name=""
Content-Transfer-Encoding: base64

--==Multipart_Boundary_x9b9cd318be197de0875d871ebf3e046fx--

piher commented 12 years ago

Hmm... First of all, do you know what was used to send the email? Because there is absolutely no charset specified in the headers, so my code won't change anything, and I don't see how any code could. Second of all, where did you copy the text you pasted from? Could you copy it directly from Gmail (there should be something like "show the original message")? Thanks

537mfb commented 12 years ago

I got that from the raw value in the MailMessage that AE.Net.Mail uses. From my understanding of things, the message comes from a form on a website that people can fill in, and it then gets bounced around a few e-mail addresses that keep forwarding it until it finally lands in the mailbox I need to read from (I have no control over this process). The original output in Gmail is the following (as you can see, it gets the character set right):

Delivered-To: tt.tradutores@gmail.com
Received: by 10.182.236.42 with SMTP id ur10csp209607obc; Fri, 30 Mar 2012 02:55:19 -0700 (PDT)
Received: by 10.180.95.74 with SMTP id di10mr8521215wib.1.1333101317951; Fri, 30 Mar 2012 02:55:17 -0700 (PDT)
Return-Path: fheleno1@sutherland.theukhost.net
Received: from master.multilingues.eu ([213.175.194.88]) by mx.google.com with ESMTPS id s1si1947410wiy.19.2012.03.30.02.55.17 (version=TLSv1/SSLv3 cipher=OTHER); Fri, 30 Mar 2012 02:55:17 -0700 (PDT)
Received-SPF: neutral (google.com: 213.175.194.88 is neither permitted nor denied by best guess record for domain of fheleno1@sutherland.theukhost.net) client-ip=213.175.194.88;
Authentication-Results: mx.google.com; spf=neutral (google.com: 213.175.194.88 is neither permitted nor denied by best guess record for domain of fheleno1@sutherland.theukhost.net) smtp.mail=fheleno1@sutherland.theukhost.net
Received: from 91.186.0.106 by master.multilingues.eu with esmtps (TLSv1:AES256-SHA:256) (Exim 4.69) (envelope-from fheleno1@sutherland.theukhost.net) id 1SDYY6-0002i1-W2 for ttm@multilingues.eu; Fri, 30 Mar 2012 10:55:14 +0100
Received: from fheleno1 by sutherland.theukhost.net with local (Exim 4.69) (envelope-from fheleno1@sutherland.theukhost.net) id 1SDYY4-0001oP-Qk; Fri, 30 Mar 2012 10:55:12 +0100
To: ttm@multilingues.eu ,tt.tradutores@gmail.com, ttm@netcabo.pt
Subject: Formula Tradução
X-PHP-Script: www.tra-tec.com/EN/enviar_n.php for 66.249.72.16
From: Traduções@sutherland.theukhost.net
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==Multipart_Boundary_x9b9cd318be197de0875d871ebf3e046fx"
Message-Id: E1SDYY4-0001oP-Qk@sutherland.theukhost.net
Sender: fheleno1@sutherland.theukhost.net
Date: Fri, 30 Mar 2012 10:55:12 +0100
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - sutherland.theukhost.net
X-AntiAbuse: Original Domain - multilingues.eu
X-AntiAbuse: Originator/Caller UID/GID - [33579 33580] / [47 12]
X-AntiAbuse: Sender Address Domain - sutherland.theukhost.net
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - master.multilingues.eu
X-AntiAbuse: Original Domain - multilingues.eu
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - sutherland.theukhost.net

Formula : tradução Origem:  Destino:
Dias :  Tipo de Tradução: Prazo:
Nome:
Empresa:
Morada:
Email:
Telefone:
This is a multi-part message in MIME format.

--==Multipart_Boundary_x9b9cd318be197de0875d871ebf3e046fx
Content-Type:text/html; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

Formula : tradução Origem:  Destino:
Dias :  Tipo de Tradução: Prazo:
Nome:
Empresa:
Morada:
Email:
Telefone:

--==Multipart_Boundary_x9b9cd318be197de0875d871ebf3e046fx
Content-Type: application/octet-stream; name=""
Content-Transfer-Encoding: base64

--==Multipart_Boundary_x9b9cd318be197de0875d871ebf3e046fx--

537mfb commented 12 years ago

First of all, do you know what was used to send the email?

As you can see in the header, it's a PHP script (look at X-PHP-Script).

Because there is absolutely no charset specified in the headers, so my code won't change anything, and I don't see how any code could.

Actually, as I said, your code does change things - from a ? inside a lozenge into a plain ? - not much of a change, but still a change. And as you can see from the Gmail raw data, even without the character set Gmail does get it right - so it IS possible for code to get it right - the question is how.

Second of all, where did you copy the text you pasted from? Could you copy it directly from Gmail (there should be something like "show the original message")?

The first one was the raw variable in the MailMessage object in AE.Net.Mail (gets the character set wrong) - the second one is the one from Gmail (character set correct).

piher commented 12 years ago

Well, from what I've read in the RFC, I would say that this mail does not conform to it, because the headers contain non-US-ASCII characters, which are supposed to be signaled by a special syntax. So I'm sorry, but you'll have to wait until someone more competent comes around here, because I don't see how it is possible to correctly parse this email without running some byte analysis...
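A very small form of the "byte analysis" mentioned above can be sketched as follows: try a strict UTF-8 decode first, and fall back to a single-byte charset when the bytes are not valid UTF-8. This assumes the caller still has the undecoded header bytes, which may not be the case inside AE.Net.Mail. The fallback charset is a guess (ISO-8859-1, the charset the sample's text/html part declares), and the class name is hypothetical.

```csharp
using System.Text;

static class HeaderBytesSketch
{
    // Strict UTF-8: throws on invalid input instead of substituting U+FFFD.
    static readonly Encoding StrictUtf8 = new UTF8Encoding(false, true);

    // Assumed fallback for headers that declare no charset at all.
    static readonly Encoding Fallback = Encoding.GetEncoding("iso-8859-1");

    public static string Decode(byte[] rawHeaderBytes)
    {
        try
        {
            // Pure ASCII headers also take this path unchanged.
            return StrictUtf8.GetString(rawHeaderBytes);
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8: reinterpret every byte as one Latin-1 character.
            return Fallback.GetString(rawHeaderBytes);
        }
    }
}
```

This is a heuristic only: a single-byte fallback never fails, so text in some other legacy charset would decode without error but to the wrong characters.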

537mfb commented 12 years ago

Yes, that was my initial assessment.

I thought it weird to have non-ASCII characters in an e-mail address - I didn't even think it was allowed (maybe I am getting old) - it stumped me.

537mfb commented 12 years ago

According to RFC 3501, an & in the address and a mix of modified UTF-7 and base64 are used for these cases.

Is & the special signal you mentioned? I don't see it in either the MailMessage raw value or Gmail's output, though. This is so weird.

Don't know if that helps any.

piher commented 12 years ago

No, that's not what I was talking about, and as you said, the email doesn't even have that. I was talking about the encoded-word syntax, which looks like this: From: =?someCharset?Q?aMailAdressWithAccentuation?= You can either read RFC 2047 or see the very clear explanation at http://en.wikipedia.org/wiki/MIME#Encoded-Word
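For reference, a minimal sketch of decoding the encoded-word syntax (=?charset?encoding?text?=) could look like the code below. It is not the AE.Net.Mail implementation, and the class name is hypothetical. It skips parts of RFC 2047, such as joining adjacent encoded-words and language suffixes on the charset. Note also that RFC 2047 only permits encoded-words in the display-name part of an address, not in the addr-spec itself, and that Encoding.GetEncoding throws for charsets the runtime does not know.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

static class EncodedWordSketch
{
    // =?charset?B-or-Q?encoded-text?=
    static readonly Regex EncodedWord =
        new Regex(@"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=");

    public static string Decode(string header)
    {
        return EncodedWord.Replace(header, m =>
        {
            Encoding charset = Encoding.GetEncoding(m.Groups[1].Value);
            string text = m.Groups[3].Value;
            byte[] bytes = m.Groups[2].Value.ToUpperInvariant() == "B"
                ? Convert.FromBase64String(text) // "B" is plain base64
                : DecodeQ(text);                 // "Q" resembles quoted-printable
            return charset.GetString(bytes);
        });
    }

    static byte[] DecodeQ(string text)
    {
        var bytes = new List<byte>();
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '_')
            {
                bytes.Add((byte)' '); // underscore stands for a space
            }
            else if (c == '=' && i + 2 < text.Length)
            {
                // =XX is one byte written as two hex digits
                bytes.Add(Convert.ToByte(text.Substring(i + 1, 2), 16));
                i += 2;
            }
            else
            {
                bytes.Add((byte)c); // literal ASCII character
            }
        }
        return bytes.ToArray();
    }
}
```

For example, EncodedWordSketch.Decode("Subject: =?iso-8859-1?Q?Formula_Tradu=E7=E3o?=") returns "Subject: Formula Tradução".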

andyedinborough commented 12 years ago

Could you forward your sample message to andy.edinborough@gmail.com? Thanks!

537mfb commented 12 years ago

@andyedinborough - mail sent

@piher - thanks - will look into that too

537mfb commented 12 years ago

Just found another From that is null; this time it does have a display name, though.

Headers["From"].RawValue contains

\"PT, IBM-AP\" <IBM-AP.PT@unilever.com>

Something to do with the \" encapsulating the name, maybe? They are required according to the RFC since the name contains a comma - as far as I can tell, this name/address pair conforms to the RFC.

To get around this issue I use the following code (a fix for the one I left in my original posting way above):

name = "";
addr = "";
if (msgs[i].Value.From != null) // Get Name and Address from FROM object
{
    name = msgs[i].Value.From.DisplayName;
    addr = msgs[i].Value.From.Address;
}
else // Parse Name and Address from Header's RawValue
{
    string[] tok = msgs[i].Value.Headers["From"].RawValue.Split(new string[] { "<", ">" }, StringSplitOptions.RemoveEmptyEntries);
    if (tok.Length == 1) // Only Address is found
        addr = tok[0];
    else // Name and Address are found
    {
        addr = tok[1];
        name = tok[0];
    }
}
if (name.CompareTo(string.Empty) == 0) // If Name wasn't found, parse one from Address
{
    string[] tokens = addr.Split('@');
    name = tokens[0].Replace('.', ' ').Replace('-', ' ').Replace('_', ' ');
}

This happens with all names with " in them - not just this one.
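As an alternative to splitting on < and >, the fallback could hand the raw header to System.Net.Mail.MailAddress, which understands quoted display names. The sketch below assumes the backslashes shown in the raw value are debugger escaping rather than part of the header, and that the header holds a single address. The class and method names are hypothetical.

```csharp
using System;
using System.Net.Mail;

static class FromFallbackSketch
{
    public static bool TryParse(string rawFrom, out string name, out string address)
    {
        name = "";
        address = "";
        if (string.IsNullOrWhiteSpace(rawFrom)) return false;

        MailAddress parsed;
        try
        {
            // Accepts both "some@address.com" and "\"Name\" <some@address.com>".
            parsed = new MailAddress(rawFrom.Trim());
        }
        catch (FormatException)
        {
            return false; // not something MailAddress can parse
        }

        name = parsed.DisplayName;
        address = parsed.Address;

        // Same idea as the workaround above: derive a name from the local part.
        if (name.Length == 0)
        {
            name = parsed.User.Replace('.', ' ').Replace('-', ' ').Replace('_', ' ');
        }
        return true;
    }
}
```

With the header value from this comment, the name comes back as PT, IBM-AP without the surrounding quotes; with a bare john.doe@example.com the derived name is "john doe".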

537mfb commented 12 years ago

804a9f6 fixes the character set in address issue

shawncarr commented 12 years ago

The code was committed only to master-net35, not to master, so I am still able to reproduce this with the latest version.

jstedfast commented 10 years ago

The only way you'll ever get your address parser to work reliably is if you switch to using a tokenizer. String.Split() and IndexOf() approaches will only become completely unmaintainable, and it's unlikely you'll ever reach a point where it works reliably for everyone.

I highly recommend taking a look at the email address parser in MimeKit. Take a look at InternetAddressList.cs and InternetAddress.cs - they handle everything you can throw at them, including comments in the middle of the address. It also handles old-style addresses like this:

From: nsb@host (Neil Bornstein)
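To illustrate the difference between splitting and scanning, here is a rough single-address scanner. It is not MimeKit's parser and falls far short of it (no address lists, groups, or encoded-words). It walks the input character by character and treats quoted strings, parenthesized comments and the <...> part as separate tokens. The class name is hypothetical.

```csharp
using System;
using System.Text;

static class AddressScannerSketch
{
    // Returns (display name, address) for one From value.
    public static (string Name, string Address) Parse(string input)
    {
        var phrase = new StringBuilder();  // text outside <...> and (...)
        var comment = new StringBuilder(); // text inside (...)
        string angle = null;               // text inside <...>
        int i = 0;

        while (i < input.Length)
        {
            char c = input[i];
            if (c == '"')
            {
                i++; // opening quote
                while (i < input.Length && input[i] != '"')
                {
                    if (input[i] == '\\' && i + 1 < input.Length) i++; // quoted-pair
                    phrase.Append(input[i]);
                    i++;
                }
                i++; // closing quote
            }
            else if (c == '(')
            {
                int depth = 1; // comments may nest
                i++;
                while (i < input.Length && depth > 0)
                {
                    if (input[i] == '(') depth++;
                    else if (input[i] == ')') depth--;
                    if (depth > 0) comment.Append(input[i]);
                    i++;
                }
            }
            else if (c == '<')
            {
                int end = input.IndexOf('>', i + 1);
                if (end < 0) end = input.Length;
                angle = input.Substring(i + 1, end - i - 1).Trim();
                i = end + 1;
            }
            else
            {
                phrase.Append(c);
                i++;
            }
        }

        string rest = phrase.ToString().Trim();
        // "Name <addr>" form: the phrase is the name.
        if (angle != null) return (rest, angle);
        // Old-style "addr (Name)" form: the comment is the name.
        return (comment.ToString().Trim(), rest);
    }
}
```

Under these assumptions the old-style example above yields the name Neil Bornstein and the address nsb@host, and a quoted name containing a comma is no longer split.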