Open xeno6696 opened 7 years ago
After some digging: mailto URIs have their own syntax, separate from the generic URI standard.
From RFC 6068:
mailtoURI    = "mailto:" [ to ] [ hfields ]
to           = addr-spec *("," addr-spec )
hfields      = "?" hfield *( "&" hfield )
hfield       = hfname "=" hfvalue
hfname       = *qchar
hfvalue      = *qchar
addr-spec    = local-part "@" domain
local-part   = dot-atom-text / quoted-string
domain       = dot-atom-text / "[" *dtext-no-obs "]"
dtext-no-obs = %d33-90 / ; Printable US-ASCII
               %d94-126  ; characters not including
                         ; "[", "]", or "\"
qchar        = unreserved / pct-encoded / some-delims
some-delims  = "!" / "$" / "'" / "(" / ")" / "*"
             / "+" / "," / ";" / ":" / "@"
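To make the grammar concrete, here is a rough sketch of how the top-level productions (`to` and `hfields`) could be pulled apart with plain string operations. `MailtoParts` and `parse` are hypothetical names, not existing ESAPI code, and the sketch does not validate `addr-spec` or decode percent-escapes; it only mirrors the split points the ABNF defines.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: splits a mailto URI into the RFC 6068 "to" and "hfields" parts. */
final class MailtoParts {
    final List<String> to = new ArrayList<>();
    final Map<String, String> hfields = new LinkedHashMap<>();

    static MailtoParts parse(String uri) {
        // URI schemes are case-insensitive, so compare "mailto:" ignoring case.
        if (!uri.regionMatches(true, 0, "mailto:", 0, 7)) {
            throw new IllegalArgumentException("not a mailto URI: " + uri);
        }
        String rest = uri.substring(7);

        // hfields start at the first "?". qchar does not include "?", so any
        // later "?" is invalid input and simply stays inside a field value here.
        int q = rest.indexOf('?');
        String toPart = (q < 0) ? rest : rest.substring(0, q);
        String hfPart = (q < 0) ? "" : rest.substring(q + 1);

        MailtoParts parts = new MailtoParts();
        if (!toPart.isEmpty()) {
            // to = addr-spec *("," addr-spec)
            for (String addr : toPart.split(",")) {
                parts.to.add(addr);
            }
        }
        if (!hfPart.isEmpty()) {
            // hfields = "?" hfield *("&" hfield), hfield = hfname "=" hfvalue
            for (String hfield : hfPart.split("&")) {
                int eq = hfield.indexOf('=');
                String name = (eq < 0) ? hfield : hfield.substring(0, eq);
                String value = (eq < 0) ? "" : hfield.substring(eq + 1);
                parts.hfields.put(name, value);
            }
        }
        return parts;
    }
}
```

Everything stays raw (still percent-encoded), so each piece can be handed to whatever decoding or validation step comes next.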
@xeno6696, I think we can do this if we can convert the URI to a URL? (not sure of the feasibility)
URI.toURL()
At that point we can look at the protocol of the URL to see if it's a 'mailto' protocol. If it is, then the getPath function returns the address.
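A minimal sketch of that idea, assuming the input is already a syntactically valid URI string. `MailtoUrlCheck` and `mailtoAddress` are made-up names for illustration. One caveat: `java.net.URI` is stricter than `java.net.URL`, so strings with unencoded characters such as a bare double quote would be rejected by `URI.create` before `toURL()` is ever reached.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

final class MailtoUrlCheck {
    /** Returns the address part if the URI uses the mailto scheme, otherwise null. */
    static String mailtoAddress(String uri) {
        try {
            // URI.create throws IllegalArgumentException on illegal characters.
            URL url = URI.create(uri).toURL();
            if ("mailto".equalsIgnoreCase(url.getProtocol())) {
                // For mailto URLs, getPath() is everything before the query.
                return url.getPath();
            }
            return null;
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("no URL handler for: " + uri, e);
        }
    }
}
```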
I grabbed some sample mailto addresses from RFC 6068 (https://tools.ietf.org/html/rfc6068) and built a simple test case that pumps them through the URL class and dumps the values of all its get* methods. The results seem pretty consistent.
import java.lang.reflect.Method;
import java.net.URL;

import org.junit.Test;
import org.owasp.esapi.codecs.HTMLEntityCodec;

public class MailToUriTest {
    // Sample mailto URIs taken from https://tools.ietf.org/html/rfc6068
    // Two of them contain a literal CRLF, which shows up in the output.
    String[] basic = new String[] {
        "mailto:chris@example.com",
        "mailto:infobot@example.com?subject=current-issue",
        "mailto:infobot@example.com?body=send%20current-issue",
        "mailto:infobot@\r\n" + "example.com?body=send%20current-issue%0D%0Asend%20index",
        "mailto:list@example.org?In-Reply-To=%3C3469A91.D10AF4C@\r\n" + " example.com%3E",
        "mailto:majordomo@example.com?body=subscribe%20bamboo-l",
        "mailto:joe@example.com?cc=bob@example.com&body=hello",
        "mailto:joe@example.com?cc=bob@example.com?body=hello",
        "mailto:gorby%kremvax@example.com"
    };

    // Addresses with quoted-string local parts
    String[] complicated = new String[] {
        "mailto:\"not@me\"@example.org",
        "mailto:\"oh\\\\no\"@example.org",
        "mailto:\"\\\\\\\"it's\\ ugly\\\\\\\"\"@example.org"
    };

    @Test
    public void testURI() throws Exception {
        //String mailto = "mailto:email@gmail.com?subject";
        HTMLEntityCodec codec = new HTMLEntityCodec();
        for (String mailto : basic) {
            mailto = codec.decode(mailto);
            System.out.println(mailto);
            URL url = new URL(mailto);
            dumpGetMethods(url);
        }
        for (String mailto : complicated) {
            mailto = codec.decode(mailto);
            System.out.println(mailto);
            URL url = new URL(mailto);
            dumpGetMethods(url);
        }
    }

    // Prints the name and value of every String-returning getter on URL.
    private void dumpGetMethods(URL url) throws Exception {
        for (Method m : URL.class.getMethods()) {
            if (m.getName().startsWith("get") && m.getReturnType().equals(String.class)) {
                System.out.println("\t" + m.getName() + " " + m.invoke(url));
            }
        }
    }
}
mailto:chris@example.com
getAuthority null
getPath chris@example.com
getQuery null
getFile chris@example.com
getHost
getProtocol mailto
getRef null
getUserInfo null
mailto:infobot@example.com?subject=current-issue
getAuthority null
getPath infobot@example.com
getQuery subject=current-issue
getFile infobot@example.com?subject=current-issue
getHost
getProtocol mailto
getRef null
getUserInfo null
mailto:infobot@example.com?body=send%20current-issue
getAuthority null
getPath infobot@example.com
getQuery body=send%20current-issue
getFile infobot@example.com?body=send%20current-issue
getHost
getProtocol mailto
getRef null
getUserInfo null
mailto:infobot@
example.com?body=send%20current-issue%0D%0Asend%20index
getAuthority null
getPath infobot@
example.com
getQuery body=send%20current-issue%0D%0Asend%20index
getFile infobot@
example.com?body=send%20current-issue%0D%0Asend%20index
getHost
getProtocol mailto
getRef null
getUserInfo null
mailto:list@example.org?In-Reply-To=%3C3469A91.D10AF4C@
example.com%3E
getAuthority null
getPath list@example.org
getQuery In-Reply-To=%3C3469A91.D10AF4C@
example.com%3E
getFile list@example.org?In-Reply-To=%3C3469A91.D10AF4C@
example.com%3E
getHost
getProtocol mailto
getRef null
getUserInfo null
mailto:majordomo@example.com?body=subscribe%20bamboo-l
getAuthority null
getPath majordomo@example.com
getQuery body=subscribe%20bamboo-l
getFile majordomo@example.com?body=subscribe%20bamboo-l
getHost
getProtocol mailto
getRef null
getUserInfo null
mailto:joe@example.com?cc=bob@example.com&body=hello
getAuthority null
getPath joe@example.com
getQuery cc=bob@example.com&body=hello
getFile joe@example.com?cc=bob@example.com&body=hello
getHost
getProtocol mailto
getRef null
getUserInfo null
mailto:joe@example.com?cc=bob@example.com?body=hello
getAuthority null
getPath joe@example.com?cc=bob@example.com
getQuery body=hello
getFile joe@example.com?cc=bob@example.com?body=hello
getHost
getProtocol mailto
getRef null
getUserInfo null
mailto:gorby%kremvax@example.com
getAuthority null
getPath gorby%kremvax@example.com
getQuery null
getFile gorby%kremvax@example.com
getHost
getProtocol mailto
getRef null
getUserInfo null
mailto:"not@me"@example.org
getAuthority null
getPath "not@me"@example.org
getQuery null
getFile "not@me"@example.org
getHost
getProtocol mailto
getRef null
getUserInfo null
mailto:"oh\\no"@example.org
getAuthority null
getPath "oh\\no"@example.org
getQuery null
getFile "oh\\no"@example.org
getHost
getProtocol mailto
getRef null
getUserInfo null
mailto:"\\\"it's\ ugly\\\""@example.org
getAuthority null
getPath "\\\"it's\ ugly\\\""@example.org
getQuery null
getFile "\\\"it's\ ugly\\\""@example.org
getHost
getProtocol mailto
getRef null
getUserInfo null
So we may be able to use this to split the URI into its parts and run each part through the appropriate codec?
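As a rough illustration of the "split, then decode each part" idea: the sketch below takes a raw query string (what `getQuery` returned in the dump above) and percent-decodes each header name and value separately. `MailtoDecodeSketch` is a hypothetical name, and `java.net.URLDecoder` is only a stand-in for whichever ESAPI codec would actually be used. It assumes Java 10 or later for the `Charset` overload of `URLDecoder.decode`.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class MailtoDecodeSketch {
    // Percent-decodes one component. "+" is protected first, because
    // URLDecoder follows form encoding and would turn it into a space,
    // which is not what mailto URIs mean by "+".
    static String percentDecode(String raw) {
        return URLDecoder.decode(raw.replace("+", "%2B"), StandardCharsets.UTF_8);
    }

    // Splits "name=value&name=value" first, then decodes each piece, so an
    // encoded "&" or "=" inside a value cannot change the structure.
    static Map<String, String> decodeQuery(String rawQuery) {
        Map<String, String> out = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return out;
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String name = (eq < 0) ? pair : pair.substring(0, eq);
            String value = (eq < 0) ? "" : pair.substring(eq + 1);
            out.put(percentDecode(name), percentDecode(value));
        }
        return out;
    }
}
```

Note that the fourth sample decodes to a body containing a real CRLF, so whatever consumes the decoded values would still need its own validation.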
What do you think?
The following unit test is incorrect in the baseline, but this one correctly shows that we don't properly canonicalize a mailto URL. For the record: the regex we currently use as a default restricts URLs to the schemes "ftp" and "https*", so this is purely a future enhancement.
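To show why mailto never gets that far today, here is a deliberately simplified sketch of a scheme allow-list check. The pattern is a hypothetical stand-in that reads "https*" as http or https; it is not the actual default regex, which is more involved.

```java
import java.util.regex.Pattern;

final class SchemeCheckSketch {
    // Hypothetical, simplified allow-list: only ftp, http, and https pass.
    static final Pattern ALLOWED_SCHEME =
            Pattern.compile("^(?:ftp|https?)://", Pattern.CASE_INSENSITIVE);

    static boolean hasAllowedScheme(String url) {
        // find() is enough because the pattern is anchored at the start.
        return ALLOWED_SCHEME.matcher(url).find();
    }
}
```

With a rule of this shape, `mailto:chris@example.com` is rejected before canonicalization matters, which is why the mailto handling is only relevant once the allowed schemes are widened.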