Closed limonte closed 2 years ago
well noticed
To clarify the expected behavior:
.com
, .cc
, etc.) should be ignored at all@gmail
, @hotmail
, etc.) should be ignored@my-company
) should be included, e.g. when user type "my-company" then john@my-company.com
should be shown.We have a method somewhere with the list of these common shared email domain names
We have a method somewhere with the list of these common shared email domain names
I wasn't able to find that method. This array looks good enough (taken from https://github.com/derhuerst/email-providers)
["yahoo.com","qq.com","virginmedia.com","msn.com","live.com","go.com","aol.com","163.com","free.fr","about.com","alibaba.com","geocities.com","outlook.com","indiatimes.com","yandex.ru","netscape.com","yahoo.co.jp","angelfire.com","earthlink.net","sky.com","mail.ru","discovery.com","gmail.com","frontier.com","naver.com","uol.com.br","homestead.com","icloud.com","medscape.com","mac.com","zoho.com","space.com","lycos.com","icq.com","comcast.net","altavista.com","orange.fr","t-online.de","hotmail.com","me.com","sapo.pt","rambler.ru","att.net","cox.net","canada.com","yandex.com","ancestry.com","sina.cn","kansascity.com","sina.com","libero.it","law.com","care2.com","wanadoo.fr","tom.com","fortunecity.com","berlin.de","onet.pl","21cn.com","excite.com","terra.com.br","geek.com","india.com","techspot.com","compuserve.com","shaw.ca","blackplanet.com","mindspring.com","web.de","freeserve.co.uk","ntlworld.com","rr.com","ig.com.br","wp.pl","sympatico.ca","excite.co.jp","chez.com","interia.pl","kiwibox.com","virgilio.it","tiscali.it","arcor.de","iinet.net.au","blueyonder.co.uk","xoom.com","126.com","rcn.com","sfr.fr","freenet.de","gazeta.pl","online.de","yam.com","verizon.net","gmx.net","btinternet.com","hotbot.com","lycos.co.uk","ozemail.com.au","aol.co.uk","detik.com","virgin.net","ireland.com","terra.es","catholic.org","hamptonroads.com","doityourself.com","parrot.com","charter.net","www.com","lycos.de","ivillage.com","myway.com","albawaba.com","rogers.com","name.com","ya.ru","voila.fr","oath.com","freeyellow.com","mail.com","pochta.ru","centrum.cz","sify.com","tiscali.co.uk","chat.ru","telus.net","mydomain.com","o2.co.uk","alice.it","seznam.cz","freeuk.com","tpg.com.au","iespana.es","optusnet.com.au","rin.ru","lycos.es","sci.fi","metacrawler.com","walla.co.il","yahoo.co.uk","lycos.nl","sweb.cz","looksmart.com","bigpond.com","yeah.net","terra.com","prodigy.net","eircom.net","yahoo.com.cn","frontiernet.net","mail2web.com","wanadoo.es","aim.com","37.com","i.am","dailypioneer.com","sanook.com","foxmail.com","unican.es","docomo.ne.jp","hot.ee","yahoofs.com","newmail.ru","gmx.com","ukr.net","cogeco.ca","crosswinds.net","bugmenot.com","webindia123.com","pacbell.net","iol.it","10minutemail.com","netins.net","depechemode.com","incredimail.com","zip.net","dejanews.com","lycos.it","elvis.com","bangkok.com","iprimus.com.au","juno.com","onmilwaukee.com","o2.pl","supereva.it","netspace.net.au","islamonline.net","starmedia.com","mailinator.com","go.ro","swissinfo.org","saudia.com","spray.se","idirect.com","bigfoot.com","land.ru","casino.com","gmx.de","bellsouth.net","yahoo.fr","yahoo.jp","bigpond.net.au","interfree.it","adelphia.net","thirdage.com","zonnet.nl","msn.co.uk","yahoo.de","btconnect.com","hispavista.com","ny.com","cableone.net","tds.net","gportal.hu","roadrunner.com","inbox.com","hushmail.com","4mg.com","seanet.com","masrawy.com","aeiou.pt","terra.cl","yahoo.com.br","singnet.com.sg","terra.com.ar","front.ru","c3.hu","123.com","sbcglobal.net","netzero.net","yahoo.com.tw","montevideo.com.uy","scubadiving.com","everyone.net","fastmail.fm","maktoob.com","iwon.com","mchsi.com","webjump.com","bright.net","telstra.com","westnet.com.au","c2i.net","talktalk.net","talkcity.com","eyou.com","usa.com","cs.com","btopenworld.com","gocollege.com","hotmail.ru","spacewar.com","sp.nl","excite.it","bolt.com","singpost.com","zzn.com","centrum.sk","go2net.com","yahoo.es","bizhosting.com","yahoo.ca","3ammagazine.com","vnn.vn","optimum.net","fuse.net","windowslive.com","home.ro","windstream.net","korea.com","ptd.net","ados.fr","swbell.net","inbox.lv","freeuk.net","dog.com","barcelona.com","tiscali.be","dnsmadeeasy.com","abv.bg","netscape.net","freeola.com","peoplepc.com","beer.com","onlinehome.de","jokes.com","garbage.com","list.ru","epix.net","nyc.com","pisem.net","centurytel.net","live.cn","gratisweb.com","ymail.com","lavabit.com","bol.com.br","handbag.com","spymac.com","yopmail.com","laposte.net","apollo.lv","fromru.com","gmx.at","rediffmail.com","theglobe.com","qwestoffice.net","centurylink.net","wowway.com","t-online.hu","euroseek.com","ananzi.co.za"]
why not just use that library?
Best would be to copy the array and add proper attribution in a comment above it, pointing to their license.
The reason is that adding another library that auto-updates slightly increases risk of some bad code slipping in through a dependency.
it looks like it uses binary search among sorted emails, lower bound is formed (in this case) as t:zzz...zz@zz...zz.cc
or f:zzz...zzz@zz...zz.cc
and upper bound - same, but with successive character (i.e. d
here), assuming they must be stored in db as t:...
or f:...
but then I am not sure how it ever finds marco...@...cc
: if we take following 3 strings in the ascending sort order, it will be:
f:marco.marco@ccmarco.cc
f:zzzzzzzzzzz@zzzzzzz.cc
f:zzzzzzzzzzz@zzzzzzz.cd
@rrrooommmaaa Can you please clarify how it really works.
In composer, we're looking to find (OpenPGP or S/MIME) keys in the end, so if the user entered "marco" we're searching the range 't:marco'...'t:marcp'
Current indexing procedure (updateSearchable
) populates searchable
array (in case marco.marco@ccmarco.cc
has at least one key)
to be ['t:marco', 't:ccmarco', 't:cc']
,
but when this new PR #4215 is merged, it will be populated as ['t:marco.marco@ccmarco.cc', 't:marco@ccmarco.cc', 't:ccmarco.cc', 't:cc']
so it would be easier to throw away domains search-time.
Example: if searching for 't:live.com' yields 't:live.com@something.com' -- that's ok, if it yields 't:live.com' (and 'live.com' is in blacklist) -- the record is ignored.
We can of course consider throwing away some stuff at indexing-time, but it's harder to revert -- we'll need to add a special migration procedure to re-index all existing records, similar to what is done in PR #4215
So need to
1) wait until PR #4215 is merged
2) discuss pros and cons of throwing something out (not saving to searchable
) during indexing
3) implement necessary modifications for a) indexing and/or b) search procedures
Thank you @rrrooommmaaa . So then I'm waiting for that another PR to be merged.
@rrrooommmaaa So item 1 from above seems to be done, let's get started with item 2.
Yes, I gave it a bit of thought yesterday.
As searchable
is populated with both email and name data [...Str.splitAlphanumericExtended(email), ...Str.splitAlphanumericExtended(name)]
, it would be user-friendly to allow searching by the word 'com', if 'com' is present in the name part. As at search time the source of the token isn't possible to determine, we need to filter out tokens at indexing time (in updateSearchable
procedure). Thus, a migration procedure (like extendSearchables
) is needed.
We can bear in mind that we may later add an additional post-search filter for second-level domains etc., however the edge case to be discussed is when a contact has something like this "Someone gmail.com" in their name. I guess such contact should be searchable by "gmail.com" and post-search filter isn't applicable.
As for migration, to have as little code as possible, we can re-use extendSearchables
procedure (renaming it appropriately) and adding a new trigger flag (instead of contact_store_searchable_extended
) -- contact_store_searchable_extended
will no longer be needed if all email entries are update -- a single updateSearchable() call will make sure that tokens are extended and filtered
I think the matching top domain zone should not suggest contacts: