FlowCrypt / flowcrypt-browser

FlowCrypt Browser extension for Chrome and Firefox
https://flowcrypt.com
Other
374 stars 46 forks source link

Search contacts matching should be stricter, do not suggest `name.surname@domain.cc` for `zzzz@zzzzz.cc` #3679

Closed limonte closed 2 years ago

limonte commented 3 years ago

I think the matching top domain zone should not suggest contacts:

image

tomholub commented 3 years ago

well noticed

limonte commented 3 years ago

To clarify the expected behavior:

tomholub commented 3 years ago

We have a method somewhere with the list of these common shared email domain names

limonte commented 3 years ago

We have a method somewhere with the list of these common shared email domain names

I wasn't able to find that method. This array looks good enough (taken from https://github.com/derhuerst/email-providers)

["yahoo.com","qq.com","virginmedia.com","msn.com","live.com","go.com","aol.com","163.com","free.fr","about.com","alibaba.com","geocities.com","outlook.com","indiatimes.com","yandex.ru","netscape.com","yahoo.co.jp","angelfire.com","earthlink.net","sky.com","mail.ru","discovery.com","gmail.com","frontier.com","naver.com","uol.com.br","homestead.com","icloud.com","medscape.com","mac.com","zoho.com","space.com","lycos.com","icq.com","comcast.net","altavista.com","orange.fr","t-online.de","hotmail.com","me.com","sapo.pt","rambler.ru","att.net","cox.net","canada.com","yandex.com","ancestry.com","sina.cn","kansascity.com","sina.com","libero.it","law.com","care2.com","wanadoo.fr","tom.com","fortunecity.com","berlin.de","onet.pl","21cn.com","excite.com","terra.com.br","geek.com","india.com","techspot.com","compuserve.com","shaw.ca","blackplanet.com","mindspring.com","web.de","freeserve.co.uk","ntlworld.com","rr.com","ig.com.br","wp.pl","sympatico.ca","excite.co.jp","chez.com","interia.pl","kiwibox.com","virgilio.it","tiscali.it","arcor.de","iinet.net.au","blueyonder.co.uk","xoom.com","126.com","rcn.com","sfr.fr","freenet.de","gazeta.pl","online.de","yam.com","verizon.net","gmx.net","btinternet.com","hotbot.com","lycos.co.uk","ozemail.com.au","aol.co.uk","detik.com","virgin.net","ireland.com","terra.es","catholic.org","hamptonroads.com","doityourself.com","parrot.com","charter.net","www.com","lycos.de","ivillage.com","myway.com","albawaba.com","rogers.com","name.com","ya.ru","voila.fr","oath.com","freeyellow.com","mail.com","pochta.ru","centrum.cz","sify.com","tiscali.co.uk","chat.ru","telus.net","mydomain.com","o2.co.uk","alice.it","seznam.cz","freeuk.com","tpg.com.au","iespana.es","optusnet.com.au","rin.ru","lycos.es","sci.fi","metacrawler.com","walla.co.il","yahoo.co.uk","lycos.nl","sweb.cz","looksmart.com","bigpond.com","yeah.net","terra.com","prodigy.net","eircom.net","yahoo.com.cn","frontiernet.net","mail2web.com","wanadoo.es","aim.com","37.com","i.am","dailypioneer.com","sanook.com","foxmail.com","unican.es","docomo.ne.jp","hot.ee","yahoofs.com","newmail.ru","gmx.com","ukr.net","cogeco.ca","crosswinds.net","bugmenot.com","webindia123.com","pacbell.net","iol.it","10minutemail.com","netins.net","depechemode.com","incredimail.com","zip.net","dejanews.com","lycos.it","elvis.com","bangkok.com","iprimus.com.au","juno.com","onmilwaukee.com","o2.pl","supereva.it","netspace.net.au","islamonline.net","starmedia.com","mailinator.com","go.ro","swissinfo.org","saudia.com","spray.se","idirect.com","bigfoot.com","land.ru","casino.com","gmx.de","bellsouth.net","yahoo.fr","yahoo.jp","bigpond.net.au","interfree.it","adelphia.net","thirdage.com","zonnet.nl","msn.co.uk","yahoo.de","btconnect.com","hispavista.com","ny.com","cableone.net","tds.net","gportal.hu","roadrunner.com","inbox.com","hushmail.com","4mg.com","seanet.com","masrawy.com","aeiou.pt","terra.cl","yahoo.com.br","singnet.com.sg","terra.com.ar","front.ru","c3.hu","123.com","sbcglobal.net","netzero.net","yahoo.com.tw","montevideo.com.uy","scubadiving.com","everyone.net","fastmail.fm","maktoob.com","iwon.com","mchsi.com","webjump.com","bright.net","telstra.com","westnet.com.au","c2i.net","talktalk.net","talkcity.com","eyou.com","usa.com","cs.com","btopenworld.com","gocollege.com","hotmail.ru","spacewar.com","sp.nl","excite.it","bolt.com","singpost.com","zzn.com","centrum.sk","go2net.com","yahoo.es","bizhosting.com","yahoo.ca","3ammagazine.com","vnn.vn","optimum.net","fuse.net","windowslive.com","home.ro","windstream.net","korea.com","ptd.net","ados.fr","swbell.net","inbox.lv","freeuk.net","dog.com","barcelona.com","tiscali.be","dnsmadeeasy.com","abv.bg","netscape.net","freeola.com","peoplepc.com","beer.com","onlinehome.de","jokes.com","garbage.com","list.ru","epix.net","nyc.com","pisem.net","centurytel.net","live.cn","gratisweb.com","ymail.com","lavabit.com","bol.com.br","handbag.com","spymac.com","yopmail.com","laposte.net","apollo.lv","fromru.com","gmx.at","rediffmail.com","theglobe.com","qwestoffice.net","centurylink.net","wowway.com","t-online.hu","euroseek.com","ananzi.co.za"]
IvanPizhenko commented 2 years ago

why not just use that library?

tomholub commented 2 years ago

Best would be to copy the array and add proper attribution in a comment above it, pointing to their license.

The reason is that adding another library that auto-updates slightly increases risk of some bad code slipping in through a dependency.

IvanPizhenko commented 2 years ago

it looks like it uses binary search among sorted emails, lower bound is formed (in this case) as t:zzz...zz@zz...zz.cc or f:zzz...zzz@zz...zz.cc and upper bound - same, but with successive character (i.e. d here), assuming they must be stored in db as t:... or f:... but then I am not sure how it ever finds marco...@...cc: if we take following 3 strings in the ascending sort order, it will be:

f:marco.marco@ccmarco.cc
f:zzzzzzzzzzz@zzzzzzz.cc
f:zzzzzzzzzzz@zzzzzzz.cd

@rrrooommmaaa Can you please clarify how it really works.

rrrooommmaaa commented 2 years ago

In composer, we're looking to find (OpenPGP or S/MIME) keys in the end, so if the user entered "marco" we're searching the range 't:marco'...'t:marcp' Current indexing procedure (updateSearchable) populates searchable array (in case marco.marco@ccmarco.cc has at least one key) to be ['t:marco', 't:ccmarco', 't:cc'], but when this new PR #4215 is merged, it will be populated as ['t:marco.marco@ccmarco.cc', 't:marco@ccmarco.cc', 't:ccmarco.cc', 't:cc'] so it would be easier to throw away domains search-time. Example: if searching for 't:live.com' yields 't:live.com@something.com' -- that's ok, if it yields 't:live.com' (and 'live.com' is in blacklist) -- the record is ignored. We can of course consider throwing away some stuff at indexing-time, but it's harder to revert -- we'll need to add a special migration procedure to re-index all existing records, similar to what is done in PR #4215 So need to 1) wait until PR #4215 is merged 2) discuss pros and cons of throwing something out (not saving to searchable) during indexing 3) implement necessary modifications for a) indexing and/or b) search procedures

IvanPizhenko commented 2 years ago

Thank you @rrrooommmaaa . So then I'm waiting for that another PR to be merged.

IvanPizhenko commented 2 years ago

@rrrooommmaaa So item 1 from above seems to be done, let's get started with item 2.

rrrooommmaaa commented 2 years ago

Yes, I gave it a bit of thought yesterday. As searchable is populated with both email and name data [...Str.splitAlphanumericExtended(email), ...Str.splitAlphanumericExtended(name)], it would be user-friendly to allow searching by the word 'com', if 'com' is present in the name part. As at search time the source of the token isn't possible to determine, we need to filter out tokens at indexing time (in updateSearchable procedure). Thus, a migration procedure (like extendSearchables) is needed. We can bear in mind that we may later add an additional post-search filter for second-level domains etc., however the edge case to be discussed is when a contact has something like this "Someone gmail.com" in their name. I guess such contact should be searchable by "gmail.com" and post-search filter isn't applicable. As for migration, to have as little code as possible, we can re-use extendSearchables procedure (renaming it appropriately) and adding a new trigger flag (instead of contact_store_searchable_extended) -- contact_store_searchable_extended will no longer be needed if all email entries are update -- a single updateSearchable() call will make sure that tokens are extended and filtered