ProtonMail / WebClients

Monorepo hosting the proton web clients
GNU General Public License v3.0

Why is ProtonMail support so atrocious? Fulltext search does not work properly #277

Open exander77 opened 2 years ago

exander77 commented 2 years ago

Instead of having my reproducible cases passed on to development, I have to go back and forth for 20 rounds, and ProtonMail isn't even able to verify that the issue exists.

Here you go. Receive this small example message:

From: example@example.com
Date: Fri, 28 Jan 2022 23:55:19 +0100
Subject: Sample
To: example@example.com
Content-Type: multipart/mixed;boundary=---------------------1869830f9274e5fbca1ed10844258a26

-----------------------1869830f9274e5fbca1ed10844258a26
Content-Type: multipart/related;boundary=---------------------56cd6ec8c705394f1d1182bcc737b0aa

-----------------------56cd6ec8c705394f1d1182bcc737b0aa
Content-Type: text/html;charset=utf-8
Content-Transfer-Encoding: base64

PGRpdiBkaXI9ImF1dG8iPkRvYnLDvSB2ZcSNZXIsIDwvZGl2PjxkaXYgZGlyPSJhdXRvIj5Ba3R1w6FsbsSbIG3DoW0gcG91emU
gdHV0byBqZWRudS4gPC9kaXY+PGRpdiBkaXI9ImF1dG8iPjxicj48L2Rpdj48ZGl2IGRpcj0iYXV0byI+UyBwb3pkcmF2ZW0gPC
9kaXY+PGRpdj48YnI+PGRpdiBjbGFzcz0iZ21haWxfcXVvdGUiPjxkaXYgZGlyPSJsdHIiIGNsYXNzPSJnbWFpbF9hdHRyIj5ww
6EgMjgu4oCvMS7igK8yMDIyIHYgMjM6NDQgb2Rlc8OtbGF0ZWwgJmx0OyZndDsgbmFwc2FsOjxicj48L2Rpdj48YmxvY2txdW90
ZSBjbGFzcz0iZ21haWxfcXVvdGUiIHN0eWxlPSJtYXJnaW46MHB4IDBweCAwcHggMC44ZXg7Ym9yZGVyLWxlZnQtd2lkdGg6MXB
4O2JvcmRlci1sZWZ0LXN0eWxlOnNvbGlkO3BhZGRpbmctbGVmdDoxZXg7Ym9yZGVyLWxlZnQtY29sb3I6cmdiKDIwNCwyMDQsMj
A0KSI+SmFrw6kgbcOhdGUgamXFoXTEmyBHUFU/PGJyPgo8YnI+ClByb2TDoW0gZ3JhZmlja8OpIGthcnR5LiA8YnI+ClbFoWVja
G55IGZ1bmvEjW7DrSwgc3RhdiB2aXogZm90by4gPGJyPgoxLiBBVEkgUmFkZW9uIFgzMDBTRSAyNTYgTUIgSHlwZXIgTWVtb3J5
IFBDSS1FIFRWTy9EVkkgPGJyPgpDZW5hIDUwIEvEjS9rdXMsIG1vxb5ubyBwcm9kYXQgamVkbm90bGl2xJssIG5lYm8gamFrbyB
jZWxlay4gPGJyPgo8YnI+Ck1vxb5ub3N0IG9zb2Juw61obyBvZGLEm3J1IHYgT3N0cmF2xJsgUG9ydWLEmywgcG8gZG9ob2TEmy
Btb8W+bm8gemFzbGF0IHBvxaF0b3UgbmVibyBwxZllcyB6w6FzaWxrb3ZudSAocGxhdGJhIHDFmWVkZW0gbmEgw7rEjWV0ICsgc
G/FoXRvdm7DqSkuIDxicj4KPGJyPgpOZXbDoWhlanRlIGEgcG9kw612ZWp0ZSBzZSBpIG5hIG3DqSBkYWzFocOtIGluemVyw6F0
eSwgbW9obG8gYnkgVsOhcyB6YXVqbW91dCBpIG7Em2NvIGRhbMWhw61oby48YnI+Cjxicj4KWmJvxb7DrSBqZSBiZXogesOhcnV
reS4gPGJyPgpLdXB1asOtY8OtIG3DoSBtb8W+bm9zdCBzaSB6Ym/FvsOtIHDFmWVkIHpha291cGVuw61tIHByb2hsw6lkbm91dC
BhIHZ5emtvdcWhZXQuIDxicj4KS291cMOtIHpib8W+w60ga3VwdWrDrWPDrSBha2NlcHR1amUgamVobyBzdGF2IGEga3ZhbGl0d
S48YnI+Cjxicj4KQ2VuYTogICAgNTA8YnI+Cjxicj4KPC9ibG9ja3F1b3RlPjwvZGl2PjwvZGl2Pg==
-----------------------56cd6ec8c705394f1d1182bcc737b0aa--
-----------------------1869830f9274e5fbca1ed10844258a26--

Try searching for: Radeon, X300SE, Hyper, Memory

Message cannot be found.

Fulltext search does not work properly.

For three months I have been trying to get support to do something about it.

exander77 commented 2 years ago

So, after months, I finally got the support to investigate and answer me:

Regarding the behavior that you are experiencing, please note that this is currently expected while searching for the message content. We only keep the quoted content for forwarded messages, but not for the replied messages. The design is to reduce the redundancy in the local database in case one message has many replies. This is also why you were able to search for the forwarded message earlier.

We hope this can clarify the concern you have and we are sorry if any inconvenience caused regarding this.

That is definitely not expected behavior: https://proton.me/support/search

The main issue is that it breaks search in a major way.

Quoted content is not only content repeated from a previous message in your mailbox. That case would be OK, as long as the original message can be found (redundancy, as you say). But this also covers 90% of e-mails where the original message originated in some external system: support forms, helpdesks, various Craigslist-style sites, even mailing lists.

Not only that, this covers cases where you are not a recipient of the original e-mail but were added to the conversation (not forwarded) after it started. This covers at least 50% of all work conversations, where you are added to an existing discussion. And this is a clusterfuck. You are unable to find the thread by information that was exchanged before you joined the conversation and that exists only in the quoted responses.

It isn't redundancy when the content is not indexed at least once within the thread.
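To make the consequence concrete, here is a minimal sketch of an indexer that drops every `<blockquote>` before tokenizing. It is not Proton's actual code; `TextExtractor`, `indexed_words` and the `SAMPLE` HTML (a shortened, hypothetical stand-in for a reply like the one above) are assumptions made for illustration only.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, optionally skipping anything inside <blockquote>."""

    def __init__(self, skip_quotes: bool) -> None:
        super().__init__()
        self.skip_quotes = skip_quotes
        self.depth = 0  # current <blockquote> nesting depth
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "blockquote" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Text inside a quote is dropped when quote skipping is enabled.
        if self.skip_quotes and self.depth > 0:
            return
        self.chunks.append(data)


def indexed_words(html: str, skip_quotes: bool) -> set:
    """Return the lower-cased words a naive indexer would store."""
    parser = TextExtractor(skip_quotes)
    parser.feed(html)
    parser.close()
    words = {w.strip(".,:;?!()").lower() for w in " ".join(parser.chunks).split()}
    return words - {""}


# Hypothetical reply: one new line of text plus a quoted original message.
SAMPLE = (
    "<div>Good evening, I currently have only this one.</div>"
    '<blockquote class="gmail_quote">'
    "1. ATI Radeon X300SE 256 MB Hyper Memory"
    "</blockquote>"
)

print("radeon" in indexed_words(SAMPLE, skip_quotes=False))
print("radeon" in indexed_words(SAMPLE, skip_quotes=True))
```

If the quoted original never reached the mailbox as its own message, the second lookup is the only one the user ever gets, and the term is unreachable.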

I am basically unable to locate half of my messages.

This is yet again completely incompetent design and implementation.

Krovikan-Vamp commented 2 years ago

In case you didn't read, there is only a group of like 10 people working on this for YOUR security benefit. Why don't you try fixing the issue (#277) and making a PR :D

exander77 commented 2 years ago

@Krovikan-Vamp I already made a PR once for a feature in the original proton-mail repository, which is now archived: https://github.com/ProtonMail/proton-mail/pull/56/files

You can see how it went.

exander77 commented 2 years ago

It is better for me to make pull requests to TrueNAS, where they are actually valued: https://github.com/truenas/middleware/pull/8720 and improved upon: https://github.com/truenas/middleware/pull/9022

This is how I imagine interaction with the community. I advise you to look into it.

bartbutler commented 2 years ago

You make a good point about the quoted content often not being redundant. We may need to take another look at this.

exander77 commented 2 years ago

@bartbutler It also has other nuances. People do alter quoted content sometimes.

Some thoughts:

If I were to implement this efficiently, I would strip whitespace (just to be safe) from each quoted text (blockquote), hash it, and skip indexing only if the hash of that quote had already been seen within the same e-mail thread.

This has some nuances as well - like e-mail deletion - as you need to change the original occurrence. I would link the redundant quote to the original quote occurrence. Then, if the e-mail with the original quote was deleted, the next e-mail with the same quote would take its place as the original occurrence. If it was the last reference, the data could be purged from the index altogether. I am not sure whether this would be viable within the current implementation, but you could simply trigger reindexing of all e-mails that follow in the same thread after an e-mail is deleted. That would be crude, but it would work. The general problem would be if e-mail indexing is not thread-aware. If that's the case, the quote index could be global rather than per thread. You could also always index quotes below a certain number of words, to handle quoted signatures etc. - for example, always index quotes with 5 or fewer words.
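A rough sketch of that idea, under stated assumptions: `ThreadQuoteIndex`, `quote_key` and `SHORT_QUOTE_WORDS` are hypothetical names, the `indexed` dictionary stands in for whatever the real full-text index does, and nothing here reflects the existing WebClients implementation.

```python
import hashlib
from collections import defaultdict

SHORT_QUOTE_WORDS = 5  # quotes this short are always indexed (signatures etc.)


def quote_key(quote: str) -> str:
    """Hash of the quote with all whitespace removed."""
    return hashlib.sha256("".join(quote.split()).encode("utf-8")).hexdigest()


class ThreadQuoteIndex:
    """Tracks, for one thread, which message 'owns' each distinct quote."""

    def __init__(self) -> None:
        # quote hash -> ids of messages containing that quote, owner first
        self.refs = defaultdict(list)
        # message id -> quote texts actually handed to the full-text index
        self.indexed = defaultdict(list)
        # quote hash -> text of the first occurrence
        self.texts = {}

    def add_message(self, msg_id: str, quotes: list) -> None:
        for quote in quotes:
            if len(quote.split()) <= SHORT_QUOTE_WORDS:
                self.indexed[msg_id].append(quote)
                continue
            key = quote_key(quote)
            if not self.refs[key]:
                # First occurrence in the thread: index it under this message.
                self.texts[key] = quote
                self.indexed[msg_id].append(quote)
            if msg_id not in self.refs[key]:
                self.refs[key].append(msg_id)

    def delete_message(self, msg_id: str) -> None:
        self.indexed.pop(msg_id, None)
        for key in list(self.refs):
            holders = self.refs[key]
            if msg_id not in holders:
                continue
            was_owner = holders[0] == msg_id
            holders.remove(msg_id)
            if not holders:
                # Last reference gone: purge the quote entirely.
                del self.refs[key]
                del self.texts[key]
            elif was_owner:
                # The next message with the same quote becomes the owner.
                self.indexed[holders[0]].append(self.texts[key])
```

An altered quote hashes differently, so it is indexed as its own occurrence rather than being skipped. A global (non-thread-aware) variant would use one such object for the whole mailbox instead of one per thread.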

exander77 commented 2 years ago

@bartbutler Found another case, not related to the previous one.

No part of the message was indexed. I again trimmed the example and its headers. Even words that contain no quoted-printable escape sequences cannot be found, including 21046018, zplnomocnil...
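For reference, a small sketch using Python's standard `quopri` module to decode two consecutive lines copied from the sample body below. It only shows what an indexer should see after decoding; the `tokens` splitting is a simplifying assumption, not how any real indexer tokenizes.

```python
import quopri

# Two consecutive lines from the sample body; the trailing "=" on the first
# line is a quoted-printable soft line break, so the lines join when decoded.
raw = (
    b"reaguji na V=C3=A1=C5=A1 dotaz ohledn=C4=9B pr=C5=AFb=C4=9Bhu reklamace ID=\n"
    b" 21046018. Dne 22. 06. 2021"
)

decoded = quopri.decodestring(raw).decode("utf-8")

# Naive whitespace tokenization with surrounding punctuation stripped.
tokens = [word.strip(".,") for word in decoded.split()]

print(decoded)
print("21046018" in tokens)
```

Once the transfer encoding is undone, the ID is an ordinary token, so a search for it would be expected to match.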

From: example@example.com
Date: Fri, 28 Jan 2022 23:55:19 +0100
Subject: Sample
To: example@example.com
Content-Type: multipart/mixed;boundary=---------------------6f3607ffe276966b99274a28f25c750f

-----------------------6f3607ffe276966b99274a28f25c750f
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;charset=utf-8

Dobr=C3=BD den, pane in=C5=BEen=C3=BDre,

reaguji na V=C3=A1=C5=A1 dotaz ohledn=C4=9B pr=C5=AFb=C4=9Bhu reklamace ID=
 21046018. Dne 22. 06. 2021 V=C3=A1m byl odesl=C3=A1n v=C3=BDsledek reklam=
ace s t=C3=ADm, abyste se dostavil na pobo=C4=8Dku =C4=8Cesk=C3=A9 po=C5=A1=
ty k seps=C3=A1n=C3=AD formul=C3=A1=C5=99e o podm=C3=ADne=C4=8Dn=C3=A9 n=C3=
=A1hrad=C4=9B =C5=A1kody a prohl=C3=A1=C5=A1en=C3=AD n=C3=A1hrady =C5=A1=
kody, z d=C5=AFvodu nevy=C5=99=C3=ADzen=C3=AD reklamace zahrani=C4=8Dn=C3=AD=
m oper=C3=A1torem ve stanoven=C3=A9 lh=C5=AFt=C4=9B. V=C4=8Dera tj. 29. =
06. 2021 n=C3=A1s zahrani=C4=8Dn=C3=AD oper=C3=A1tor informoval o ztr=C3=A1=
t=C4=9B Va=C5=A1=C3=AD z=C3=A1silky a zplnomocnil n=C3=A1s k Va=C5=A1emu =
od=C5=A1kodn=C4=9Bn=C3=AD. Pros=C3=ADm V=C3=A1s, abyste se dostavil na Va=C5=
=A1=C3=AD reklama=C4=8Dn=C3=AD po=C5=A1tu s seps=C3=A1n=C3=AD formul=C3=A1=
=C5=99e prohl=C3=A1=C5=A1en=C3=AD n=C3=A1hrady =C5=A1kody a dolo=C5=BEen=
=C3=AD hodnoty obsahu z=C3=A1silky.

Omlouv=C3=A1me se za komplikace.

S pozdravem
-----------------------6f3607ffe276966b99274a28f25c750f--

exander77 commented 2 years ago

Attachment names are not indexed, so e-mails cannot be located by attachment name. Like, seriously?