Open exander77 opened 2 years ago
So, after months, I finally got support to investigate and answer me:
Regarding the behavior that you are experiencing, please note that this is currently expected while searching for the message content. We only keep the quoted content for forwarded messages, but not for the replied messages. The design is to reduce the redundancy in the local database in case one message has many replies. This is also why you were able to search for the forwarded message earlier.
We hope this can clarify the concern you have and we are sorry if any inconvenience caused regarding this.
That is definitely not expected behavior: https://proton.me/support/search
The main issue is that it breaks search in a major way.
The quoted content is not only content that is replied to from a previous message. That would be OK if the original message could be found (redundancy, as you say). But this covers 90% of e-mails where the original message was initiated in some system: support forms, helpdesks, various Craigslist-style sites, even mailing lists.
Not only that, this covers cases where you are not a recipient of the original e-mail but were added to the conversation (not forwarded) after it started. This covers at least 50% of all work conversations, where you are added to an existing discussion. And this is a clusterfuck. You are unable to find the thread by information contained in the messages that were exchanged before you were part of the conversation and are part of the quoted responses.
It isn't redundancy when the content is not indexed at least once within the thread.
I am basically unable to locate half of my messages.
This is yet again completely incompetent design and implementation.
In case you didn't read; there is only a group of like 10 people working on this for YOUR security benefit. Why don't you try fixing the issue (#277) and making a PR :D
@Krovikan-Vamp I already made a PR once for a feature in the original proton-mail repository, which is now archived:
https://github.com/ProtonMail/proton-mail/pull/56/files
You can see how it went.
It is better for me to make pull requests to TrueNAS, where they are actually valued: https://github.com/truenas/middleware/pull/8720 and improved upon: https://github.com/truenas/middleware/pull/9022
This is how I imagine interaction with the community. I advise you to look into it.
You make a good point about the quoted content often not being redundant; we may need to take another look at this.
@bartbutler It also has other nuances. People do alter quoted content sometimes.
Some thoughts:
If I were to implement this efficiently, I would strip whitespace (just to be safe) from each quoted text (blockquote) and hash it, and I would only skip indexing if, within the same e-mail thread, the hash of the quote had already been seen.
This has some nuances as well - like e-mail deletion - as you need to change the original occurrence. I would link the redundant quote to the original quote occurrence. Then, if the e-mail with the original quote was deleted, the next e-mail with the same quote would take its place as the original occurrence. If it was the last reference, the data could be purged from the index altogether. I am not sure if this would be viable within the current implementation, but you could just trigger reindexing of all e-mails that follow within the same thread after an e-mail is deleted. That would be crude, but it would work. The problem would generally be if e-mail indexing is not thread-aware. If that's the case, then the quote index could be global rather than per thread. You could also always index quotes below a certain number of words to handle quoted signatures etc. - for example, always index quotes with 5 or fewer words.
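A rough Python sketch of the idea above, just to make it concrete. All names here (`quote_key`, `ThreadQuoteIndex`, `SHORT_QUOTE_WORDS`) are hypothetical, and nothing in it reflects how ProtonMail's index is actually built. It assumes the quotes of each message have already been extracted as plain strings and that one instance is kept per thread.

```python
import hashlib
import re
from collections import defaultdict


def quote_key(quote):
    """Hash a quote with all whitespace removed, so reflowed copies match."""
    normalized = re.sub(r"\s+", "", quote)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class ThreadQuoteIndex:
    """Tracks which messages of ONE thread contain which quote (hypothetical)."""

    SHORT_QUOTE_WORDS = 5  # quotes this short are always indexed

    def __init__(self):
        # quote hash -> message ids containing it, in arrival order;
        # the first id is the "original occurrence" that holds the indexed text
        self.occurrences = defaultdict(list)

    def add_message(self, message_id, quotes):
        """Return the quotes that should be indexed for this message."""
        to_index = []
        for quote in quotes:
            if len(quote.split()) <= self.SHORT_QUOTE_WORDS:
                to_index.append(quote)  # e.g. quoted signatures
                continue
            holders = self.occurrences[quote_key(quote)]
            if not holders:
                to_index.append(quote)  # first time seen in this thread
            if message_id not in holders:
                holders.append(message_id)
        return to_index

    def delete_message(self, message_id):
        """Return (hash, new_owner_id) pairs whose quote must be reindexed."""
        promoted = []
        for key in list(self.occurrences):
            holders = self.occurrences[key]
            if message_id not in holders:
                continue
            was_owner = holders[0] == message_id
            holders.remove(message_id)
            if not holders:
                del self.occurrences[key]  # last reference: purge
            elif was_owner:
                promoted.append((key, holders[0]))  # next e-mail takes over
        return promoted
```

If indexing is not thread-aware, the same structure could be kept globally instead of per thread, at the cost of a larger hash table.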
@bartbutler Found another case, not related to the previous one.
No parts of the message were indexed. I again shrank the example and headers. No words, even those without quoted-printable escape sequences, can be found, including `21046018` and `zplnomocnil`.
...
From: example@example.com
Date: Fri, 28 Jan 2022 23:55:19 +0100
Subject: Sample
To: example@example.com
Content-Type: multipart/mixed;boundary=---------------------6f3607ffe276966b99274a28f25c750f
-----------------------6f3607ffe276966b99274a28f25c750f
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;charset=utf-8
Dobr=C3=BD den, pane in=C5=BEen=C3=BDre,
reaguji na V=C3=A1=C5=A1 dotaz ohledn=C4=9B pr=C5=AFb=C4=9Bhu reklamace ID=
21046018. Dne 22. 06. 2021 V=C3=A1m byl odesl=C3=A1n v=C3=BDsledek reklam=
ace s t=C3=ADm, abyste se dostavil na pobo=C4=8Dku =C4=8Cesk=C3=A9 po=C5=A1=
ty k seps=C3=A1n=C3=AD formul=C3=A1=C5=99e o podm=C3=ADne=C4=8Dn=C3=A9 n=C3=
=A1hrad=C4=9B =C5=A1kody a prohl=C3=A1=C5=A1en=C3=AD n=C3=A1hrady =C5=A1=
kody, z d=C5=AFvodu nevy=C5=99=C3=ADzen=C3=AD reklamace zahrani=C4=8Dn=C3=AD=
m oper=C3=A1torem ve stanoven=C3=A9 lh=C5=AFt=C4=9B. V=C4=8Dera tj. 29. =
06. 2021 n=C3=A1s zahrani=C4=8Dn=C3=AD oper=C3=A1tor informoval o ztr=C3=A1=
t=C4=9B Va=C5=A1=C3=AD z=C3=A1silky a zplnomocnil n=C3=A1s k Va=C5=A1emu =
od=C5=A1kodn=C4=9Bn=C3=AD. Pros=C3=ADm V=C3=A1s, abyste se dostavil na Va=C5=
=A1=C3=AD reklama=C4=8Dn=C3=AD po=C5=A1tu s seps=C3=A1n=C3=AD formul=C3=A1=
=C5=99e prohl=C3=A1=C5=A1en=C3=AD n=C3=A1hrady =C5=A1kody a dolo=C5=BEen=
=C3=AD hodnoty obsahu z=C3=A1silky.
Omlouv=C3=A1me se za komplikace.
S pozdravem
-----------------------6f3607ffe276966b99274a28f25c750f--
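As a reference point, here is a minimal Python sketch of the decoding step a full-text indexer would presumably need before tokenizing a body like this. The fragment is copied from the sample above; `searchable_tokens` and the simple `\w+` tokenizer are illustrative assumptions, not ProtonMail's actual code.

```python
import quopri
import re

# Two consecutive lines copied from the sample body (quoted-printable, UTF-8).
RAW = (
    b"t=C4=9B Va=C5=A1=C3=AD z=C3=A1silky a zplnomocnil n=C3=A1s k Va=C5=A1emu =\n"
    b"od=C5=A1kodn=C4=9Bn=C3=AD."
)


def searchable_tokens(raw, charset="utf-8"):
    """Decode quoted-printable (joining soft line breaks) and split into words."""
    text = quopri.decodestring(raw).decode(charset)
    return set(re.findall(r"\w+", text.lower()))


tokens = searchable_tokens(RAW)
print("zplnomocnil" in tokens)
```

With this decoding, a plain ASCII word such as `zplnomocnil` ends up as an ordinary token, so there is no obvious reason for it to be unsearchable.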
Attachment names are not indexed, so e-mails cannot be located by attachment name. Like, seriously?
Instead of passing my reproducible cases to development, I have to go back and forth for 20 rounds, and ProtonMail isn't even able to verify that the issue exists.
Here you have it, receive this small example:
Try searching for: `Radeon`, `X300SE`, `Hyper`, `Memory`.
Message cannot be found.
Fulltext search does not work properly.
For three months I have been trying to get support to do something about it.