ProtonMail / proton-bridge

Proton Mail Bridge application
GNU General Public License v3.0
1.14k stars 155 forks source link

X-Origin-Date wrong #357

Closed jknockaert closed 1 year ago

jknockaert commented 1 year ago

So I had a bunch of emails imported using Easy Switch, and then when I ran a python script using Bridge to compare the original to the imported messages I noticed some weird behaviour happening with the Date headers. So upon import some original headers are modified. The content of the original field should be saved in a new header with a X-Original- prefix. But then when using Bridge the corresponding header values returned are nonsense. As an example I use a message I received on 15 January 2023 at 4:23:03. In the original message the corresponding header field is:

Date: Sun, 15 Jan 2023 04:23:03 +0100 (W. Europe Standard Time)

After Easy Switch this becomes in the web client:

X-Original-Date: Sun, 15 Jan 2023 04:23:03 +0100 (W. Europe Standard Time)
Date: 15-Jan-2023 04:23:13 +0100

(Not sure why the parsed Date header is 10s off, but that's an issue with Easy Switch which I am not discussing here.) Fetching this issue over Bridge the headers become:

X-Pm-Date: Sun, 15 Jan 2023 03:23:13 +0000
Date: Sun, 15 Jan 2023 03:23:13 +0000
X-Original-Date: Mon, 01 Jan 0001 00:00:00 +0000

Not sure why X-Pm-Date was added in this context, but the X-Original-Date header is definitely not quite right.

Expected Behavior

Bridge should return exactly the same value for the X-Original-Date header as the web client.

Current Behavior

Bridge returns Mon, 01 Jan 0001 00:00:00 +0000 for the X-Original-Date header for messages imported with Easy Switch.

Steps to Reproduce

Am not exactly sure, but as far as I can see this happens to all imported messages.

Version Information

web client: 5.0.17.12 (on Firefox 110.0.1) bridge: 3.0.19 (on macOS 13.2.1)

LBeernaertProton commented 1 year ago

Hey @jknockaert . Thanks for the report we will investigate on our end.

jknockaert commented 1 year ago

OK here's my best guess of what is happening. Upon import the Easy Switch algorithm validates the Date header and if it decides that it is not properly formatted (as per RFC3501?) it stores the original value to a new 'X-Original-Date' header and reformats the Date header. So far so good (but it would be nice to have some more documentation on the Easy Switch algorithm). Then when the Bridge returns the messages it seems to try to evaluate the X-Original-Date header, and failing to do so it returns a default value, whereas it should just return the header without any further processing.

LBeernaertProton commented 1 year ago

The issue has been reproduced and has been internally tracked as GODT-2480.

primalmotion commented 1 year ago

I'm wondering if this could cause a bug in Geary where I say to only sync the last week, but it ended up syncing absolutely everything

LBeernaertProton commented 1 year ago

hey @primalmotion , sometimes the proton backend can send a refresh event to your account and in that case we need to resync your entire state. It most likely what happened.

primalmotion commented 1 year ago

@LBeernaertProton I can understand the bridge resyncs the entire state, but not sure why this makes the Geary setting to not sync mails older than a week to be synced. I'm on a phone with not so much disk space. proton-bridge cache is about 2GB, and normally Geary cache is about a few MB. But this made my geary cache to use roughly 2GB too

LBeernaertProton commented 1 year ago

@primalmotion Unfortunately I also don't know what might be causing that issue with that specific email client.

We have patched this bug and we will release it as soon as we are able. I would recommend trying again once that release is out and seeing if the issue is resolved. I will post to the thread once it is ready.

jknockaert commented 1 year ago

(Not sure why the parsed Date header is 10s off, but that's an issue with Easy Switch which I am not discussing here.)

Just in case anyone wondered: for some reason Easy Switch is using the INTERNALDATE timestamp here, likely because it failed to parse the original Date header for some unknown reason. Not sure if and where issues with Easy Switch should be reported.

primalmotion commented 1 year ago

I would recommend trying again once that release is out and seeing if the issue is resolved

Will do :) thanks!

LBeernaertProton commented 1 year ago

The fix for this issue has been pushed into the v3 branch.

DISCLAIMER: This is a mirror of our internal development branch and has not yet been fully tested and vetted.

An official release will be made as soon as we are ready.

andrew-s commented 1 year ago

@LBeernaertProton not to hijack this issue but, I have something incredibly similar - it looks like the ones I imported with the bridge a few years ago aren't getting the correct date (as in, they're getting the date they went into proton bridge (latest version));

I couldn't find it in the code but I assume the Date field is being taken, for my imported emails that would be;

Date: Sat, 17 Nov 2018 23:33:24 UT

The issue here is two fold, either the original provider or the bridge used UT instead of UTC and it doesn't look like time.Parse() handles that with RFC1123 layout (it also doesn't match the RFC1123Z layout either. I'm going based off the input into the tests; https://github.com/ProtonMail/proton-bridge/blob/7be46a47402be596a8a1f8925c09ef26f9445fc3/pkg/message/build_framework_test.go#LL60C46-L60C59 )

As a result, time.Parse() produces; 0001-01-01 00:00:00 +0000 UTC ( https://go.dev/play/p/Sj7aR-Qd9wa )

Would the upcoming change also fix this? or would this format still be problematic? Happy to provide further examples

LBeernaertProton commented 1 year ago

Hey @andrew-s, it will really depend on how long ago those messages were imported.

I did a quick check and v3 has support for the UT timezone notation. I think some of v2 may support that as well.

The only way to address this would be to re-import that specific message on a more recent version of Bridge

andrew-s commented 1 year ago

Thanks for the quick response @LBeernaertProton - these are emails from 2019 and for years previous - it's impossible to reimport them as those accounts are long gone (that they were imported from) and with an inbox of >350k messages any operation takes time.

Is it possible to handle this during sync in the bridge? I feel this would be a better solution for those earlier (v1 or so bridge imports) - happy to provide any implementation work if pointed in the right direction

LBeernaertProton commented 1 year ago

@andrew-s I can't say for certain it depends on the what data was sent to the servers at that point. Can you provide all the date related headers (see list below) for one of these messages ?

andrew-s commented 1 year ago

I went through this one specifically and the headers are;

Date: Sat, 17 Nov 2018 23:33:24 UT

There are no X-Original-Date or X-Pm-Date headers - I assume the X-Pm-* headers are added during import?

LBeernaertProton commented 1 year ago

I'm a bit confused at the moment.

If these message were imported I would expect to see at least an X-Pm-Date header. If there was a parsing issue with the message date, the original value is written to the X-Original-Date header and Date is replaced with a default value.

The main issue for this thread was that we were overriding existing X-Original-Date values rather than keeping the existing value should there be one.

The code you pointed to is test code that has no bearing on our date parsing. That's to initialize our tests' message date.

Are you viewing the date for these messages in an IMAP client? As far as I can tell, this date is parsed correctly by our date parsers.

andrew-s commented 1 year ago

I'm guessing this was v1 of the bridge - it sounds like it was buggy, this is definitely an imported email as it's labeled Imported Sep-19-2020 06:34 and there are other headers;

X-Pm-Origin: import
X-Pm-Content-Encryption: on-import
X-Pm-Spamscore: 0
X-Pm-Spam-Action: dunno
X-Original-To: {redacted}

But no X-Pm-date field, there were other errors during sync regarding invalid address fields so, I'll raise that one separately - I'm also viewing this in the web client -> 'Message Headers'

Edit: Interestingly, if I search for a period that should include this email - it doesn't appear in the web client, I have to search the title or another attribute so it seems like this is broken in the web search too (likely it didn't import correctly?)

LBeernaertProton commented 1 year ago

@andrew-s I'm afraid that there is no other way to fix this other than to re-import the messages, we can't update the existing metadata attached to the message made during a previous import. Based on the date field you provided, it should display fine with the current Web and Bridge clients.

Regarding the error messages about failed to parse address, those addresses do not follow the the RFC5322 specification. You can ignore those.

andrew-s commented 1 year ago

@LBeernaertProton I've already mentioned that would be impossible, those inboxes are from almost 3 years ago - they're long gone. I'm confused as to why this can't be fixed in the bridge? when the bridge initially caused this problem (and it absolutely does not work correctly in web either as you can't filter by date on these messages).

LBeernaertProton commented 1 year ago

@andrew-s There is no API on the backend for us to update the dates of existing messages, nor is there a way to do so via the IMAP protocol. We simply can't make these changes. You can try to contact costumer support and ask for assistance in that regard as this outside of the scope of things we can address in the bridge.

But to confirm. The date displays correctly on the web and the IMAP client? The only thing you can't seem to do on the web is search in that given interval?

jknockaert commented 1 year ago

Is this fix included in 3.0.21? I didn't see mention of GODT-2480 in the release notes...

LBeernaertProton commented 1 year ago

I'm afraid this fix will be released in a future, hopefully next, update.

jknockaert commented 1 year ago

Further exploring the handling of date headers I see more weirdness happening. As an example a message that has the following Date header on the original IMAP server:

date: Thu,  8 May 2003 00:27:30 METDST

and INTERNALDATE is 08-May-2003 02:27:30 +0200. After Easy Switch this becomes in the webclient:

X-Original-Date: Thu,
  8 May 2003 00:27:30 METDST
Date: Thu,
  8 May 2003 00:27:30 METDST

Easy Switch (correctly) identified that the original Date header didn't follow the spec, copied it to the X-Original-Date header but for some reason did not update the Date header to a default value (assuming the webclient doesn't do its own parsing). The webclient doesn't give the X-Pm-Date header. Now moving to the bridge (I updated to 3.0.21) this is what I get:

X-Pm-Date: Thu, 01 Jan 1970 00:00:00 +0000
X-Original-Date: Mon, 01 Jan 0001 00:00:00 +0000
Date: Fri, 13 Aug 1982 00:00:00 +0000

That's quite funny but I don't think it's what it should be.

LBeernaertProton commented 1 year ago

A fix for this issue is present in version 3.1 which is now in pre-release.

jknockaert commented 1 year ago

OK the new version 3.1.3 solves the issue with X-Original-Date. The issue reported above where Date is different between the webclient and Bridge still stands.

LBeernaertProton commented 1 year ago

@jknockaert We had issues in the past with Apple Mail where if we did not present a valid date it could cause the client to crash.

jknockaert commented 1 year ago

@LBeernaertProton Alright then I guess we can close this one.