Closed exander77 closed 3 years ago
Hi @exander77, thanks for the report.
Two things:
- The `References` header entry which you provided actually exceeded the maximum length permitted by textproto, and the test gave the error `textproto: length limit exceeded: line`. This should have led to an error during import and we should have given up, rather than mangling the header/body.
- After I removed enough `References` entries to get the line length below 4096, nothing was mangled and everything was imported fine. Specifically, the entries I had to remove were the 3 addresses that were wrongly moved to the body during import in your example.

This makes me think it's an issue related to the line length going above 4096, as it seems we stop processing the line there and think the header has ended. However, I don't understand why we didn't abort the import because of the error and instead continued with the import.
Which version of the import-export app did you use for this import?
This occurred in the batch of messages I migrated on 25th November 2020. I always use the latest version from Git. This was, I think, after you released the new parser. I have remigrated everything.
The messages have an artificial `Date` with the time of the migration and the `Subject` `(No Subject)`, as the real `Date` and `Subject` usually ended up in the body.
I can rerun the whole migration with the latest version from Git if anything has changed. I wonder what could have changed in the meantime, or why your test does not trigger it while a real migration does.
But as you needed to remove exactly those 3 addresses that ended up in the body, we can be pretty certain that this is not a coincidence.
I thought that maybe we weren't handling the error properly during actual importing (as opposed to just reproducing in a unit test), but I tried running an actual import on a message with your provided long `References` line and we successfully rejected it. The behaviour was the same on the latest `master` as well as an older build from September.
Were you importing from local files or from IMAP? Perhaps IMAP introduces some line wrapping at 4096 chars (we had a bug related to that limit) which screws up the header parsing, and this line wrapping is not evident in the `References` header entry you provided in your original message.
@jameshoulahan I imported from Gmail IMAP. I do dry runs on an mbox file, but I do real migrations over IMAP.
Importing the message from Gmail IMAP with the latest version of I-E successfully catches the `length limit exceeded` error and the message is not imported. I will try with older versions of I-E as I recall something changed with regards to long lines over IMAP.
Update: even with an old September build, importing via IMAP from Gmail successfully caught the long line error and didn't import the message.
At this point, I'd ask you @exander77 to try to import the offending message again and see if you can reproduce the issue, because I'm struggling to.
@jameshoulahan I have located the e-mail on Gmail. I can send it to you unmodified so you can try to replicate the issue on the exact message. Where can I send it? There is not much private going on, but it is part of business communication.
The problem may be that there is actually a new line after each reference in the source Gmail. They were merged to a single line on ProtonMail.
Source message:
References: <CAEDgNLwuNwXvTaqJY87S-rjUUEjCW82HvZ3Ys5=SmZMYx5dNOg@mail.gmail.com>
<CAEDgNLxuXg2+cr8edkoAxYPAo6h=vZ_PFQr7Rv4w2NszvXZNrg@mail.gmail.com>
<CAGBNNB246ScrXLLh3-Sw-M57Kqk+FD4tNwyL2_m0cv=3MxJ7hQ@mail.gmail.com>
<CAEDgNLzSdzXWO+9BmDTV+BW8dz2dyyhaVgW4ApQ-3UpcDDAGPg@mail.gmail.com>
<CAGBNNB0UB+t0Q8ebSDQKK36BEaB_XcHJ7VEC6OeQwWZtPLKGbQ@mail.gmail.com>
<CAEDgNLy0_PUpx2qxnnraCKa8Pv2Q3W+qAsbP23d3pcDKnBogcA@mail.gmail.com>
<CAEDgNLxTvQsjQfvyjyRJ=NJ4TWs8a1QU0Du6qP1L1qKEgP37_Q@mail.gmail.com>
<CAGBNNB1djxTa+Gxg3TgK41Wj9yFC_e=_wiW3+Vc0JE=EfR8hAA@mail.gmail.com>
<CAGBNNB0xDE3m7f4WO69yDq=g2AhAQKYu1RPkYhWG239G-Cd2gQ@mail.gmail.com>
<CAEDgNLxEhuGyMd=MLLe3qrLCM2nkToc8H9KC4FzkrwP35NS=6A@mail.gmail.com>
<CAGBNNB0Cmj5cUaKyh_dzSmtWVcJMTo09+HTBxcYKD+Hnw1gkGw@mail.gmail.com>
<CAEDgNLyCNj+vhsjYfMtOFDy9=zqU3Xn7AYcBK_r4mc=NFbV-nA@mail.gmail.com>
<CAGBNNB3nuE_C9_v5B55OwBYdndvWdN5fdngkqE=F1YfZUv5E_w@mail.gmail.com>
<CAEDgNLxbmWiMLxFowY--vL7pxpWBT+mXYdp52edLTqmO9SPu=w@mail.gmail.com>
<CAGBNNB0DuNzqh20a2mDV=i6+CeWVCXDMqjLmgEXn4cHC0Y+hsw@mail.gmail.com>
<CAEDgNLzYP897rZXhD_GZKtUX6y0o-gFQdnzqodhhYycN3N-s3A@mail.gmail.com>
<CAGBNNB2n71HYV90N8t7W1by3eUB4Yf11mzzn=vBrrV2OfPkdYQ@mail.gmail.com>
<CAEDgNLzmUBC1z9dyH07nbuBz+dciE46A8K280C9EtH2uhe+W7A@mail.gmail.com>
<CAGBNNB1kms17n7SxFnRZsL82Yf-DiQpPLrRUjH9fhOnKZVnVDA@mail.gmail.com>
<CAEDgNLx+onZzGfoKEEm9W5NCpcdmkWPwZ5o+bcP5wrsOnt3okA@mail.gmail.com>
<CAGBNNB3meV-xh_rZW0Y6db5m33tN3vYeNJbPqCubaNr74RT8Rg@mail.gmail.com>
<CAEDgNLwjA2W_5Ea2YbeyMp8hxKAoKLJppLmb1kkf0XitMt7Tng@mail.gmail.com>
<CAGBNNB11q6EzXSXnYmRrkOkN-JB5452ZkWesaOX-qMacD56Seg@mail.gmail.com>
<CAEDgNLy_kT==SxnF96+H8gBXWGHi53rmDtDOy2DpO0bdhnTRKQ@mail.gmail.com>
<CAEDgNLyT01oNkfuC1_dQ0OXsxEMFuAUcLLFeZXsuQuchnwt-Cg@mail.gmail.com>
<CAGBNNB1sPrRhJxUT+DbsZ2cBhsMAyu77yzYxqZqAmfR_bJ8dKA@mail.gmail.com>
<CAEDgNLyZ18yCwbBzx3vtEMv1FpKkuaeWBCkQc89uPm4E42Wszg@mail.gmail.com>
<CAGBNNB3HoAfvRggkWOafKUyBFyjS4iTSjcqd+A05AH=Zwj3d6Q@mail.gmail.com>
<CAEDgNLw+h8B_RRiAjgoyyykGhjNN1xgcqrENo=ac4Nd6v1MCVw@mail.gmail.com>
<CAGBNNB3sxrd=jqdkpGa1OVu7vvAQ3U2mqfJ1JKvCyS9dL1UYXw@mail.gmail.com>
<CAEDgNLwzqBJO2cC5+kwUAigQHYt4iOwfgN=us+_y8+8kw9GAfQ@mail.gmail.com>
<CAGBNNB3JV3aoJjk=OeF7WT9AdN_eXVj0En+QUcvk79u9cJCYCw@mail.gmail.com>
<CAEDgNLzGVs6Dwm__eEEVphXxDSu1r-dGEmt5hN9wsQFZK5G4YQ@mail.gmail.com>
<CAGBNNB3TRqg3eAV8vW0o4O1fpvjaLvki6jXwf7M6PRxx=TgkCA@mail.gmail.com>
<CAEDgNLzdXmCqtByTEVTOX6Qv=tKWtSb7nmRFN1skgeyN9B7-Ow@mail.gmail.com>
<CAGBNNB1D1Rj0cgCmMbTcQ3y2_XAWQ77ssCec2aftEL4kDH0xZQ@mail.gmail.com>
<CAEDgNLy8m3Otbwj30HLusdo6NafkRmbP+s-N=KO2R8u8dNiUDw@mail.gmail.com>
<CAGBNNB3Dx3Yy9sUkjJg9NLA5UU6_CkPwVHv2KK0QtVxYi+Wc_Q@mail.gmail.com>
<CAEDgNLw6zuSwFA93MvwUAfWYt5Yzspmqhwo4DWh3D8W35hmtVQ@mail.gmail.com>
<CAEDgNLzGACcgKzt_5UwK+wXi9nKaXbSpf8hCfo9GYUun=pwOnQ@mail.gmail.com>
<CAGBNNB1mt-Vq+SDocTeZOuV4xeEGA4s6Moc2bXU0E--a9xdx-Q@mail.gmail.com>
<CAEDgNLww3ugfgBNg=z5uNs_jEH-vxNDtU7T=w1iWiEM4ksYSPw@mail.gmail.com>
<CAGBNNB2zax2LY+0JFRfOfCRSsPAswrJnvUCvw=5zaNKngR7G-A@mail.gmail.com>
<CAGBNNB07R6sARdKyEBJvDQx=koQYdSibWGCH5F074APsthfjYA@mail.gmail.com>
<CAEDgNLw0xt4oMEHvZiFudoAzS5h=ws+DFjAbh+q3BmTfB8XD8Q@mail.gmail.com>
<CAGBNNB2wZW7Y2vzYvq-_aa_uFiJcdUUd=mpUUWPPeQVG65iO=g@mail.gmail.com>
<CAGBNNB0n0uLx+iyBrEs_zXbFMWfn6Uhb7G2CMXOxwc9ZT2dgoA@mail.gmail.com>
<CAEDgNLyEFa0dQ8moj-B-pPghRnkDGeuNMj3Wx+d9YAXz-fWpsw@mail.gmail.com>
<CAGBNNB3UW2k2USfpVEpABUC4T0R1raqbKYF=XZAmqzHaQV_hew@mail.gmail.com>
<CAGBNNB0+ohvyLKJHcgZ91GoYbBPxJRHoYVfh1Mr8fE7kN=EfwQ@mail.gmail.com>
<CAEDgNLyb1JjqDBB6+1UsiSAC3-nB0nqzc9Kbfzn-dewp5A+1Pw@mail.gmail.com>
<CAEDgNLximDkaPrQYAtKxsS4ZCNF_sRfqU9RrfJLhPK764GsKzA@mail.gmail.com>
<CAGBNNB0Y_1=sRROa16i7kQOhisCLaVyo6GYfoEwZcaCkWQujNQ@mail.gmail.com>
<CAEDgNLzzKw2LPvYuftj9xfqmakBx4o+pxT_Ga7Efcd9eMA=BVw@mail.gmail.com>
<CAGBNNB03AuJK3GAgvyCofUAgEt8O815bdQk3M1Ny+JDXE2+wGQ@mail.gmail.com>
<CAEDgNLwWRoQyMx-kMiMULZmVwiin=-ZYL6bO0NvPh0N01Lh22w@mail.gmail.com>
<CAGBNNB0ytPt0vhT=pJROmub8J4NrzVOTnzDgNnOi_x5nOaGdvA@mail.gmail.com>
<CAEDgNLwbcJ1x1DnYkr6Nv69KBGPwRYjLJixCrMPhKDqOhpS8Ug@mail.gmail.com>
<CAGBNNB0qwwUss4BKT7X_1j1zPmdC0paMKQVzwGD1yCiOTz-Vqg@mail.gmail.com>
<CAEDgNLwhm8E1Vg1kJLR6GjcckPNx-Em8xP8Bx9pSbLL2rYyG-A@mail.gmail.com>
Date: Wed, 13 Apr 2016 14:06:11 +0200
Aha! Yes, with a newline after each reference, the textproto lib doesn't return an error due to an overly long line. Instead, it pushes the last three references to the body.
I'll try to make a fix. Thanks!
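For comparison, here is a minimal sketch of what a correct parse of a folded `References` header looks like: every message ID stays in the header and the body begins only after the blank line. It uses Go's standard `net/mail` package rather than go-message, so it illustrates the expected outcome and does not reproduce the bug. The sample message and the `splitMessage` helper are made up, and the sketch assumes the continuation lines start with whitespace, as header folding requires.

```go
package main

import (
	"fmt"
	"io"
	"net/mail"
	"strings"
)

// splitMessage parses a raw message and returns the message IDs found in its
// References header together with the body text.
func splitMessage(raw string) (refs []string, body string, err error) {
	msg, err := mail.ReadMessage(strings.NewReader(raw))
	if err != nil {
		return nil, "", err
	}
	// Header.Get returns the unfolded value: continuation lines are joined
	// with single spaces, so Fields yields one entry per message ID.
	refs = strings.Fields(msg.Header.Get("References"))
	b, err := io.ReadAll(msg.Body)
	if err != nil {
		return nil, "", err
	}
	return refs, string(b), nil
}

func main() {
	// Made-up message: the folded continuation lines start with a space.
	raw := "References: <a@example.com>\r\n" +
		" <b@example.com>\r\n" +
		" <c@example.com>\r\n" +
		"Date: Wed, 13 Apr 2016 14:06:11 +0200\r\n" +
		"\r\n" +
		"Hello\r\n"
	refs, body, err := splitMessage(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d references, body=%q\n", len(refs), body)
}
```

A check along these lines (counting the references before and after import, and confirming that the body does not start with a message ID) could serve as a regression test for the mangling described in this issue.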
Update: the issue occurs entirely outside of the bridge: the references are pushed into the body when we call go-message's `message.Read` function. I'll open an issue upstream.
I bet that the other issue with recipients is the same problem. Once it is fixed, I will remigrate my messages again.
The issue was actually fixed upstream just a few days before I opened the bug report there. Will bump the go-message dependency and we can try again.
@jameshoulahan Thanks for the info! I will retest everything after it has been propagated to the Bridge.
Hi @exander77, the 1.5.6 release includes the bumped go-message dependency; would appreciate if you can retest.
Edit: oops, you're probably waiting for the next I-E release -- that should come out soon as well. But the code doing the import should be shared across both apps so if it works on Bridge, it will most probably also work on I-E.
@jameshoulahan I have been able to run it again, but something seems off. I had around 6.4 GB of e-mails and I removed the imported ones, which left me with around 1 GB. After the import, I now have around 5 GB.
Were there any improvements in storing e-mails? I have already reported that Proton-Bridge inflates the size of e-mails by around 15-20% through base64 encoding. I could account for the change with that, but otherwise I would think there has to be something missing.
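As a rough aid for reasoning about the size question, base64 turns every 3 raw bytes into 4 characters, and a MIME part is typically wrapped at 76 characters per line with CRLF endings. The sketch below estimates the resulting size of a single encoded part; `mimeBase64Size` is a hypothetical helper, and which parts of a message actually get re-encoded during import is not something this thread establishes.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// mimeBase64Size estimates how many bytes n raw bytes occupy once they are
// base64-encoded and wrapped at 76 characters per line with CRLF endings,
// the usual layout for a base64 MIME part.
func mimeBase64Size(n int) int {
	encoded := base64.StdEncoding.EncodedLen(n)
	lines := (encoded + 75) / 76 // number of wrapped lines, rounded up
	return encoded + 2*lines     // each line carries a trailing CRLF
}

func main() {
	for _, n := range []int{57, 1 << 20} {
		out := mimeBase64Size(n)
		fmt.Printf("%d raw bytes -> %d encoded bytes (+%.1f%%)\n",
			n, out, 100*float64(out-n)/float64(n))
	}
}
```

For a fully encoded part the growth is roughly a third or a little more, so the overall inflation of a message depends on how much of it is stored base64-encoded.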
@exander77, to my knowledge nothing has changed with respect to message size/storage. I cannot explain the apparent difference in storage size. Which release version did you re-test with?
Overly long lines are now handled properly (in both Bridge and the I-E app). We have looked into the inflated size of messages a few times now but didn't find anything wrong there. @exander77 if that's still an issue, let us know.
If a message contains a chain of referenced messages in headers:
Then corruption occurs and the parser makes part of the headers part of the body. The body starts with:
Snippet of the header and body of the new message:
This affects a large number of messages, in my case around 500.
@jameshoulahan
I already reported a similar occurrence happening on the server: https://github.com/ProtonMail/proton-bridge/issues/27