Open chris001 opened 9 years ago
Yes, @ezyang 's posts are well known.
The algorithm is not the reason why I intend a deep refactoring. The algo in OfflineIMAP is good. It's more that the implementation has aged badly and depends on libraries that are not Python 3 compatible.
isync, mailsync and imapsync are known tools from the community.
imapsync does not have a sync algo and doesn't support 2-way sync, so it's out of scope here.
isync and mailsync are good tools, but both have far fewer advanced features than OfflineIMAP.
Thanks for the input, though. I appreciate it. :-)
One algo that needs improving is the one that determines a message's unique identity.
Currently it uses only the IMAP UID, which is not a dependable indicator.
The IMAP server can reset the UIDs at any time.
The best way to improve the quality of the results would be to implement a message identification algorithm that combines several optional pieces of data: the UID and, from the headers, Message-ID, References, and In-Reply-To. Store these data in a row of the sqlite database table.
That should be enough of an improvement to uniquely identify messages without having to download the entire message. This would reduce the amount of bandwidth consumed and speed up the sync.
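A minimal sketch of what such an identity row could look like, in Python since that is what OfflineIMAP is written in. The table name `identity`, its columns, and the idea of hashing the three headers into one key are assumptions made for illustration, not OfflineIMAP's actual schema.

```python
import hashlib
import sqlite3
from email.parser import HeaderParser

ID_HEADERS = ("Message-ID", "References", "In-Reply-To")

def header_values(raw_headers):
    """Return the identifying headers, with folded whitespace collapsed."""
    msg = HeaderParser().parsestr(raw_headers)
    return [" ".join(str(msg.get(name) or "").split()) for name in ID_HEADERS]

def identity_key(raw_headers):
    """Hash the identifying headers into one key (hypothetical scheme)."""
    joined = "\x00".join(header_values(raw_headers))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def create_table(db):
    # Hypothetical schema: one row per (folder, uid).
    db.execute(
        "CREATE TABLE IF NOT EXISTS identity ("
        " folder TEXT, uid INTEGER, message_id TEXT, refs TEXT,"
        " in_reply_to TEXT, key TEXT, PRIMARY KEY (folder, uid))"
    )

def store_identity(db, folder, uid, raw_headers):
    """Insert or update the identity row and return its key."""
    message_id, refs, in_reply_to = header_values(raw_headers)
    key = identity_key(raw_headers)
    db.execute(
        "INSERT OR REPLACE INTO identity VALUES (?, ?, ?, ?, ?, ?)",
        (folder, uid, message_id, refs, in_reply_to, key),
    )
    return key
```

Only the header block is needed as input, so the body of the message never has to be fetched to fill the row.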
On Thu, Apr 16, 2015 at 10:50:00AM -0700, Chris Coleman wrote:
> One algo that needs improving is the algo that determines message unique identity.
> Currently using only the IMAP UID, which is not a dependable indicator.
Actually, I haven't had any UID problem for years.
> The IMAP server can reset the UIDs at any time.
That's why IMAP provides UIDVALIDITY.
> Best thing to improve quality of the results would be to implement a message unique identification algorithm that includes a combination of optional data. UID, In the headers, Message-ID, References, In-Reply-To. Store these data in a row in the sqlite database table.
> That should be enough of an improvement to uniquely identify messages without having to download the entire message. This would reduce the amount of bandwidth consumed, and speed up the sync.
This would increase both the bandwidth usage and the duration of a sync. That's why I disagree with you here. UIDs are just fine most of the time.
Nicolas Sebrecht
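To illustrate the UIDVALIDITY point: cached UIDs can be trusted only while the mailbox's UIDVALIDITY value is the one recorded at the last sync. A small sketch follows; the helper names are hypothetical, and the input is assumed to be a raw STATUS response line such as the one `imaplib`'s `conn.status("INBOX", "(UIDVALIDITY)")` returns in `data[0]`.

```python
import re

_UIDVALIDITY_RE = re.compile(rb"UIDVALIDITY (\d+)")

def parse_uidvalidity(status_line):
    """Extract the UIDVALIDITY number from a raw STATUS response line."""
    match = _UIDVALIDITY_RE.search(status_line)
    if match is None:
        raise ValueError("no UIDVALIDITY in response: %r" % (status_line,))
    return int(match.group(1))

def cached_uids_usable(stored_uidvalidity, status_line):
    """True if UIDs cached under stored_uidvalidity are still meaningful."""
    if stored_uidvalidity is None:
        return False  # nothing recorded yet, e.g. first sync
    return stored_uidvalidity == parse_uidvalidity(status_line)
```

When this returns False, the locally cached UIDs for that folder say nothing about the messages now on the server.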
On Fri, Apr 17, 2015 at 12:41:35AM +0200, Nicolas Sebrecht wrote:
> Best thing to improve quality of the results would be to implement a message unique identification algorithm that includes a combination of optional data. UID, In the headers, Message-ID, References, In-Reply-To. Store these data in a row in the sqlite database table.
> That should be enough of an improvement to uniquely identify messages without having to download the entire message. This would reduce the amount of bandwidth consumed, and speed up the sync.
I think I see why you think it could improve speed. I'm re-opening because it could be worth running some basic timed tests to compare.
Something very simple could do it with raw IMAP requests:
Would you mind setting up such speed tests?
Nicolas Sebrecht
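One possible shape for such a speed test, sketched in Python. It assumes `conn` is an `imaplib.IMAP4` (or `IMAP4_SSL`) connection that is already logged in with a mailbox selected; `time_ms` and `compare_lookups` are hypothetical helper names, and the two requests chosen here are only a guess at what is worth comparing.

```python
import time

def time_ms(fn, repeats=5):
    """Call fn() several times and return each duration in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples

def compare_lookups(conn, uid, message_id, repeats=5):
    """Time a lookup by UID against a lookup by Message-ID header."""
    by_uid = time_ms(
        lambda: conn.uid("FETCH", str(uid), "(FLAGS)"), repeats)
    # imaplib does not quote arguments here, so quote the header value.
    by_mid = time_ms(
        lambda: conn.uid("SEARCH", "HEADER", "Message-ID",
                         '"%s"' % message_id), repeats)
    # Keep the fastest sample to reduce noise from server load.
    return {"uid_fetch_ms": min(by_uid),
            "message_id_search_ms": min(by_mid)}
```

The first request of a session may be much slower than the following ones, so it is worth looking at the individual samples from `time_ms` as well as the minimum.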
OK, I set up a speed test between my local machine and a remote VPS running dovecot (800 km away).
Time to read the first UID: 2400 ms (but the server was under load from other web services running on it).
Time to read subsequent UIDs: instantaneous. This is natural because the file containing the table that maps UIDs to message filenames is not enormous (1 KB to 50 MB) and is buffered in memory by the operating system.
Time to look up a message by UID: typically 5-10 ms plus network round-trip time.
Time to look up a message by Message-ID: a bit slower, 10-25 ms plus network round-trip time, depending on whether the information is already in the disk cache.
Nice. With a Message-ID check in the run where UIDVALIDITY changed, for a mailbox with, say, 1000 mails, this would mean:
Of course, adding Message-ID alone is not enough to get a fully valid match with what we have locally: it's possible to have more than one mail with the same Message-ID header, and mails without a Message-ID at all. For those, a full re-download might be needed.
If we get around 90% matching, we save 900 re-downloads. This sounds quite good. There would be some local time overhead to do all the checks against the local maildir, but this would still be a win.
This is a good optimization for UIDVALIDITY changes.
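The matching described above can be sketched as follows. This is an illustration under assumptions, not OfflineIMAP code: `local` maps a local message key (for example a maildir filename) to its Message-ID, `remote` maps each new server UID to the Message-ID fetched from its headers, and `None` stands for a missing header.

```python
from collections import Counter

def match_by_message_id(local, remote):
    """Pair remote UIDs with local messages via unambiguous Message-IDs.

    Returns (matched, redownload): matched maps new UID -> local key;
    redownload lists the UIDs that need a full download.
    """
    local_counts = Counter(m for m in local.values() if m)
    remote_counts = Counter(m for m in remote.values() if m)
    # Only Message-IDs that occur exactly once locally are trustworthy.
    local_by_mid = {m: key for key, m in local.items()
                    if m and local_counts[m] == 1}
    matched, redownload = {}, []
    for uid, mid in remote.items():
        if mid and remote_counts[mid] == 1 and mid in local_by_mid:
            matched[uid] = local_by_mid[mid]
        else:
            # Missing, duplicated, or unknown Message-ID.
            redownload.append(uid)
    return matched, redownload
```

With 1000 mails and 90% of them matched this way, `redownload` would hold 100 UIDs instead of 1000.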
@nicolas33
isync: http://isync.sourceforge.net/
mailsync: http://mailsync.sourceforge.net/
imapsync: http://imapsync.lamiral.info/