manishrjain closed this issue 6 years ago
The mail that should be synced could perhaps be limited by specifying a query (https://github.com/gauteh/gmailieer/blob/master/lieer/remote.py#L64). The problem is that e.g. history().list(..) does not take a query, so things would get messy when doing a partial update. There are probably other methods that would need to be limited to the query as well.
Note that once you have all your emails synced the partial update will be fast. Is this in order to save disk space?
-g
Manish R Jain writes on September 9, 2017 8:21:
My personal email goes back over 10 years, and I'd rather not have all of that downloaded on my laptop (even setting the max messages per label doesn't help with that, because I have gained way too many labels over the years). offlineimap allows you to sync mails from certain 'folders'. Would be good if gmailieer allows that.
-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/gauteh/gmailieer/issues/40
Disk space isn't that much of a concern (the number of files might be, but I don't know because I haven't yet done a full sync -- on a tangential note, would have been awesome if one could just use a key-value store instead of files to store mails). But, just the process of downloading a decade of emails on your laptop is really slow, and very resource heavy. Also, seems a bit pointless, because when on laptop, you just want to get your inbox to zero, not much else. Also, offlineimap does this out of the box.
I can see why partial sync would be tricky -- IIUC, it would require keeping an external state of all the messages that you had previously downloaded but don't belong to any of the labels in sync config anymore. So, the right question is, does the time it takes to get up and running with Gmailieer and Astroid on a laptop outweigh the complexity of doing partial syncs? I'd say yes.
Manish R Jain writes on September 11, 2017 2:59:
Disk space isn't that much of a concern (might be, but I don't know because I haven't yet done a full sync). But, just the process of downloading a decade of emails on your laptop is really slow, and very resource heavy. Also, seems a bit pointless, because when on laptop, you just want to get your inbox to zero, not much else. Also, offlineimap does this out of the box.
I can see why partial sync would be tricky -- IIUC, it would require keeping an external state of all the messages that you had previously downloaded but don't belong to any of the labels in sync config anymore. So, the right question is, does the time it takes to get up and running with Gmailieer and Astroid on a laptop outweigh the complexity of doing partial syncs? I'd say yes.
When I said partial sync, incremental synchronization might have been more accurate. This is what happens automatically if you have a recent enough state file (have done partial pulls with not too long intervals).
For me, keeping my entire mail archive locally, indexed by notmuch and instantly available (usually searchable faster than through GMail), is a big point to using notmuch. Your inbox will be synced with GMail anyway (if you use gmailieer). Anyway, I certainly respect other use cases, my point is: Once you have everything down, partial (incremental) syncs are really fast, and you will never think about it again. Then it is just a huge advantage to have all your email history instantly searchable.
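For context, the incremental pull builds on the Gmail history endpoint: you keep the historyId from the last sync and request only changes since then, following nextPageToken until the pages are exhausted. Roughly, the paging logic looks like this (a sketch only; `list_page` stands in for `service.users().history().list(...).execute`, and all names are illustrative, not gmailieer's actual code):

```python
def collect_history(list_page, start_history_id):
    """Collect all change records since start_history_id.

    `list_page` is a callable taking (startHistoryId=..., pageToken=...)
    and returning a dict shaped like a Gmail `history.list` response:
    {'history': [...], 'nextPageToken': ..., 'historyId': ...}.
    Returns (records, new_history_id) where new_history_id should be
    saved in the state file for the next incremental pull.
    """
    records = []
    page_token = None
    new_history_id = start_history_id
    while True:
        resp = list_page(startHistoryId=start_history_id,
                         pageToken=page_token)
        records.extend(resp.get('history', []))
        # The response carries the newest historyId seen so far.
        new_history_id = resp.get('historyId', new_history_id)
        page_token = resp.get('nextPageToken')
        if not page_token:
            break
    return records, new_history_id
```

Because only the change list is transferred, this is why incremental pulls stay fast once the initial sync is done.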
With notmuch + gmailieer you can even clean up your label-mess in a scriptable fashion.
I'm not entirely opposed to not syncing the full archive, but as you point out it will add a significant layer of complexity:
It is not possible to limit the partial (incremental) pull to a specific query. This means we would have to figure out manually whether a message should be synced whenever there has been a remote change to it: we would then either have to exactly match GMail's search behaviour locally (which is going to be inaccurate), or make at least one extra request to the GMail API (which is going to be slow).
If a change means a previously ignored message should now be included, we have to sync it.
Conversely, if a change means a message should now be ignored, we have to delete it (locally).
We would have to do the same thing when pushing changes: check whether the message is no longer supposed to be ignored, or whether it should now be deleted locally.
In short; please try and see how it feels after you have done a full sync. I do not think I have time to add this functionality, but I'd support a good implementation of it if you submit a PR.
So, I'm doing the mail sync on desktop. Started it in the morning (Sydney time) and it's going on. Got close to 140K emails, and it's currently at 10K, going at the rate of 2.5s/it (started with it/s, but then switched). So, this would take at least a day to finish (based on current ETA this would be 3.7 days). Fast.com shows my network bandwidth to be at 310 Mbps, so that's pretty fast; the mail download is slow.
I doubt I'll have time to submit a PR -- already running a bunch of OSS projects. Most likely, I'll just work around the problem by just letting it run.
I think if there's a way to speed up the email download, that'd alleviate this issue as well. I like the idea of having the entire mail on laptop, but the download speed makes this hard.
Manish R Jain writes on September 11, 2017 8:31:
So, I'm doing the mail sync on desktop. Started it in the morning (Sydney time) and it's going on. Got close to 140K emails, and it's currently at 10K, going at the rate of 2.5s/it (started with it/s, but then switched). So, this would take at least a day to finish. Fast.com shows my network bandwidth to be at 310 Mbps, so that's pretty fast; the mail download is slow.
That amount of e-mail shouldn't be a problem, but I see that this is becoming a big problem for initial users. I doubt that there really is a way around this on one API key, as I understand GMail tries to limit full downloads.
I doubt I'll have time to submit a PR -- already running a bunch of OSS projects. Most likely, I'll just work around the problem by just letting it run.
Ditto. Plus just got a kid ;) That's why I'm reluctant to add that layer at the moment.
I think if there's a way to speed up the email download, that'd alleviate this issue as well. I like the idea of having the entire mail on laptop, but the download speed makes this hard.
Might be a throttling issue then; you could try to set up your own API key: instructions in the README. The public one receives 100k-1M requests / month (especially during new syncs).
You could try out the --limit option just to see how things work, but remember to do a complete, full, sync before you do any pushing. It's really only designed for debugging, so there might be some weird side-effects.
Using my own API key doesn't help. Still reduces the batch req size to 100. Though, I might be doing something wrong, and my API might not be getting picked up -- I don't know. No way to tell from the log output.
The download is hovering around 1.5s/it -- if there's a way to improve this, that'd be awesome.
Manish R Jain writes on September 11, 2017 11:24:
Using my own API key doesn't help. Still reduces the batch req size to 100. Though, I might be doing something wrong, and my API might not be getting picked up -- I don't know. No way to tell from the log output.
Did you re-auth with your new key? Refer to the -h output and README for more info.
3rd day of syncing, and only at 70K emails. The batch size reduced to 50 at some point.
content: 53%|█████████████▊ | 74567/139771 [46:59:51<44:06:18, 2.44s/it]
I think the instructions in the documentation for getting the API key aren't clear. The Google console asks for multiple options, so I chose the ones that looked best, and got a client_id.json (not client_secret.json) file. I'll try using it again (this time with my personal email).
Another thing I noticed is that I'm unable to gmi push from my work folder (which has already synced), because it somehow picks up emails from the personal folder as well. So, I'm unable to use astroid even just for work until personal emails are all synced up.
Also, gmi should indicate which client id it is using, so a user can at least confirm they're on the right one. It would be even better if it printed a warning when using the generic public client, noting the limitations on the number of requests, etc.
pull: full synchronization (no previous synchronization state)
fetching messages: 139814it [11:37, 200.39it/s]
receiving content: 0%|▏ | 117/65192 [04:46<44:13:39, 2.45s/it]
reducing batch request size to: 100
So, even after passing the client_id.json like this to both pull and auth, I still get the "reducing batch request size to: 100" issue.
"You're limited to 100 calls in a single batch request. If you need to make more calls than that, use multiple batch requests." -- https://developers.google.com/gmail/api/guides/batch
I think the recommendation was to use 50 calls per batch request, even, to reduce throttling (the less calls per batch request, the less throttling).
The fixed limit of 100 might be new, and the 50 might be the old "soft" limit.
"Using batching is encouraged, however, larger batch sizes are likely to trigger rate limiting. Sending batches larger than 50 requests is not recommended." -- https://developers.google.com/gmail/api/v1/reference/quota
Hence it would make sense to use batch sizes of 50, if that's the recommended approach. When doing full syncs, I noticed performance was bad until the batch size went down to 50, but that took some time.
So, 50-100 requests per batch makes sense. But, how many batches are going on concurrently? The page doesn't say anything about the number of concurrent batches allowed. If they support it, then doing so would really improve the throughput.
My mail size so far is only 4GB, but it has taken 3 days of downloading to reach it -- that's way too slow for today's standards, particularly when using Gmail's APIs directly.
Julian Andres Klode writes on September 13, 2017 2:36:
"You're limited to 100 calls in a single batch request. If you need to make more calls than that, use multiple batch requests." -- https://developers.google.com/gmail/api/guides/batch
I think the recommendation was to use 50 calls per batch request, even, to reduce throttling (the less calls per batch request, the less throttling).
We should probably reduce the default limit to 50 then. Thanks for digging this up. I assume they do not encourage concurrent requests, that would defeat the purpose of a limit.
Manish R Jain writes on September 13, 2017 3:50:
So, 50-100 requests per batch makes sense. But, how many batches are going on concurrently? The page doesn't say anything about the number of concurrent batches allowed. If they support it, then doing so would really improve the throughput.
My mail size so far is only 4GB, but it has taken 3 days of downloading to reach it -- that's way too slow for today's standards, particularly when using Gmail's APIs directly.
If you have any practical suggestions on how to speed up things with google, please let me know.
I assume they do not encourage concurrent requests, that would defeat the purpose of a limit.
I'm not so sure. Batching up requests in a single network call makes a lot of sense. Every network call has overhead, and doing batching amortizes that. But, that doesn't mean running multiple batches concurrently is a no-go. In fact, in Dgraph, we encourage our users to batch up mutations, and also run as many batches concurrently as possible.
Google doc is silent about running batches concurrently -- so this might be worth trying out. I think the rate at which Gmailieer is syncing isn't fast at the moment, so if doing multiple batches improves that rate, that'd be a good thing for adoption.
Is the google api client library thread safe?
Manish R Jain writes on September 13, 2017 1:58:
Also, gmi should indicate which client id it is using, so a user can at least confirm that he's on the right one.
If you used gmi auth -c path/to/your/client_secret.json correctly, you would get a message that the user-provided API id and secret is used. It is not necessary to use -c with pull or push unless re-authorization is required (by the google api). Only gmi auth -c .. will remove the existing authorization tokens, allowing you to manually re-authorize with a different client id/secret.
Have a look in remote.py:366.
Note that if your authorization expires for some reason, you need to re-supply your own API key using gmi auth -c .., otherwise you will be prompted to re-authorize the standard client id/secret.
Manish R Jain writes on September 13, 2017 1:56:
Another thing I noticed is that I'm unable to gmi push from my work folder (which has already synced), because it somehow picks up emails from the personal folder as well. So, I'm unable to use astroid even just for work until personal emails are all synced up.
You probably have messages that are present in both accounts, this works, but will cause tags/labels to be synced to both accounts. Open a new bug if there is something specific failing.
Okay, finished on the 5th day, with this stack trace.
/data/Mail/personal/mail/cur/15d6e71917791d47:2,S is not an email
receiving metadata: 5%|████████████▋ | 3699/74622 [06:44<16:46, 70.45it/s]remote: could not find remote message: 15ddf5d8ec3b6a21!
receiving metadata: 20%|███████████████████████████████████████████████████▊ | 15174/74622 [11:33<07:49, 126.62it/s]remote: could not find remote message: 15de46a60447183e!
receiving metadata: 21%|█████████████████████████████████████████████████████▊ | 15672/74622 [11:46<16:25, 59.81it/s]remote: could not find remote message: 15de491dbeaef465!
receiving metadata: 24%|█████████████████████████████████████████████████████████████▌ | 17935/74622 [12:41<12:48, 73.76it/s]
Traceback (most recent call last):
File "/usr/bin/gmi", line 4, in <module>
__import__('pkg_resources').run_script('gmailieer==0.2', 'gmi')
File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 742, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1510, in run_script
exec(script_code, namespace, namespace)
File "/usr/lib/python3.6/site-packages/gmailieer-0.2-py3.6.egg/EGG-INFO/scripts/gmi", line 8, in <module>
File "/usr/lib/python3.6/site-packages/gmailieer-0.2-py3.6.egg/lieer/gmailieer.py", line 136, in main
File "/usr/lib/python3.6/site-packages/gmailieer-0.2-py3.6.egg/lieer/gmailieer.py", line 307, in pull
File "/usr/lib/python3.6/site-packages/gmailieer-0.2-py3.6.egg/lieer/gmailieer.py", line 531, in full_pull
File "/usr/lib/python3.6/site-packages/gmailieer-0.2-py3.6.egg/lieer/gmailieer.py", line 562, in get_meta
File "/usr/lib/python3.6/site-packages/gmailieer-0.2-py3.6.egg/lieer/remote.py", line 100, in func_wrap
File "/usr/lib/python3.6/site-packages/gmailieer-0.2-py3.6.egg/lieer/remote.py", line 271, in get_messages
File "/usr/lib/python3.6/site-packages/oauth2client-4.1.2-py3.6.egg/oauth2client/_helpers.py", line 133, in positional_wrapper
File "/usr/lib/python3.6/site-packages/google_api_python_client-1.6.3-py3.6.egg/googleapiclient/http.py", line 1464, in execute
File "/usr/lib/python3.6/site-packages/gmailieer-0.2-py3.6.egg/lieer/remote.py", line 251, in _cb
File "/usr/lib/python3.6/site-packages/gmailieer-0.2-py3.6.egg/lieer/gmailieer.py", line 560, in _got_msg
File "/usr/lib/python3.6/site-packages/gmailieer-0.2-py3.6.egg/lieer/local.py", line 326, in update_tags
lieer.local.RepositoryException: tried to update tags on non-existant file: /data/Mail/personal/mail/cur/14fbf1e049ce7d7c:2,
Re-running pull just so it can finish up the initial sync without error. The fetching messages stage takes around 12 mins each time. I hope that after the initial sync is successfully done, a pull wouldn't take that long.
P.S. Final mail size 7.4GB, ~160K emails.
Re: thread safety of the python client library, a Google search shows this: https://developers.google.com/api-client-library/python/guide/thread_safety
You just need to give a unique http connection per thread -- which is reasonable. If that helps increase the rate of download, it would be a huge win.
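A sketch of what that per-thread-connection pattern could look like, using a thread pool plus thread-local storage. Here `make_conn` would be something like `lambda: httplib2.Http()` and `fetch` would build and execute the actual Gmail request with that connection; all names are illustrative, not gmailieer's actual code:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def parallel_fetch(ids, fetch, make_conn, workers=4):
    """Fetch items concurrently with one connection object per thread.

    `make_conn()` creates a fresh connection (e.g. httplib2.Http());
    `fetch(item_id, conn)` performs the request with that connection.
    Results are returned in the same order as `ids`.
    """
    local = threading.local()

    def worker(item_id):
        # httplib2.Http is not safe to share between threads, so each
        # thread lazily creates and then reuses its own private
        # connection object.
        if not hasattr(local, 'conn'):
            local.conn = make_conn()
        return fetch(item_id, local.conn)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(worker, ids))
```

Whether this actually helps depends on whether Google throttles per-connection or per-account, which the docs don't spell out.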
Manish R Jain writes on September 15, 2017 1:34:
Re-running pull just so it can finish up the initial sync without error. The fetching messages stage takes around 12 mins each time. I hope that after the initial sync is successfully done, a pull wouldn't take that long.
It will, but consider that you just fetched the full list of e-mails: 160K. Admittedly not much data, but it's a long list.
Manish R Jain writes on September 15, 2017 1:36:
Re: thread safety of the python client library, a Google search shows this: https://developers.google.com/api-client-library/python/guide/thread_safety
You just need to give a unique http connection per thread -- which is reasonable. If that helps increase the rate of download, it would be a huge win.
Yeah, if google accepts multiple threads we should try that!
It will, but consider that you just fetched the full list of e-mails: 160K. Admittedly not much data, but it's a long list.
12 mins of sync to get a new email, ahem.. I'm not sure if that's practical.
Manish R Jain writes on September 15, 2017 9:29:
It will, but consider that you just fetched the full list of e-mails: 160K. Admittedly not much data, but it's a long list.
12 mins of sync to get a new email, ahem.. I'm not sure if that's practical.
This is not partial sync: please refer to previous e-mails.
I just had an achingly slow 6 day initial sync for 3.3 GB of mail. I was using my own API key.
$ gmi sync
push: everything is up-to-date.
pull: full synchronization (no previous synchronization state)
fetching messages: 175158it [04:14, 689.26it/s]
receiving content: 48%|███████████████████████████████▊ | 83230/175158 [46:54:56<70:40:57, 2.77s/it]reducing batch request size to: 25
receiving content: 50%|█████████████████████████████████▎ | 87027/175158 [50:04:27<83:20:56, 3.40s/it]reducing batch request size to: 12
receiving content: 100%|████████████████████████████████████████████████████████████████████| 175158/175158 [143:42:52<00:00, 3.91s/it]
receiving metadata: everything up-to-date.
current historyId: 2903277, current revision: 454462
The follow up sync is predicted to take 16 hours. As has been said here before, this is ridiculous.
Doesn't the official Gmail client use the API? What does it do? I assume it's closed-source, so Wireshark it?
Alex Szczuczko writes on November 27, 2017 15:11:
The follow up sync is predicted to take 16 hours. As has been said here before, this is ridiculous.
What do you mean?
Doesn't the official Gmail client use the API? What does it do? I assume it's closed-source, so Wireshark it?
As far as I can tell it doesn't download the e-mails, it just syncs the metadata and downloads the messages on-demand.
If the IMAP protocol is faster at downloading messages it could be used together with the X-GM-MSGID extension to save the messages in place of the GMail API with gmailieer: https://developers.google.com/gmail/imap/imap-extensions#access_to_the_gmail_unique_message_id_x-gm-msgid.
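For anyone exploring the IMAP route: X-GM-MSGID comes back from IMAP as a decimal integer, while the hexadecimal form of the same number is what the REST API uses, so bridging the two is just a base conversion. A small sketch (the response line format is typical imaplib FETCH output; the function name is made up):

```python
import re

def gm_msgid_to_rest_id(fetch_line):
    """Extract X-GM-MSGID from a raw imaplib FETCH response line and
    return it in the hexadecimal form used by the Gmail REST API.

    Example input: b'1 (X-GM-MSGID 11256099 UID 4)'
    """
    m = re.search(rb'X-GM-MSGID (\d+)', fetch_line)
    if m is None:
        raise ValueError('no X-GM-MSGID in response')
    # The IMAP extension reports a decimal integer; the REST API
    # identifies the same message by its lowercase hex encoding.
    return format(int(m.group(1)), 'x')
```

With that mapping, messages downloaded over IMAP could be stored under the same ids gmailieer already uses for the REST API.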
By ridiculous I mean that it's impractically long. Also, the amount of data I needed could be transferred in under an hour over a not-terrible internet connection (>10 mbps). Annoying that Google doesn't just offer a "give me everything" download option.
Hybrid IMAP could be a solution, although authentication for that might be an issue. As far as I know, it's user+pass only, no oauth. Right now gmailieer could enable IMAP access, but not actually use it?
I don't understand why a follow up sync would take 16 hours for you. That seems weird. It should only take 1-2 seconds if nothing changed, as no messages are downloaded, only the list of changes is requested.
A full sync (with the emails available locally, so they are not re-downloaded, but all metadata is queried) on my 50000 emails takes like 20 minutes or so, you only have 3 times as many.
The first follow up sync had to accommodate 6 days of new mail, as the initial sync took that long to run. Now that the syncs aren't taking so long, the delta is smaller and they run faster. It's a feedback loop.
Alex Szczuczko writes on November 28, 2017 18:20:
By ridiculous I mean that it's impractically long. Also, the amount of data I needed could be transferred in under an hour over a not-terrible internet connection (>10 mbps). Annoying that Google doesn't just offer a "give me everything" download option.
Yeah - I got that part, was thinking about the follow-up sync.
Hybrid IMAP could be a solution, although authentication for that might be an issue. As far as I know, it's user+pass only, no oauth. Right now gmailieer could enable IMAP access, but not actually use it?
Exactly, it would be a bit of a hassle for users. At the moment I think a lot of users already do have IMAP setup though.
Actually, on closer look there is XOAUTH2, perhaps that could work: https://developers.google.com/gmail/imap/imap-smtp
You could give #55 a shot, it uses IMAP to download messages. I got between 2-3 messages a second with this. Maybe someone with better IMAP knowledge can optimize this.
The output indicates that he was reaching 3-4 messages / second via http, so 2-3 does not seem like an improvement. Maybe you should try bumping the batch sizes up again, though Google says you'd get throttled more.
I mean, it only took me a few hours (an hour? I don't remember exactly) to sync my 48465 emails, about 986 MB large. 3 times that should not take like 30 times as long.
It starts at about 700-800 it/s with "fetching messages" (which I think fetches the ids?). Then it starts fetching content at 30 messages / second. It expects a full initial sync to take about 30 minutes.
Julian Andres Klode writes on November 28, 2017 22:54:
The output indicates that he was reaching 3-4 messages / second via http, so 2-3 does not seem like an improvement. Maybe you should try bumping the batch sizes up again, though Google says you'd get throttled more.
I mean, it only took me a few hours (an hour? I don't remember exactly) to sync my 48465 emails, about 986 MB large. 3 times that should not take like 30 times as long.
It starts at about 700-800 it/s with "fetching messages" (which I think fetches the ids?). Then it starts fetching content at 30 messages / second. It expects a full initial sync to take about 30 minutes.
It's the 'receiving content' part that is slow (presumably?). From the output it seems that the batch sizes got throttled already, so no use in increasing them I think. The size now is the recommended one.
Gmailieer will increase the batch size back to the normal if there has been several successful requests at the current batch size.
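That halve-on-throttle / restore-after-successes behaviour can be sketched as plain logic. Parameter names and thresholds here are illustrative, not gmailieer's actual ones:

```python
class BatchBackoff:
    """Adaptive batch size: halve on throttling, restore to the
    normal size after a streak of successful batches."""

    def __init__(self, normal=50, floor=1, recovery_streak=5):
        self.normal = normal        # recommended max (Google suggests <= 50)
        self.floor = floor          # never go below one request per batch
        self.recovery_streak = recovery_streak
        self.size = normal
        self.successes = 0

    def on_throttled(self):
        # Rate limited: cut the batch size in half, down to the floor,
        # and reset the success counter.
        self.size = max(self.floor, self.size // 2)
        self.successes = 0

    def on_success(self):
        # After enough consecutive successful batches, assume the
        # throttling has passed and return to the normal size.
        self.successes += 1
        if self.successes >= self.recovery_streak and self.size < self.normal:
            self.size = self.normal
            self.successes = 0
```

This matches the progression seen in the logs above (100 to 50 to 25 to 12 and so on) and the recovery once requests start succeeding again.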
There are also limits at how many times you can download your e-mail box (before it gets severely throttled), this limit I believe is connected to your account: so if you have experimented with downloading using IMAP then gmailieer a few times things are going to get very slow!
Maybe he had enough messages to get throttled more than you or me as well. I think that I needed 4-8 hours to sync about 80k of emails (I don't remember any more).
In my case it was quick. My 160K emails of 12GB were synced in 2hrs. In comparison offlineimap took 14hrs and mbsync 10hrs.
I have started a table with initial synchronization time in https://github.com/gauteh/gmailieer/wiki. If any of you have other experiences, please add them there! @fikovnik I have added your report.
In an effort to determine what causes severe throttling I have added a few fields which should be filled out. If you suspect that this is caused by other variables please let me know (though we should probably keep them at a minimum).
It seems that in recent tests, where no big syncs have been performed lately, this is fairly resolved. Closing for now.
I'm currently on day 5 of my initial sync of 150K+ emails ~10GB, 256 notmuch tags. Batch request was reduced to 25, to 12, to 4 then to 1 where it stayed. I have a GSuite account which doesn't have API key access. Unless sync time improves drastically after the initial sync, I'll have to ditch gmailieer which would be very disappointing as the two-way tagging would have been perfect for me.
Matthew Lear writes on April 13, 2018 11:59:
I'm currently on day 5 of my initial sync of 150K+ emails ~10GB, 256 notmuch tags. Batch request was reduced to 25, to 12, to 4 then to 1 where it stayed. I have a GSuite account which doesn't have API key access. Unless sync time improves drastically after the initial sync, I'll have to ditch gmailieer which would be very disappointing as the two-way tagging would have been perfect for me.
Have you checked the points in: https://github.com/gauteh/gmailieer/wiki ?
If you have synced your full account lately (either with gmaileer or other means, e.g. IMAP) your account is likely throttled, and you might get better results by stopping it for a day or two. Google does not provide good guidelines for this that I am aware of.
You can use your own API key with GSuite as well (I do). If you are not allowed to generate the API key with your specific GSuite account, then you can generate it with a regular google account.
The partial/incremental sync done after the full sync is usually done in a few seconds if you perform it frequently.
In your experience, how often is a periodic sync required in order to keep the sync duration short? I appreciate that this may depend on a few variables...
During the day I sync every 2-3 minutes. Most of the time there are no changes, which takes about 0.5 seconds since only one request needs to be made to GMail.
To keep the refresh and access tokens valid, I think you need to sync more often than every two weeks.
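A periodic sync like that can be as simple as a loop around the CLI. This is a sketch only; it assumes gmi is on your PATH and the loop is run from the maildir root, and the interval and names are illustrative:

```python
import subprocess
import time

def sync_loop(cmd=('gmi', 'sync'), interval=180, iterations=None):
    """Run the sync command every `interval` seconds.

    With a recent state file each run is a cheap incremental pull,
    so a short interval keeps the per-sync delta small. Pass
    `iterations` to bound the loop (None means run forever).
    """
    n = 0
    while iterations is None or n < iterations:
        # check=False: a transient network failure should not kill
        # the loop; the next iteration will catch up.
        subprocess.run(list(cmd), check=False)
        n += 1
        if iterations is None or n < iterations:
            time.sleep(interval)
```

Running this under a terminal multiplexer or a systemd timer would give the every-few-minutes cadence described above.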
The incremental history is only stored for a limited number of events at GMail; if this has expired (say you don't sync for three-four months), then you have to do a full sync. The full sync will not need to re-fetch the content, only the first part of the sync - so equivalent to the first step when you ran gmi. You can force this with gmi pull -f.
So; if you sync often enough for the history to not expire it should be fast. I've never experienced this, but I've never gone longer than maybe two weeks. One user reported that he had to do a full sync after a few months of inactivity. When fetching the actual messages you get much faster download times for each message than what you do now as well. And as mentioned, you do not need to re-download anything you already have. If you get a lot of mail or changes to your labels, then you have to sync more often. I have a similar order of total mail as you, so it is probably similar.
Matthew Lear writes on April 14, 2018 0:28:
In your experience, how often is a periodic sync required in order to keep the sync duration short? I appreciate that this may depend on a few variables...
I think there is definitely something not right... at least, it seems so for me. My full sync finally completed at 0333 this morning. My simple shell script periodically calls gmi: it sleeps for 2 mins, then does another gmi sync. 4 hours later, it's still going. I hadn't run notmuch new between syncs either, so all that had changed was on the server side. Can we add some sort of verbose status reporting to try and see where and why so much time is being spent?
On Sat, 14 Apr 2018, 06:33 Gaute Hope, notifications@github.com wrote:
During the day I sync every 2-3 minutes. Most of the time there are no changes, which takes about 0.5 seconds since only one request needs to be made to GMail.
To keep the refresh token and access tokens I think you need to sync more than every two weeks.
The incremental history is only stored for a limited number of events at GMail, if this is expired (say you don't sync in three-four months), then you have to do a full sync. The full sync will not need to re-fetch the content, only the first part of the sync - so equivalent to the first step when you ran gmi. You can force this with
gmi pull -f
.So; if you sync often enough for the history to not expire it should be fast. I've never exprienced this, but I've never gone longer than maybe two weeks. One user reported that he had to do a full sync after a few months of inactivity. When fetching the actual messages you get much faster download times for each message than what you do now as well. And as mentioned, you do not need to re-download anything you allready have. If you get a lot of mail or changes to your labels, then you have to sync more often. I have a similar order of total mail as you, so it is probably similar.
Matthew Lear writes on april 14, 2018 0:28:
In your experience, how often is a periodic sync required in order to keep the sync duration short? I appreciate that this may depend on a few variables...
On Fri, 13 Apr 2018, 12:44 Gaute Hope, notifications@github.com wrote:
Matthew Lear writes on April 13, 2018 11:59:
I'm currently on day 5 of my initial sync of 150K+ emails (~10 GB, 256 notmuch tags). The batch request size was reduced to 25, then 12, then 4, then 1, where it stayed. I have a GSuite account which doesn't have API key access. Unless sync time improves drastically after the initial sync, I'll have to ditch gmailieer, which would be very disappointing, as the two-way tagging would have been perfect for me.
Have you checked the points in: https://github.com/gauteh/gmailieer/wiki ?
If you have synced your full account lately (either with gmailieer or by other means, e.g. IMAP), your account is likely throttled, and you might get better results by stopping the sync for a day or two. Google does not provide any good guidelines for this that I am aware of.
You can use your own API key with GSuite as well (I do). If you are not allowed to generate the API key with your specific GSuite account, then you can generate it with a regular google account.
The partial/incremental sync done after the full sync is usually done in a few seconds if you perform it frequently.
Notmuch new doesn't change anything. I think there is a --debug flag, as well as a dry-run flag, which should give you an idea of what is happening. If a lot of actions are scheduled, there has usually been some batch change of a lot of tags on one side or the other.
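A sketch of previewing pending actions before a real sync. The flag name below is taken from this thread, not verified against a particular gmi version; check `gmi --help` for your install.

```shell
#!/bin/sh
# Preview what a sync would do without changing anything (sketch).
# MAILDIR and the flag name are assumptions; verify with 'gmi --help'.

MAILDIR="${MAILDIR:-$HOME/.mail/gmail}"   # directory where 'gmi init' was run
DRY_RUN_FLAG="--dry-run"

# Set RUN=1 to actually invoke gmi.
if [ "${RUN:-0}" = 1 ]; then
  ( cd "$MAILDIR" && gmi sync "$DRY_RUN_FLAG" )
fi
```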
Could you paste the output from the ongoing sync?
It finished last night. Finally! I seemed to get a very low it/s rate when pushing; the batch size usually gets reduced to 1. I updated my notmuch tags to retag about 10k+ mails and ran another sync. It took about 12 mins... For sure I'll monitor the behaviour and raise another ticket if I have issues. It seems inappropriate to post here since my initial sync is complete now.
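A bulk retag of the kind described here can be done with notmuch's query-based tagging, then pushed back to GMail with gmi. A sketch; the query, tag names, and maildir path below are hypothetical examples, not taken from this thread.

```shell
#!/bin/sh
# Bulk-retag messages matching a query, then push the tag changes to GMail.
# Sketch only: query, tags, and path are hypothetical; adjust for your setup.

QUERY='date:..2017-01-01 and tag:inbox'

# Set APPLY=1 to actually modify the notmuch database and push to GMail.
if [ "${APPLY:-0}" = 1 ]; then
  notmuch tag +archive -inbox -- "$QUERY"   # retag locally in one batch
  ( cd "$HOME/.mail/gmail" && gmi push )    # push local tag changes remotely
fi
```

Note that large batches like this are exactly the case where the following push can take a while, as described above.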
Good stuff. You have fast incremental syncs now? 10k+ changes in 12 mins seems pretty decent.