btoews / OctoGAS

GitHub Email Notifications Google Apps Script for Gmail

Fails on horribly huge inboxes #4

Open gregose opened 10 years ago

gregose commented 10 years ago

My inbox has 5k+ messages. Hit a rate limit with the script:

Message details
Service invoked too many times in a short time: gmail rateMax. Try Utilities.sleep(1000) between calls. (line 311, file "labeler")
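The error text itself suggests a mitigation: pause between Gmail calls with Utilities.sleep. A minimal sketch of batching with a back-off (the names chunk, labelInBatches, and BATCH_SIZE are illustrative, not from OctoGAS):

```javascript
// Sketch: apply a label in small batches with a pause between batches,
// as the "Try Utilities.sleep(1000) between calls" hint suggests.
var BATCH_SIZE = 50; // illustrative batch size

// Pure helper: split an array into fixed-size batches.
function chunk(array, size) {
  var batches = [];
  for (var i = 0; i < array.length; i += size) {
    batches.push(array.slice(i, i + size));
  }
  return batches;
}

function labelInBatches(threads, label) {
  chunk(threads, BATCH_SIZE).forEach(function (batch) {
    batch.forEach(function (thread) {
      label.addToThread(thread);
    });
    Utilities.sleep(1000); // back off between batches to stay under the rate limit
  });
}
```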
mhagger commented 10 years ago

I'm getting the same error (many times, rolled up in an email from Google most days, if not every day). Except that my inbox only has 58 items in it. My "All Mail" folder has 2700, so even that isn't all that enormous. I had the script set to run every 10 minutes; I just decreased that to every 15 minutes to see if it makes a difference.

I am using something very close to version dbb4a2d20b7f045af1e2a388076ae3aece5e069c.

btoews commented 10 years ago

@mhagger does it give a line number for your error message? I'm curious which API it is hitting the limit for.

mhagger commented 10 years ago

@mastahyeti: here's the start of the email:

[screenshot of the failure email, 2014-07-10]

Line 319 in my version of the script (my customization probably changed the line numbers) is

parts = this._message.getRawContent().split("\r\n\r\n", 2);

technicalpickles commented 10 years ago

Seeing this as well after being offline a few days:

[screenshot: "Summary of failures for Google Apps Script: technicalpickles octogas" notification email from github.com]

mislav commented 9 years ago

For me it fails on an inbox that has less than 200 emails. See Google's quota limits here: https://script.google.com/dashboard

It claims "50000 Gmail operations / day" for Apps for Business accounts but doesn't go into detail about what this means. It's possible that we all share the same daily quota. Or, it's possible that there is a finer-grained quota that would be alleviated by actually putting sleep() calls in the script? (For reference, a script triggered every 10 minutes runs 144 times a day, which would leave a budget of roughly 350 operations per run.)

matthewmccullough commented 9 years ago

I'm getting the same/similar error every day now. My inbox typically has 5-20 emails in it (I process email into the archive hourly on weekdays). What can I do to adapt?


mislav commented 9 years ago

I wrote my own simpler version of OctoGAS that makes far fewer queries than this one, but even that wasn't enough to get rid of quota failures. I'll experiment with one more level of optimization and post my findings here.

btoews commented 9 years ago

@matthewmccullough Looks like you're actually running into trouble with the muter script and not the labeler script. Google changed some behavior, which broke this script, but I fixed that in https://github.com/mastahyeti/OctoGAS/pull/11. Can you try copy-pasting the current version of https://github.com/mastahyeti/OctoGAS/blob/master/muter.gs into your copy of the script?

You might need to run the script manually once (they seem to rate limit manual script runs differently) to clear out your backlog of muted messages. Ping me if you need help with any of this.

btoews commented 9 years ago

As for the rate limit problem with the labeler script, @josh had a good idea that I was meaning to follow up on. If the user adds a Gmail filter to add an octogas-queue label to each new message from GitHub, OctoGAS can search for that label instead of searching for all unarchived messages from GitHub. It can then do its labeling and remove the octogas-queue label. This would cut down on the number of messages that are processed for each run.
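The queue-label idea above could be sketched roughly like this, assuming a Gmail filter already applies an "octogas-queue" label to incoming GitHub notifications (function names are illustrative, not from OctoGAS):

```javascript
// Pure helper: pull the X-GitHub-Reason value out of a raw RFC 2822 message,
// the same header OctoGAS keys its labels on.
function reasonFromRaw(rawContent) {
  var match = rawContent.match(/^X-GitHub-Reason: (.+)$/m);
  return match ? match[1].trim() : null;
}

// Sketch: process only threads carrying the queue label, then remove the
// label so each thread is handled at most once per new message.
function processQueuedThreads() {
  var queue = GmailApp.getUserLabelByName("octogas-queue");
  var threads = queue.getThreads();
  for (var i = 0; i < threads.length; i++) {
    var messages = threads[i].getMessages();
    for (var j = 0; j < messages.length; j++) {
      var reason = reasonFromRaw(messages[j].getRawContent());
      // ... map reason to a label and apply it, as the labeler does ...
    }
    threads[i].removeLabel(queue); // dequeue: the next run won't see it
  }
}
```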

matthewmccullough commented 9 years ago

try copy-pasting the current version

:+1: Whoops. Can do!

technicalpickles commented 9 years ago

While this is being figured out, I ended up creating a filter to just ignore the email notifications about hitting rate limits:

mislav commented 9 years ago

In my script, I cache the timestamp of when the script last ran and then grab only threads that were updated since that time. This avoids iterating over threads that have already been processed:

  var query = 'in:inbox AND ( from:"notifications@github.com" OR from:"notifications@support.github.com" OR from:"noreply@github.com" )'
    , lastRunAt = cache.getLastRun()
    , newLastRun = new Date()

  if (lastRunAt) {
    query += " after:" + lastRunAt
  }
  cache.recordLastRun(newLastRun)

However, I think individual message.getPlainBody() calls are what is hitting the rate limits when there are long threads in progress in your inbox. Whenever a thread gets bumped, the GAS script needs to process it again, and does so from the beginning.

My plan was to experiment with saving the ID of the last message in the thread that was already processed, then start processing only new messages after that one. That will save on a lot of unnecessary getPlainBody() calls.
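That bookkeeping could be sketched as follows (the cache argument stands in for CacheService.getUserCache(); the key names and TTL are illustrative, not taken from mislav's script):

```javascript
// Sketch: remember how many messages of a thread have already been processed
// so a bumped thread only needs getPlainBody() for its new messages. Cache
// entries can expire, so a miss just means "reprocess from the start".
function startingIndex(cache, threadId) {
  var cached = cache.get("idx:" + threadId); // key naming is illustrative
  return cached ? parseInt(cached, 10) : 0;
}

function recordProcessed(cache, threadId, count) {
  cache.put("idx:" + threadId, String(count), 7200); // 2-hour TTL
}
```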

technicalpickles commented 9 years ago

As for the rate limit problem with the labeler script, @josh had a good idea that I was meaning to follow up on. If the user adds a Gmail filter to add an octogas-queue label to each new message from GitHub, OctoGAS can search for that label instead of searching for all unarchived messages from GitHub. It can then do its labeling and remove the octogas-queue label. This would cut down on the number of messages that are processed for each run.

Mentioned this to @ross earlier today, and he said he had been doing something similar in his own copy of the script. Please to share? :grin:

ross commented 9 years ago

Mentioned this to @ross earlier today, and he said he had been doing something similar in his own copy of the script. Please to share?

i just pr'd the customizations i made. they're directly in the labeler.gs file since i didn't know anything about coffeescript and i was in the middle of onboarding when i did it.

the nice thing about this route is that it only takes a single gmail filter to move all notifications into GitHub/Pending, and then once it processes them it moves them out, so it doesn't grow over time.

i actually did it this way b/c my android notifications were arriving almost immediately after the messages, way before OctoGAS had a chance to run, so my phone was constantly showing dozens of notifications.

mislav commented 9 years ago

I've upgraded my "simpler OctoGAS" script to cache last read message index for all processed threads and, when new replies arrive, process only new messages in a thread rather than starting from the beginning of the thread.

    log("fetching messages for %d threads", todoThreads.length)
    forEach(GmailApp.getMessagesForThreads(todoThreads), function(messages, i){
      var message
        , thread = todoThreads[i]
        , i = cache.getStartingMessageIndex(thread)

      log("fetching body for %d messages starting from index %d", messages.length - i, i)

      for (; i < messages.length; i++) {
        message = messages[i]
        // ...
josh commented 9 years ago

Here's one of my labeling scripts.

function processQueue() { 
  var githubReasonLabels = {
    "assign": GmailApp.getUserLabelByName("GitHub/Assign"),
    "author": GmailApp.getUserLabelByName("GitHub/Author"),
    "comment": GmailApp.getUserLabelByName("GitHub/Comment"),
    "mention": GmailApp.getUserLabelByName("GitHub/Mention"),
    "team_mention": GmailApp.getUserLabelByName("GitHub/Team Mention"),
    "manual": GmailApp.getUserLabelByName("GitHub/Manual")
  };

  function processThread(thread, messages) {
    for (var i = 0; i < messages.length; i++) {
      if (!messages[i].isUnread()) continue;

      var rawContents = messages[i].getRawContent();
      var match = rawContents.match(/^X-GitHub-Reason: ((.|\r\n\s)+)\r\n/m);
      if (match) {
        var reasonLabel = githubReasonLabels[match[1]];
        if (reasonLabel) reasonLabel.addToThread(thread);
      }
    }
  }

  var label = GmailApp.getUserLabelByName("Queue");
  var threads = label.getThreads();
  var messages = GmailApp.getMessagesForThreads(threads); 

  for (var i = 0; i < threads.length; i++) {
    Logger.log("Process Thread[" + i + "]");
    processThread(threads[i], messages[i]);
    threads[i].removeLabel(label);
  }
}
ross commented 9 years ago

In my script, I cache the timestamp of when the script last ran

seems like anything relying on the cache is eventually going to run into trouble when the cache is wiped/the key is evicted.

to address that, it seems like it would have to process some number of things each pass and stop (recording the cache key), then pick up at that point the next time.

another option might be to label the last processed item and use that as a non-ephemeral marker.

mislav commented 9 years ago

seems like anything relying on the cache is eventually going to run into trouble when the cache is wiped/the key is evicted.

I set my cache TTL to 2 hours and renew it when I run the script multiple times within that period. In my experience, I didn't see the caches get wiped arbitrarily. But I agree, that's a downside.

@josh Pretty cool trick with checking for isUnread :+1:

btoews commented 9 years ago

Sorry for not responding here sooner. Looking at the labeler script again, we do have caching of which threads have already been processed, and threads should only be processed once:

class Thread
  # Queue all threads to have the appropriate labels applied given our reason
  # for receiving them.
  #
  # Returns nothing.
  @labelAllForReason: ->
    @all[id].labelForReason() for id in @ids when !@all[id].alreadyDone()

  # Load a list of Thread ids that have already been labeled. Because the ids
  # are based on the messages in the thread, new messages in a thread will
  # trigger relabeling.
  #
  # Returns nothing.
  @loadDoneFromCache: ->
    cached = CACHE.get @doneKey
    @done = JSON.parse(cached) if cached

  # Save the list of ids that we have already labeled.
  #
  # Returns nothing.
  @dumpDoneToCache: ->
    CACHE.put @doneKey, JSON.stringify(@done)

  # Has this thread already been labeled?
  #
  # Returns a bool.
  alreadyDone: ->
    Thread.done.indexOf(@id) >= 0

...

Label.loadPersisted()
Thread.loadFromSearch QUERY
Thread.loadDoneFromCache()
Message.loadReasonsFromCache()
try
  Thread.labelAllForReason()
  Thread.archiveAll() if SHOULD_ARCHIVE
catch error
  Logger.log error
finally
  try
    Label.applyAll()
  catch error
    Logger.log error
  finally
    Thread.dumpDoneToCache()
    Message.dumpReasonsToCache()

Assuming that the error handling is correct, this should be able to process a large inbox over the course of many runs, even if it hits rate limit issues. I don't have a large inbox to test this in, so maybe it isn't working. They've updated the cache API a bit, so I made a few changes in https://github.com/mastahyeti/OctoGAS/commit/dcd108602a01a1b9d4011857fd8879a9ee34a232 and https://github.com/mastahyeti/OctoGAS/commit/edd8ac806cdfd37a5fb2fecfd69d7f8c437770ee.

ross commented 8 years ago

I have a setup where I wrote a general rule that all incoming GitHub notifications go into a GitHub/Pending label and skip the inbox, and then modified the script to only process things in that label and to remove the label when it's done. That has seemed to work even with hundreds of messages when I've been gone for a while.