gco / rietveld

Automatically exported from code.google.com/p/rietveld
Apache License 2.0

Impossible to delete an issue with >500 data entries due to HTTP 500 #162


GoogleCodeExporter commented 9 years ago
http://codereview.chromium.org/259056

When apavlov@ tries to delete this review, he gets a 500 and the attached error is logged. PatchSets 1 and 3 are small, but #2 is relatively large. Looking at the code in views.py line 1683, I guess we should delete in stages when the number of items is > 500. That's sad, though; App Engine should hide that fact from us, and at least this limit doesn't seem to be documented anywhere.

Exact views.py reference:
http://code.google.com/p/rietveld/source/browse/branches/chromium/codereview/views.py?r=475#gr_svn475_1683

SELECT * FROM Issue WHERE __key__ = KEY('Issue', 259056) retrieves 1 item, the issue.
SELECT * FROM PatchSet WHERE ANCESTOR IS KEY('Issue', 259056) retrieves 3 items.
SELECT * FROM Patch WHERE ANCESTOR IS KEY('Issue', 259056) retrieves 240 items.
SELECT * FROM Comment WHERE ANCESTOR IS KEY('Issue', 259056) retrieves 14 items.
SELECT * FROM Message WHERE ANCESTOR IS KEY('Issue', 259056) retrieves 4 items.
SELECT * FROM Content WHERE ANCESTOR IS KEY('Issue', 259056) retrieves 247 items, and the results are full of empty entries.
Total: 509.

The workaround I propose, to avoid causing further timeouts, is to use a google.appengine.api.labs.taskqueue.Task to defer the work when len(tbd) > 500, sending cascaded tasks that each handle a block of 500 items.

Reference: http://code.google.com/appengine/docs/python/taskqueue/tasks.html

Still, in the meantime, just looping in blocks of 500 would help a bit.
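
A minimal sketch of that interim fix, assuming tbd is the list of entities already gathered by delete() in views.py (the helper name is mine, not Rietveld's):

from google.appengine.ext import db

BATCH = 500  # the datastore refuses more entities than this per delete call

def delete_in_batches(tbd):
  # Delete the collected entities in blocks that stay within the limit.
  for i in range(0, len(tbd), BATCH):
    db.delete(tbd[i:i + BATCH])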

Sample error log:

10-16 09:24AM 01.551 /259056/delete 500 2218ms 76142cpu_ms 74723api_cpu_ms 0kb Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/532.1 (KHTML, like Gecko) Chrome/4.0.213.1 Safari/532.1,gzip(gfe),gzip(gfe)
1.2.3.4 - apavlov [16/Oct/2009:09:24:03 -0700] "POST /259056/delete HTTP/1.1" 500 143 "http://codereview.chromium.org/259056/edit" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/532.1 (KHTML, like Gecko) Chrome/4.0.213.1 Safari/532.1,gzip(gfe),gzip(gfe)" "codereview.chromium.org"
E10-16 09:24AM 03.596
Exception in request: BadRequestError: cannot delete more than 500 entities in a single call
Traceback (most recent call last):
  File "/base/python_lib/versions/third_party/django-1.0/django/core/handlers/base.py", line 86, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/base/data/home/apps/chromiumcodereview/3.336500580264792347/codereview/views.py", line 511, in post_wrapper
    return func(request, *args, **kwds)
  File "/base/data/home/apps/chromiumcodereview/3.336500580264792347/codereview/views.py", line 523, in login_wrapper
    return func(request, *args, **kwds)
  File "/base/data/home/apps/chromiumcodereview/3.336500580264792347/codereview/views.py", line 558, in issue_wrapper
    return func(request, *args, **kwds)
  File "/base/data/home/apps/chromiumcodereview/3.336500580264792347/codereview/views.py", line 590, in issue_owner_wrapper
    return func(request, *args, **kwds)
  File "/base/data/home/apps/chromiumcodereview/3.336500580264792347/codereview/views.py", line 1683, in delete
    db.delete(tbd)
  File "/base/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 1241, in delete
    datastore.Delete(keys)
  File "/base/python_lib/versions/1/google/appengine/api/datastore.py", line 281, in Delete
    raise _ToDatastoreError(err)
BadRequestError: cannot delete more than 500 entities in a single call

Original issue reported on code.google.com by maruel@chromium.org on 16 Oct 2009 at 5:54

GoogleCodeExporter commented 9 years ago
Wow. I worry that chunking would cause timeout errors, so I think we should bite the bullet and use the task queue. Can someone come up with a patch?

Original comment by gvanrossum@gmail.com on 16 Oct 2009 at 6:00

GoogleCodeExporter commented 9 years ago
Don't tasks have the same timeout as normal requests? In which case, we'd have to loop through all the datastore items to find out how many there are, in order to know if we need tasks, or even multiple ones, right? If so, I thought that iterating itself is a slow operation, so we could still have timeouts.

Original comment by john.abd...@gmail.com on 16 Oct 2009 at 6:07

GoogleCodeExporter commented 9 years ago
You can enqueue a task only when needed, by catching BadRequestError and Timeout.
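
A hedged sketch of that catch-and-defer idea; the /tasks/delete_issue URL and its handler are hypothetical, and taskqueue is the labs module referenced above:

from google.appengine.ext import db
from google.appengine.api.labs import taskqueue

def delete_or_defer(tbd, issue_key):
  # Try the direct delete first; fall back to a task on failure.
  try:
    db.delete(tbd)
  except (db.BadRequestError, db.Timeout):
    # Too many entities, or too slow: hand the work to a task.
    taskqueue.add(url='/tasks/delete_issue',
                  params={'issue_key': str(issue_key)})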

Original comment by maruel@chromium.org on 16 Oct 2009 at 6:11

GoogleCodeExporter commented 9 years ago
Tasks are retried if they fail. So the task could repeatedly try to delete the first 500 (or fewer -- maybe the first 100) entries of the query results until the tbd array is empty or until an exception happens. I'm thinking of something like

while True:
  tbd = <queries go here>  # re-run the descendant queries each pass
  if not tbd:
    break
  db.delete(tbd[:100])  # stay well under the 500-entity limit per call

Tasks are cheap, so there's no reason to only create a task if the tbd list is big. With some cleverness we could delete the Issue in the main event (so it appears to be deleted from the user's POV) and then create a task that deletes all its descendants. These days you can even do a Kind-less keys-only query for all descendants from a given Key.
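
A sketch of what such a task could run, assuming the Issue entity itself was already deleted in the main request (handler wiring omitted; the function name is hypothetical):

from google.appengine.ext import db

def delete_descendants(issue_key, batch=100):
  # Kind-less, keys-only query for everything left under the Issue key.
  # Tasks are retried on failure, so the loop simply resumes next run.
  while True:
    keys = db.Query(keys_only=True).ancestor(issue_key).fetch(batch)
    if not keys:
      break
    db.delete(keys)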

Original comment by gvanrossum@gmail.com on 16 Oct 2009 at 6:17

GoogleCodeExporter commented 9 years ago
I am facing this issue now, since my application gives users the ability to delete thousands of records in one click of a button. While tasks are cheap now, they still add up against CPU quota limits. What's the best way to deal with this?

Original comment by jaiswal....@gmail.com on 14 Jun 2010 at 6:16
