brian-zhao / google-app-engine-django

Automatically exported from code.google.com/p/google-app-engine-django
Apache License 2.0
0 stars 0 forks source link

Datastore timeout / random exception on first request can cause double imports and raise errors like Resolver404 #169

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Run your app until a random exception occurs on the first request to a
new instance of your app
2. Continue to make requests.

What is the expected output? What do you see instead?

It's very likely that you will see an exception that doesn't make any
sense, like a Resolver404 within a try/except block that would otherwise
get caught.

What version of the product are you using? On what operating system?

r201
Django 1.1.1

Please provide any additional information below.

The problem is that Google App Engine Django will install the Django module
twice into memory when an exception like this occurs.  There is a patch
attached here that ensures Django is only imported once.

Here is a more elaborate explanation:

When an exception is raised on the first request of an application on
Google App Engine, the main.py script is aborted and is not loaded into
memory.  On the second request, main.py is reloaded even though previous
imported modules are still held in memory.  If the exception was random,
like a Datastore timeout, then it might go away on the second request but
at that point your app is left in volatile state because Django was removed
from sys.modules and re-imported.  Due to the way referencing works in
Python, it is possible that there might be two instances of the Django
module in memory.

For an example of a real traceback see this ticket in a live app:
http://code.google.com/p/google-app-engine-django/issues/detail?id=155 
Each exception is a little different but here another traceback I saw point
to code in Django like this:

try:
    urlresolvers.resolve(path)
    return True
except urlresolvers.Resolver404:
    return False

in this block of code the exception Resolver404 was getting raised even
though it appeared to be getting caught.  This is because the Django module
had been reloaded and two classes named Resolver404 were in the running
program.

In addition to the patch for the fix, I have attached
gae-django-reproduce-double-bug.zip is a complete app that reproduces the
bug by simulating a chain of similar exceptions.  if you run this app, you
will see :

1st request:
log message that App Engine Django is booting up
Resolver404 exception that is properly caught and ignored
a ValueError to simulate a random exception

2nd request:
log message that App Engine Django is booting up
a Resolver404 exception that is mysteriously not caught

If App Engine Django was not booted up the second time then the error would
be fixed.

The attached patch might not be the most elegant approach but this should
give you some ideas for how to fix the problem.  My app at
http://code.google.com/p/chirpradio/ would see a random Resolver404 error
at least once a day and after running with the patch, the error has not
appeared in a full week.

See also this bug report in Google App Engine which has more details about
the problem: http://code.google.com/p/googleappengine/issues/detail?id=1409

Original issue reported on code.google.com by kumar.mcmillan on 17 May 2010 at 3:01

Attachments:

GoogleCodeExporter commented 8 years ago
Possibly issue 1409 is also relevant -- the problem there is caused by 
app-engine-
patch which attempts to delete the imported django modules. This is not safe.

Original comment by guido@google.com on 4 Jun 2010 at 7:07

GoogleCodeExporter commented 8 years ago
Sorry, that would be 
http://code.google.com/p/googleappengine/issues/detail?id=1409 -- 
I forgot this is a different tracker. (And the relevance is likely that 
attempts to 
delete django from sys.modules are always unsafe.)

Original comment by guido@google.com on 4 Jun 2010 at 7:17

GoogleCodeExporter commented 8 years ago
Thanks for pointing 1409 out, Guido.

For the record, I'd rather see this patch use getattr instead of hasattr to 
check the value like in this changeset: 
http://code.google.com/p/dherbst-app-engine-django/source/detail?
r=cad9d77a103622c77d894a6b7b52ee2b2ff70fa7 if it is committed so there is no 
ambiguity of whether the 
imports have successfully completed.

Original comment by dherbst on 4 Jun 2010 at 7:27

GoogleCodeExporter commented 8 years ago
dbherbst: yep, I like your patch.  Is there anyone who can review and apply 
this?

Original comment by kumar.mcmillan on 4 Jun 2010 at 11:17

GoogleCodeExporter commented 8 years ago
I can review and apply, please ensure you've signed the CLA as described in the 
readme file.

Would this patch and issue still be relevant if the helpers import logic for 
Django was changed to only support use_library ? 

If we can address the underlying cause of the failures by switching to 
use_library, rather than simply papering over the symptoms then that would be 
preferable.

If you can demonstrate that we still need this logic even in the presence of 
use_library then I'm happy to accept the patch, although I'd prefer if all the 
logic to determine whether or not the helper has already been initialised would 
be contained within the InstallAppengineHelperForDjango method - with task 
queues and cron jobs main.py is often not the only entry point into the 
application, so the helper is invoked from multiple places. We don't want to 
have to duplicate this logic in all those files.

Original comment by m...@google.com on 8 Jun 2010 at 11:36

GoogleCodeExporter commented 8 years ago
Issue 155 has been merged into this issue.

Original comment by m...@google.com on 8 Jun 2010 at 11:39

GoogleCodeExporter commented 8 years ago
I signed the CLA in 2009.  I haven't seen this issue with use_library and I do 
not use a django zip file so I can't comment.  Perhaps Kumar can comment on 
this.

Original comment by dherbst on 8 Jun 2010 at 1:26

GoogleCodeExporter commented 8 years ago
The attached zip file, gae-django-reproduce-double-bug.zip, is a complete app 
that reproduces the bug when you upload it to an App Engine server.  When I get 
some free time I'll try applying use_library to that code but if someone beats 
me to it then they should be able to prove the theory and get an answer sooner. 
 I don't know how use_libary works but if it does not involve deleting django 
from sys.modules then it should solve the issue just fine.  Thanks for taking a 
look m...@google!

Original comment by kumar.mcmillan on 8 Jun 2010 at 6:47

GoogleCodeExporter commented 8 years ago
Hi all.  Sorry for the delayed response.  I'm proposing a new patch, attached 
here and explained below.

The first thing I did was fire up the example app that was attached to this 
ticket to reproduce the error.  The example app still reproduced the error but 
note it only exists on App Engine itself, not when running the app from the 
local SDK.

In response to m...@google.com's comment about converting to use_django() I am 
now confused by this.  I see that use_django() is already the method employed 
by the code, as you can see in the LoadDjango() method of 
appengine_django/__init__.py

So instead this new patch takes the approach that m...@google.com suggested 
which is to memoize InstallAppengineHelperForDjango() itself (not main.py) so 
it only runs once per process.  This will apply the fix for other entry points 
(cron, task queue, etc).

The patch fixes the repeated exception that can be seen in the example app and 
I also verified that all tests pass.  Can someone review this and perhaps get 
it applied to trunk?  I have already signed the CLA.

thanks!
-Kumar

Original comment by kumar.mcmillan on 30 Jul 2010 at 6:40

Attachments:

GoogleCodeExporter commented 8 years ago
Does any maintainer have a minute to review the latest patch?  Thanks, Kumar

Original comment by kumar.mcmillan on 21 Aug 2010 at 9:15

GoogleCodeExporter commented 8 years ago
Hi. Once again, could someone take a look and perhaps apply this patch?  It 
might help some people out.  I read this article about someone who ran into 
this and gave up, switching to Rackspace http://www.agmweb.ca/blog/andy/2286/  
i think my fix here should solve these odd deadline error states.

Original comment by kumar.mcmillan on 25 Oct 2010 at 4:18

GoogleCodeExporter commented 8 years ago
+1 to kumar's request. I've applied the patch he provided in comment 9 and it 
completely solved a whole class of errors that had dumbfounded me in a large, 
commercial AppEngine app. Including this patch in the official release would be 
a huge help, especially because it's very hard from the error messages to infer 
that this error is related to google-app-engine-django.

Original comment by shimonr...@gmail.com on 25 Oct 2010 at 4:39

GoogleCodeExporter commented 8 years ago
This issue was closed by revision r107.

Original comment by m...@google.com on 26 Oct 2010 at 12:29

GoogleCodeExporter commented 8 years ago
Apologies for the delay merging the patch. Thanks very much.

Original comment by m...@google.com on 26 Oct 2010 at 12:30

GoogleCodeExporter commented 8 years ago
thanks m...!

Original comment by kumar.mcmillan on 26 Oct 2010 at 4:51