SafeSlingerProject / SafeSlinger-AppEngine

Source code for App Engine platform server SafeSlinger
MIT License

"IndexError: list index out of range" on new instance with Python 2.7 #41

Closed dschuermann closed 7 years ago

dschuermann commented 7 years ago

We changed the SafeSlinger server used in OpenKeychain from an old Python 2.5 instance to a completely new Python 2.7 instance based on the last commit here (see https://github.com/open-keychain/SafeSlinger-AppEngine and https://github.com/open-keychain/open-keychain/commit/74c5197bc6f0aa96619066d31b699c173d06ea20). I remember testing it back then, but now it no longer works. OpenKeychain says "Server HTTP Error: 500 'Internal Server Error'".

In https://console.cloud.google.com I can see the stacktrace:

IndexError: list index out of range
at choice (/base/data/home/runtimes/python27/python27_dist/lib/python2.7/random.py:274)
at post (/base/data/home/apps/e~slinger-openpgp/20170222t160026.399371735374003269/assignUser.py:115)
at dispatch (/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py:545)
at dispatch (/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py:547)
at __call__ (/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py:1077)
at default_dispatcher (/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py:1253)
at __call__ (/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py:1505)
at __call__ (/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py:1511)

mwfarb commented 7 years ago

We've been running all our GAE on 2.7 for over a year now but I have not recently tried to deploy a new GAE app instance. Just now I did some tests with our iOS and Android demo apps which would allow me to test against "slinger-openpgp", "safeslinger-openpgp", as well as our test "slinger-dev". Let me try to deploy a new server app instance today and see if I can duplicate the results to get to the bottom of it.

Is the issue persistent or sporadic?

dschuermann commented 7 years ago

Issue is persistent for me with the current OpenKeychain 4.3. BTW: We currently use https://github.com/open-keychain/exchange-android inside OpenKeychain.

mwfarb commented 7 years ago

OK, it's clear our README needs some updates. :-)

I just tried removing the first 2 lines of the app.yaml, similar to the changes in your fork, starting from https://github.com/SafeSlingerProject/SafeSlinger-AppEngine/tree/master/safeslinger-exchange/python. I then created a new cloud project ID from the console called 'okc-test-2'. In the App Engine menu at https://console.cloud.google.com/appengine I selected a Python project and chose the Europe domain. I declined to run the tutorial.

From my local console project directory I deployed the app with: appcfg.py update . -A okc-test-2 -V 01060000

This seems to work with the SafeSlinger Android lib. Are we using the same methods to create a project and deploy it?

It might be better to leave the line version: 01060000t in the app.yaml file (version: 01060000 should also be sufficient), since we use it for backward-compatibility versioning and for version reporting between client and server. Regardless, your server made it past that version check.

If we are deploying differently perhaps try appcfg.py update . -A slinger-openpgp -V 01060000 and migrate traffic to it and see if the issue still exists?

mwfarb commented 7 years ago

BTW, the slinger-openpgp.appspot.com domain fails consistently for me with both our demo Android app and your latest Play Store OKC loaded on my phone. So hopefully we can resolve the server issue without any Android client changes.

If the above suggestions don't work, perhaps I could examine your slinger-openpgp instance if you temporarily give read-only permission to my gmail account, mwfarb. I could then compare any differences in settings. I could also give you permissions to our test instances if you prefer to do a similar comparison.

mwfarb commented 7 years ago

It looks like the cron job did not get started when the app was deployed. When I deployed a test app with the newly recommended method, gcloud app deploy, the cron job should have been running as soon as the deployment finished, but it was not. The cron job cleans up old key exchange messages every 10 minutes, and without it the old entries persist and will eventually clog the database.
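To illustrate the kind of cleanup the cron job performs, here is a minimal sketch in plain Python. It is not the server's actual handler: the `purge_stale` function, the `inserted` field, and the 10-minute retention window are assumptions made for illustration, and a real handler would query and delete datastore entities instead of filtering a list.

```python
from datetime import datetime, timedelta

# Assumed retention window; the thread only says entries should be minutes old.
MAX_AGE = timedelta(minutes=10)


def purge_stale(entries, now, max_age=MAX_AGE):
    """Return only the entries inserted within max_age of now."""
    cutoff = now - max_age
    return [e for e in entries if e["inserted"] >= cutoff]


# Hypothetical sample data: one entry left over for days, one recent entry.
now = datetime(2017, 3, 1, 12, 0)
entries = [
    {"usrid": 1, "inserted": now - timedelta(days=3)},
    {"usrid": 2, "inserted": now - timedelta(minutes=2)},
]
fresh = purge_stale(entries, now)
```

If this cleanup never runs, the list of stored entries only grows, which matches the behavior described above.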

This would explain why the exchange would work for weeks or months and then fail once the datastore gets too big. I'm imagining the datastore view at https://console.cloud.google.com/datastore/entities/query?project=slinger-openpgp&ns=&kind=Member would show entries several days old when they should only be several minutes old.
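A short sketch of how a full database could lead to the reported IndexError. This assumes, without having confirmed it against assignUser.py, that the handler picks a random unused ID from a bounded pool; `pick_free_id`, `pick_free_id_safe`, and `pool_size` are hypothetical names. What is certain is that `random.choice` raises IndexError when given an empty list.

```python
import random


def pick_free_id(pool_size, used_ids):
    """Pick a random ID in 1..pool_size that is not already used."""
    free = [i for i in range(1, pool_size + 1) if i not in used_ids]
    # random.choice raises IndexError when the list of free IDs is empty.
    return random.choice(free)


def pick_free_id_safe(pool_size, used_ids):
    """Same as pick_free_id, but return None when every ID is taken."""
    free = [i for i in range(1, pool_size + 1) if i not in used_ids]
    if not free:
        return None
    return random.choice(free)
```

Under that assumption, stale Member entries that are never purged would eventually occupy every ID, and each new request would then fail with a 500 error.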

I was able to fix my test deployment by specifying the deployment config files in the command line and the cron job started after a few minutes. For some reason the standard deploy command above ignores the cron.yaml file. The old appcfg.py commands would by default deploy all local *.yaml files. This deploy command should work for you: gcloud app deploy --project slinger-openpgp app.yaml index.yaml cron.yaml
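For scripted deployments, the same command can be assembled in Python so that cron.yaml is never left out by accident. This is only a sketch: `build_deploy_command` is a hypothetical helper, and the file list simply mirrors the command quoted above.

```python
# Config files named explicitly in the deploy command from this thread.
DEFAULT_CONFIGS = ("app.yaml", "index.yaml", "cron.yaml")


def build_deploy_command(project, configs=DEFAULT_CONFIGS):
    """Build the gcloud argument list, refusing to omit cron.yaml."""
    if "cron.yaml" not in configs:
        raise ValueError("cron.yaml must be listed or the cron job is not deployed")
    return ["gcloud", "app", "deploy", "--project", project] + list(configs)


command = build_deploy_command("slinger-openpgp")
```

The resulting list could be passed to `subprocess.check_call(command)` from the project directory.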

I tried this command for you but my role complains about not having bucket object permissions: Storage Object Admin. I didn't need bucket storage in my test app, but you may have configured your app that way.

I now have commitments which will take me away from this for ~10 hours. If you can try the deployment command in the interim or have a proxy who can try it for you it would be helpful. Otherwise adding the above permission for me might allow me to deploy it later on.

dschuermann commented 7 years ago

Awesome analysis, it worked! The instance is now cleared of all the entries and the cron job seems to be running!

mwfarb commented 7 years ago

Also, in the short term, someone with edit permissions could manually delete all the Member rows from the database at https://console.cloud.google.com/datastore/entities/query?project=slinger-openpgp&ns=&kind=Member, which would restore functionality until redeployment.

mwfarb commented 7 years ago

Awesome! Next up: documentation review. :+1: