databricks / spark-pr-dashboard

Dashboard to aid in Spark pull request reviews
spark-prs.appspot.com
Apache License 2.0
54 stars 46 forks

How do you debug a local install that's loading no issues? #20

Closed nchammas closed 10 years ago

nchammas commented 10 years ago

I've installed the dashboard according to the README's instructions and have a dashboard running locally. It appears to have trouble loading any issues, though. All the tabs show 0 issues.

I generated a personal access token and used it in my settings.cfg. How can I tell that it's working correctly? More generally, how do I debug issues like this? The app gives no warnings or errors when I run dev_appserver.py .

JoshRosen commented 10 years ago

This is because the dev appserver doesn't run cron jobs. Browse to http://localhost:8000/cron and hit "Run now" to manually run the task that updates issues, or browse to http://localhost:8080/tasks/update-issues/

nchammas commented 10 years ago

Ah, OK, that did it! Perhaps we should add this step to the README?

Also, it seems that there are some differences between the issues shown in the local instance and the ones shown in production. This is probably due to how issues are cached, right?

JoshRosen commented 10 years ago

Yeah, we should probably note that. I've been the only person really working on this so far, so I'm sure there are other small things like this that I've neglected to document.

What sorts of differences are you seeing between the local and production issues (beyond simply having fewer of them in your local install, since the bulk-loading is performed slowly to avoid exhausting your API quota)?

nchammas commented 10 years ago

It seems to be just the number of issues, though the local instance is slowly catching up. I see a steady stream of calls to POST /tasks/update-issue/NNNN. I take it this is the app working through the task queue, which in turn was filled by my initial triggering of the cron job?
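For anyone else trying to picture this flow, here is a minimal sketch of the pattern being described: a cron job enqueues one update task per issue, and workers then drain the queue. The names and in-memory queue are purely illustrative, not the dashboard's actual code:

```python
from collections import deque

def cron_update_issues(issue_numbers, task_queue):
    """Simulates the cron job: enqueue one update task per issue."""
    for number in issue_numbers:
        task_queue.append(("POST", "/tasks/update-issue/%d" % number))

def drain(task_queue, handler):
    """Simulates workers processing queued tasks in FIFO order."""
    while task_queue:
        method, url = task_queue.popleft()
        handler(method, url)

queue = deque()
cron_update_issues([101, 102, 103], queue)
handled = []
drain(queue, lambda method, url: handled.append(url))
# handled == ['/tasks/update-issue/101', '/tasks/update-issue/102',
#             '/tasks/update-issue/103']
```

In the real app, App Engine's task queue service plays the role of the deque, which is what provides the rate-limiting and automatic retries mentioned above.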

JoshRosen commented 10 years ago

Yep. The task queues give us rate-limiting, automatic retry, and parallelism.

In b5b44b897fa3557ebb9f591cd5a68ff5202fbb17, I modified the app to use two queues, a high-rate queue for loading recent issues and a low-rate queue for background bulk-loading. If you run the cron task now, you should see a series of fast requests to refresh the active issues.

nchammas commented 10 years ago

Yep, I see the two queues, fresh-prs and old-prs, but after manually purging them once, it appears that new runs of the cron job don't repopulate the queues. Hmm. Anyway, I'll keep playing around. Thanks for explaining the setup!

JoshRosen commented 10 years ago

That's because the cron job stores a low watermark so that subsequent jobs only request issues that have been updated since that watermark. This helps to reduce the number of calls we make to GitHub. The watermark is stored in a datastore class called KVS, which is just a custom dumping-ground for ad-hoc data like counters that we need to persist across app restarts.

Since we don't have an admin panel yet, the easiest way to reset this watermark and fetch all issues is to browse to http://localhost:8000/datastore?kind=KVS and delete the issues_since key from the KVS table.
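The watermark behavior described above can be sketched with a plain dict standing in for the KVS kind. The issue data and fetch function here are made up for illustration; only the issues_since key name comes from the thread:

```python
kvs = {}  # stands in for the KVS datastore kind

def fetch_issues_since(since):
    # Hypothetical stand-in for the GitHub API call: returns
    # (number, updated_at) pairs updated after the watermark.
    all_issues = [(1, 100), (2, 200), (3, 300)]
    return [i for i in all_issues if since is None or i[1] > since]

def cron_job():
    """Fetch issues newer than the watermark, then advance it."""
    since = kvs.get("issues_since")
    issues = fetch_issues_since(since)
    if issues:
        kvs["issues_since"] = max(updated for _, updated in issues)
    return issues

assert len(cron_job()) == 3   # first run: no watermark, full fetch
assert len(cron_job()) == 0   # second run: watermark filters everything out
del kvs["issues_since"]       # like deleting the key in the datastore viewer
assert len(cron_job()) == 3   # full fetch again
```

This is why purging the queues alone doesn't trigger a refetch: the watermark still says everything is up to date until the issues_since key is deleted.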

nchammas commented 10 years ago

Alright, tried that and it worked as described. Thanks again Josh!