OpenTreeOfLife / phylesystem-api

API access to Open Tree of Life treestore
BSD 2-Clause "Simplified" License

Lag time between "local" and GitHub phylesystem repos is very confusing #96

Open jimallman opened 10 years ago

jimallman commented 10 years ago

We're getting confused reports from curators on tree.opentreeoflife.org because their studies aren't showing up in the study list. @mtholder , any idea why the lag time for phylesystem-1 (on tree.opentreeoflife.org) would be this long?

admin@ashby:/home/opentree/repo/phylesystem-1_par/phylesystem-1$ git status
# On branch master
# Your branch is ahead of 'origin/master' by 164 commits.
#
nothing to commit (working directory clean)

The mirror repo shows the same last commit in git log, but it doesn't show how far behind it is from the remote on GitHub (I'm guessing that's expected behavior):

admin@ashby:/home/opentree/repo/phylesystem-1_par/mirror/phylesystem-1$ git status
# On branch master
nothing to commit (working directory clean)

I don't see any unresolved conflicts here, so I'm not sure where else to look to explain this lag. Thoughts?
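For anyone reproducing this check, here is a rough Python sketch of measuring the gap directly instead of reading it off `git status`. `git status` compares the branch against the locally cached remote-tracking ref (`origin/master`), which only moves on a fetch or a successful push. The function names are made up for illustration, and the optional fetch is an assumption about what is safe to run in these repos.

```python
import subprocess


def parse_left_right(output):
    """Parse the '<left>\t<right>' pair printed by
    `git rev-list --left-right --count A...B`."""
    left, right = output.split()
    return int(left), int(right)


def ahead_behind(repo_dir, local="master", remote="origin/master", fetch=False):
    """Return (ahead, behind) for `local` relative to `remote`.

    With fetch=True the remote-tracking ref is refreshed first; this does
    not touch the working tree, but it is off by default (assumption: we
    may not want to change anything in the server's repos).
    """
    if fetch:
        subprocess.run(["git", "fetch", "origin"], cwd=repo_dir, check=True)
    result = subprocess.run(
        ["git", "rev-list", "--left-right", "--count", f"{remote}...{local}"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    # Left side = commits only on `remote` (behind),
    # right side = commits only on `local` (ahead).
    behind, ahead = parse_left_right(result.stdout)
    return ahead, behind
```

Run against the working repo, a large "ahead" count that keeps growing would suggest that pushes are not getting through.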

mtholder commented 10 years ago

The working repo does not pull so it will always report that it is behind.

Clock skew on the server running phylesystem can cause the times to be off according to GitHub and could cause the nudge to OTI to be late, perhaps.

Do we know if the lag is between the curator's save and the push to GitHub?

There is a deferred call through celery (which should log its actions). That could cause a lag, but it should be a tiny lag.

Sorry that I can't dig in to this now, I'm on the road (again. cue Willie Nelson).

Mark

jimallman commented 10 years ago

On Jul 21, 2014, at 6:32 PM, Mark T. Holder notifications@github.com wrote:

> Clock skew on the server running phylesystem can cause the times to be off according to GitHub and could cause the nudge to OTI to be late, perhaps.

Thanks! I’ll check the clocks and celery logs.

I’ll also review the webhooks on GitHub, to make sure they’re using the correct (latest) API methods to nudge oti. I noticed some are calling with old-school URLs.

jimallman commented 10 years ago

Flagging this as related to https://github.com/OpenTreeOfLife/deployed-systems/pull/7

jimallman commented 10 years ago

@mtholder, please see https://github.com/OpenTreeOfLife/deployed-systems/pull/7 for the likely culprit (wrong key file specified during deployment). Now that I know what to look for, I can see lots of 409 CONFLICT responses in the apache logs (look for push/ in ot15:/var/log/apache2/access.log).
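A small Python sketch for tallying those responses, assuming the log uses Apache's common or combined format. The `push/` fragment and the 409 status come from the comment above; the sample request path in the usage note is hypothetical.

```python
import re
from collections import Counter

# Matches the request and status fields of Apache's common/combined log
# format, e.g.:  "PUT /some/path HTTP/1.1" 409 1234
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def count_conflicts(lines, path_fragment="push/", status="409"):
    """Count log lines with the given status whose request path
    contains `path_fragment`, grouped by request path."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if (match and match.group("status") == status
                and path_fragment in match.group("path")):
            counts[match.group("path")] += 1
    return counts
```

Usage would look something like `count_conflicts(open("/var/log/apache2/access.log"))`, which returns a `Counter` keyed by request path (for example a hypothetical `/api/push/v1/9`).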

We should make this error more visible. I'm not quite sure where to make that happen, since it looks like we're using deferred actions for this.

jimallman commented 10 years ago

UPDATE: We also had a minor problem with the webhook URL that nudges oti when studies are added/changed/deleted. Addressed in this fix, which builds a URL that matches the current phylesystem-api.

Note that this is still a very weird URL, along the lines of

http://api.opentreeoflife.org/phylesystem/v1/../search/nudgeIndexOnUpdates

Moving this into our preferred form, something like

http://api.opentreeoflife.org/phylesystem/v1/search/nudgeIndexOnUpdates

will require some shenanigans in web2py to re-route across controllers. This is generally not recommended, so we might need to move methods from the search, merge, and other oddball controllers into the main controller. Or maybe add these as proxied paths in Apache?
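To illustrate why the `..` form is awkward: any client or proxy that applies standard dot-segment removal (RFC 3986) will collapse the path before the request is routed. A quick Python sketch of that collapse follows; the helper name is made up, and whether web2py or Apache normalizes the path the same way is an assumption to verify.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def resolve_dot_segments(url):
    """Collapse '.' and '..' segments in the path of an absolute URL."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=posixpath.normpath(parts.path)))


weird = "http://api.opentreeoflife.org/phylesystem/v1/../search/nudgeIndexOnUpdates"
print(resolve_dot_segments(weird))
# prints http://api.opentreeoflife.org/phylesystem/search/nudgeIndexOnUpdates
```

Note that the collapsed path drops the `v1` segment entirely, so it is not the same as the preferred `/phylesystem/v1/search/...` form.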

mtholder commented 10 years ago

wrt making the error more visible: we probably want to add a system for logging errors on the server side in such a way as to make them fetchable via http. In this case, the except block in the controller/push.py function v1/PUT should probably just append a note to some JSON file on the server side, and then we should have a default/v1/error_status function that would return the error.

We could have a convention like: private/errors/phylesystem-api.json for errors that reflect problems with the whole system, and private/errors/<study-id>.json for errors associated with a specific study.
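A minimal sketch of that file convention in Python 3 (the server code at the time may need adjusting for its own Python version). The function names, the entry fields, and the idea of storing a JSON list per file are all assumptions for illustration. There is no file locking here, and a real version would need to validate `study_id` before using it in a path.

```python
import json
import os
import time


def error_path(errors_dir, study_id=None):
    """System-wide errors go to phylesystem-api.json,
    per-study errors to <study-id>.json."""
    name = "phylesystem-api" if study_id is None else str(study_id)
    return os.path.join(errors_dir, name + ".json")


def read_errors(errors_dir, study_id=None):
    """Return the list of recorded errors (empty if none were recorded)."""
    path = error_path(errors_dir, study_id)
    if not os.path.exists(path):
        return []
    with open(path) as handle:
        return json.load(handle)


def record_error(errors_dir, message, study_id=None):
    """Append one error entry to the appropriate JSON file."""
    os.makedirs(errors_dir, exist_ok=True)
    entries = read_errors(errors_dir, study_id)
    entries.append({"time": time.time(), "message": message})
    with open(error_path(errors_dir, study_id), "w") as handle:
        json.dump(entries, handle)
```

The except block in the push controller could then call something like `record_error("private/errors", str(err), study_id)`, and the proposed `error_status` function would return `read_errors(...)` as JSON.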

If the error files were intended for intermittent weirdness (like this case) rather than problems with the NexSONs, then we would not need to store them long term or put them under version control. The client code could just use a timer attached to the save action (or just poll) to make sure that the err state was clean for a study after the save operation.
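The polling idea could look roughly like this. It is written in Python for illustration only (the curation client itself may be in another language), and `fetch_errors` is a hypothetical callable that would wrap a request to the proposed error_status function and return a list of errors for one study.

```python
import time


def watch_for_errors(fetch_errors, study_id, checks=3, delay=2.0,
                     sleep=time.sleep):
    """Check the error state a few times after a save.

    Waits `delay` seconds before each check, since the push is deferred
    and an error may only appear some time after the save returns.
    Returns the first non-empty error list seen, or [] if every check
    came back clean.
    """
    for _ in range(checks):
        sleep(delay)
        errors = fetch_errors(study_id)
        if errors:
            return errors
    return []
```

The `sleep` parameter is injectable so the loop can be exercised without real delays; the number of checks and the delay are placeholder values, not tuned settings.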

jimallman commented 10 years ago

Added a related ticket (#106) about lots of accumulated changes sitting in the local repo (not making it to GitHub).