USGCRP / gcis

Global Change Information System
https://data.globalchange.gov
Other
20 stars 16 forks source link

Stage returns internal server error #530

Closed amruelama closed 7 years ago

amruelama commented 7 years ago

Stage returns the following error quite often and when I load any page. But works when I reload the page.

500 Internal Server Error

When reporting this error, please include the information below :
    request url : https://data-stage.globalchange.gov/article/10.4319/lo.2010.55.6.2262
    referer : https://data-stage.globalchange.gov/journal/limnology-oceanography
    pid : 1480
    time : 1496260996
lomky commented 7 years ago

Seems to be an isolated set of errors occurring May 30 - May 31. Though there has been less activity in the couple days since then, so no guarantee it is actually resolved vs not yet reoccurred.

First Error:

[Tue May 30 16:09:32 2017] [debug] POST "/article/contributors/10.3402/gha.v6i0.21364"
[Tue May 30 16:09:32 2017] [debug] Routing to a callback
[Tue May 30 16:09:32 2017] [info] user: <snip>
[Tue May 30 16:09:32 2017] [debug] Routing to controller "Tuba::Article" and action "update_contributors"
[Tue May 30 16:09:32 2017] [debug] 302 Found (0.035927s, 27.834/s)
[Tue May 30 16:09:32 2017] [debug] GET "/article/form/update_contributors/10.3402/gha.v6i0.21364"
[Tue May 30 16:09:32 2017] [debug] Routing to a callback
[Tue May 30 16:09:32 2017] [info] user: <snip>
[Tue May 30 16:09:32 2017] [debug] Routing to controller "Tuba::Article" and action "update_contributors_form"
[Tue May 30 16:09:32 2017] [error] DBD::Pg::st execute failed: ERROR:  current transaction is aborted, commands ignored until end of transaction block a

[Tue May 30 16:09:32 2017] [debug] Rendering template "exception.production.html.ep"
[Tue May 30 16:09:32 2017] [debug] 500 Internal Server Error (0.010714s, 93.336/s)

I recently changed this functionality (update_contributors), so that's a big red flag.

lomky commented 7 years ago

Many instances of the update_contributors functionality were successful even minutes prior to the errors beginning.

Chain of events leading up to the first error, starting from the most recent listing page:

Navigating around - all 200 OK

 [Tue May 30 16:07:02 2017] [debug] GET "/organization"
 [Tue May 30 16:07:08 2017] [debug] GET "/organization/university-adelaide"
 [Tue May 30 16:07:15 2017] [debug] GET "/organization/university-adelaide-school-population-health"
 [Tue May 30 16:07:17 2017] [debug] GET "/organization/university-adelaide-discipline-public-health"
 [Tue May 30 16:07:31 2017] [debug] GET "/article/10.1093/ije/dyn165"
 [Tue May 30 16:07:57 2017] [debug] GET "/organization/university-adelaide-discipline-public-health"
 [Tue May 30 16:08:00 2017] [debug] GET "/organization/university-adelaide-school-population-health"
 [Tue May 30 16:08:02 2017] [debug] GET "/organization/university-adelaide-discipline-public-health"
 [Tue May 30 16:08:04 2017] [debug] GET "/organization/university-adelaide-school-population-health"
 [Tue May 30 16:08:05 2017] [debug] GET "/organization/university-adelaide"
 [Tue May 30 16:08:06 2017] [debug] GET "/organization/university-adelaide-school-public-health"
 [Tue May 30 16:08:22 2017] [debug] GET "/organization/university-adelaide"
 [Tue May 30 16:08:25 2017] [debug] GET "/organization/university-adelaide-school-population-health"
 [Tue May 30 16:08:33 2017] [debug] GET "/organization/university-adelaide-discipline-public-health"
 [Tue May 30 16:08:40 2017] [debug] GET "/organization/university-adelaide-school-public-health"

Loading Update Page - 200 OK

 [Tue May 30 16:08:42 2017] [debug] GET "/organization/form/update/university-adelaide-school-public-health"

Autocomplete x6 - 200 OK

 [Tue May 30 16:08:50 2017] [debug] GET "/autocomplete"
 [Tue May 30 16:08:50 2017] [debug] GET "/autocomplete"
 [Tue May 30 16:08:50 2017] [debug] GET "/autocomplete"
 [Tue May 30 16:08:50 2017] [debug] GET "/autocomplete"
 [Tue May 30 16:08:51 2017] [debug] GET "/autocomplete"
 [Tue May 30 16:08:51 2017] [debug] GET "/autocomplete"

Organization Merge - 302 Found

 [Tue May 30 16:08:59 2017] [debug] POST "/organization/university-adelaide-school-public-health/merge"
 [Tue May 30 16:08:59 2017] [debug] Routing to controller "Tuba::Organization" and action "select"
 [Tue May 30 16:08:59 2017] [debug] Routing to controller "Tuba::Organization" and action "merge"
 [Tue May 30 16:08:59 2017] [debug] 302 Found (0.092114s, 10.856/s)

Loading Update Page - 200 OK

 [Tue May 30 16:08:59 2017] [debug] GET "/organization/form/update/university-adelaide-school-public-health"

Navigate around - 200 OK

 [Tue May 30 16:09:01 2017] [debug] GET "/organization/university-adelaide-school-public-health"

Autocomplete x8 - 200 OK

 [Tue May 30 16:09:23 2017] [debug] GET "/autocomplete"
 [Tue May 30 16:09:23 2017] [debug] GET "/autocomplete"
 [Tue May 30 16:09:24 2017] [debug] GET "/autocomplete"
 [Tue May 30 16:09:24 2017] [debug] GET "/autocomplete"
 [Tue May 30 16:09:24 2017] [debug] GET "/autocomplete"
 [Tue May 30 16:09:24 2017] [debug] GET "/autocomplete"
 [Tue May 30 16:09:24 2017] [debug] GET "/autocomplete"
 [Tue May 30 16:09:27 2017] [debug] GET "/autocomplete"

Update Article Contributors - 302 Found

 [Tue May 30 16:09:32 2017] [debug] POST "/article/contributors/10.3402/gha.v6i0.21364"
 [Tue May 30 16:09:32 2017] [debug] POST "/article/contributors/10.3402/gha.v6i0.21364"
 [Tue May 30 16:09:32 2017] [debug] Routing to a callback
 [Tue May 30 16:09:32 2017] [info] user: <snip>
 [Tue May 30 16:09:32 2017] [debug] Routing to controller "Tuba::Article" and action "update_contributors"
 [Tue May 30 16:09:32 2017] [debug] 302 Found (0.035927s, 27.834/s)

Loading Update Page - 500 Internal Server Error

[Tue May 30 16:09:32 2017] [debug] GET "/article/form/update_contributors/10.3402/gha.v6i0.21364"
[Tue May 30 16:09:32 2017] [debug] Routing to a callback
[Tue May 30 16:09:32 2017] [info] user: <snip>
[Tue May 30 16:09:32 2017] [debug] Routing to controller "Tuba::Article" and action "update_contributors_form"
[Tue May 30 16:09:32 2017] [error] DBD::Pg::st execute failed: ERROR:  current transaction is aborted, commands ignored until end of transaction block a

[Tue May 30 16:09:32 2017] [debug] Rendering template "exception.production.html.ep"
[Tue May 30 16:09:32 2017] [debug] 500 Internal Server Error (0.010714s, 93.336/s)

This isolates this particular problem to directly after an organization merge, so I will focus there since other organization update_contributors succeeded without issue. Strongly suspect the merge has a bug leaving open the transaction.

lomky commented 7 years ago

@amruelama have you had issues with organization merges in the past? Is it something you do often on organization objects?

amruelama commented 7 years ago

I haven't had merge issues that would return a 500 error. But I couldn't even open a stage URL (for those two days) say, a URL link from a spreadsheet because of this error. So I had to reload the page often to get it work.

lomky commented 7 years ago

@amruelama that matches the sort of problems I would expect from an unclosed transaction. Did the 500s stop after you did the content push?

amruelama commented 7 years ago

No, it didn't. I had an error while running commands for content push (./reset-to-stage). But it worked after I logged out and logged in my VPN. I believe, I saw the error until 31st evening.

lomky commented 7 years ago

For future reference & more command line practice, @amruelama, here is how you can see more details about a 500 error.

  1. As soon as you get the 500 error on a page, stay on that page & don't reload.*
  2. SSH to the stage backend as gcisops
    1. same ssh command as the content push process uses
  3. Get the most recent logs:
    1. tail -n 50 /var/local/log/tuba.log will show it on the terminal
    2. tail -n 50 /var/local/log/tuba.log >/tmp/20170602_recent_tuba_logs.log will save it for further viewing.

* This just keeps the log from getting more filled.

For example, after that first 500 you would have seen the logs like so:

[Tue May 30 16:08:59 2017] [debug] Rendering cached template "controls/autocomplete.html.ep"
[Tue May 30 16:08:59 2017] [debug] Rendering cached template "layouts/default.html.ep"
[Tue May 30 16:08:59 2017] [debug] Rendering cached template "menu.html.ep"
[Tue May 30 16:08:59 2017] [debug] 200 OK (0.125855s, 7.946/s)
[Tue May 30 16:09:01 2017] [debug] GET "/organization/university-adelaide-school-public-health"
[Tue May 30 16:09:01 2017] [debug] Routing to controller "Tuba::Organization" and action "show"
[Tue May 30 16:09:01 2017] [debug] Rendering cached template "organization/object.html.ep"
[Tue May 30 16:09:01 2017] [debug] Rendering cached template "edit_buttons.html.ep"
[Tue May 30 16:09:01 2017] [debug] Rendering cached template "history_modal.html.ep"
[Tue May 30 16:09:01 2017] [debug] Rendering cached template "h/other_identifiers.html.ep"
[Tue May 30 16:09:01 2017] [debug] Rendering cached template "h/other_formats.html.ep"
[Tue May 30 16:09:01 2017] [debug] Rendering cached template "layouts/default.html.ep"
[Tue May 30 16:09:01 2017] [debug] Rendering cached template "menu.html.ep"
[Tue May 30 16:09:01 2017] [debug] 200 OK (0.087537s, 11.424/s)
[Tue May 30 16:09:23 2017] [debug] GET "/autocomplete"
[Tue May 30 16:09:23 2017] [debug] Routing to controller "Tuba::Search" and action "autocomplete"
[Tue May 30 16:09:23 2017] [debug] 200 OK (0.013262s, 75.403/s)
[Tue May 30 16:09:23 2017] [debug] GET "/autocomplete"
[Tue May 30 16:09:23 2017] [debug] Routing to controller "Tuba::Search" and action "autocomplete"
[Tue May 30 16:09:23 2017] [debug] 200 OK (0.019089s, 52.386/s)
[Tue May 30 16:09:24 2017] [debug] GET "/autocomplete"
[Tue May 30 16:09:24 2017] [debug] Routing to controller "Tuba::Search" and action "autocomplete"
[Tue May 30 16:09:24 2017] [debug] 200 OK (0.017740s, 56.370/s)
[Tue May 30 16:09:24 2017] [debug] GET "/autocomplete"
[Tue May 30 16:09:24 2017] [debug] Routing to controller "Tuba::Search" and action "autocomplete"
[Tue May 30 16:09:24 2017] [debug] 200 OK (0.021527s, 46.453/s)
[Tue May 30 16:09:24 2017] [debug] GET "/autocomplete"
[Tue May 30 16:09:24 2017] [debug] Routing to controller "Tuba::Search" and action "autocomplete"
[Tue May 30 16:09:24 2017] [debug] 200 OK (0.047421s, 21.088/s)
[Tue May 30 16:09:24 2017] [debug] GET "/autocomplete"
[Tue May 30 16:09:24 2017] [debug] Routing to controller "Tuba::Search" and action "autocomplete"
[Tue May 30 16:09:24 2017] [debug] 200 OK (0.039717s, 25.178/s)
[Tue May 30 16:09:24 2017] [debug] GET "/autocomplete"
[Tue May 30 16:09:24 2017] [debug] Routing to controller "Tuba::Search" and action "autocomplete"
[Tue May 30 16:09:24 2017] [debug] 200 OK (0.058397s, 17.124/s)
[Tue May 30 16:09:27 2017] [debug] GET "/autocomplete"
[Tue May 30 16:09:27 2017] [debug] Routing to controller "Tuba::Search" and action "autocomplete"
[Tue May 30 16:09:27 2017] [debug] 200 OK (0.072106s, 13.868/s)
[Tue May 30 16:09:32 2017] [debug] POST "/article/contributors/10.3402/gha.v6i0.21364"
[Tue May 30 16:09:32 2017] [debug] Routing to a callback
[Tue May 30 16:09:32 2017] [info] user: <snip>
[Tue May 30 16:09:32 2017] [debug] Routing to controller "Tuba::Article" and action "update_contributors"
[Tue May 30 16:09:32 2017] [debug] 302 Found (0.035927s, 27.834/s)
[Tue May 30 16:09:32 2017] [debug] GET "/article/form/update_contributors/10.3402/gha.v6i0.21364"
[Tue May 30 16:09:32 2017] [debug] Routing to a callback
[Tue May 30 16:09:32 2017] [info] user: <snip>
[Tue May 30 16:09:32 2017] [debug] Routing to controller "Tuba::Article" and action "update_contributors_form"
[Tue May 30 16:09:32 2017] [error] DBD::Pg::st execute failed: ERROR:  current transaction is aborted, commands ignored until end of transaction block a

[Tue May 30 16:09:32 2017] [debug] Rendering template "exception.production.html.ep"
[Tue May 30 16:09:32 2017] [debug] 500 Internal Server Error (0.010714s, 93.336/s)

The most recent messages are at the bottom. Here you can examine the logs for what happened. Alternately, you can add them to the issue you file, which will help me out in finding the relevant issues. Notice I removed your email address from them first, though.

amruelama commented 7 years ago

@lomky Sounds good! I will log these if I see anymore errors

lomky commented 7 years ago

@amruelama My best theory is the merge process hit a bug in the code, leaving the update incomplete. From then on that incomplete state caused all the 500s. When you did the content push, that incomplete transaction is what caused the blip in your content push saying a user was logged in. Then you hit the bit in the content push that stopped & restarted tuba, which closed the transaction, allowing you to do the content push & resolving the initial problem.

Could you doublecheck some of the updates you did made it out to production? Especially any changes you did after you merged those organizations? I'll start digging on the bug in merge now.

amruelama commented 7 years ago

I spot checked a few and they're in the prod. So no problem I think.

lomky commented 7 years ago

@amruelama did the two orgs you were merging have any of the same contributors?

lomky commented 7 years ago

Almost certain this is failing due to a blind database reassign.
lib/Tuba/Organization.pm:merge

195     my $dbs = $c->dbs;
196     eval {
197       $dbs->begin_work;
198       $dbs->update('contributor',
199         { organization_identifier => $other->identifier},
200         { organization_identifier => $org->identifier })
201         or die $dbs->errstr;
202       $org->delete(audit_user => $c->audit_user, audit_note => $c->audit_note);
203       $dbs->commit or die $dbs->error;
204     };
205     if ($@) { return $c->update_error($@) }

contributor has a unique requirement for person+role+org. If one already exists this will die and be caught, but we are relying on the (notoriously) iffy if($@) eval error catch - which is failing. Probably due to DB object magic inside the eval (see here).

Work Needed:
Recreate the failure in dev.
A solution with Transactional contributor reassign & organization object deletion; proper failure.
Tests for Organization.

lomky commented 7 years ago

Closing this investigation ticket. Work will be done in #531