Closed sraustein closed 8 years ago
error.log.gz apache error log Trac attachment by randy on 2016-07-18T10:45:01Z
{{{ sra@thrintun> host vmini.rpki.net vmini.rpki.net has address 10.10.0.242 }}}
Trac comment by sra on 2014-02-20T13:46:09Z
yes, we are natted here. sorry.
Trac comment by randy on 2014-02-20T14:11:20Z
yes, we are natted here. sorry.
Understood, no blame. Just: If nobody can get in to help you debug, you're on your own. Sorry.
Most useful thing I can think of in terms of preserving evidence for later analysis would be a dump of all the SQL data. In theory, running:
{{{ ca0.vmini.rpki.net# rpki-sql-backup >/root/wtf.sql }}}
should do it, haven't tested recently, if that fails there's always
running mysqldump
manually.
Secondary evidence you might want to preserve if you can are the publication tree from ca0.vmini (/usr/share/rpki/publication on Ubuntu, IIRC) and the /var/rcynic tree from cache0.vmini. In theory, everything relevant in those is already in the SQL data, but....
Trac comment by sra on 2014-02-20T14:22:35Z
daytime tomorrow when i can get to the box
we should do a reverse tunnel hack for this use
Trac comment by randy on 2014-02-20T14:25:30Z
we should do a reverse tunnel hack for this use
Probably can piggy-back on existing BGP tunnel hack.
Trac comment by sra on 2014-02-20T14:45:28Z
On 02/20/2014 01:33 AM, Trac Ticket System wrote:
681: labuser05 blows chunks
--------------------+------------------ Reporter: randy | Owner: Type: defect | Status: new Priority: blocker | Component: gui Keywords: | Blocked By: Blocking: | --------------------+------------------ during workshop, labuser05 had no resources. if i tried to add, server error. so i deleted labuer05 but there seemed to be no way to create a user from workshop account
You should be able to recreate labuser05 by doing the following (as a superuser logged into the GUI):
Now the rgnet resource holder can delegate some resources to labuser05.
Trac comment by melkins on 2014-02-20T16:56:20Z
ca0.vmini.rpki.net# rpki-sql-backup >/root/wtf.sql
http://archive.psg.com/wtf.sql.gz
Secondary evidence you might want to preserve if you can are the publication tree from ca0.vmini
it publishes on ca0.rpki.net
randy
Trac comment by randy on 2014-02-21T02:43:41Z
{{{ [Thu Feb 20 09:32:17 2014] [error] ERROR 2014-02-20 09:32:17,862 django.request Internal Server Error: /rpki/refresh [Thu Feb 20 09:32:17 2014] [error] Traceback (most recent call last): [Thu Feb 20 09:32:17 2014] [error] File "/usr/lib/python2.7/dist-packages/django/core/handlers/base.py", line 115, in get_response [Thu Feb 20 09:32:17 2014] [error] response = callback(request, _callback_args, _callback_kwargs) [Thu Feb 20 09:32:17 2014] [error] File "/usr/lib/python2.7/dist-packages/django/contrib/auth/decorators.py", line 25, in _wrapped_view [Thu Feb 20 09:32:17 2014] [error] return view_func(request, _args, _kwargs) [Thu Feb 20 09:32:17 2014] [error] File "/usr/lib/python2.7/dist-packages/rpki/gui/decorators.py", line 30, in _tls_required [Thu Feb 20 09:32:17 2014] [error] return f(request, _args, _kwargs) [Thu Feb 20 09:32:17 2014] [error] File "/usr/lib/python2.7/dist-packages/rpki/gui/app/views.py", line 94, in wrapped_fn [Thu Feb 20 09:32:17 2014] [error] return f(request, _args, _kwargs) [Thu Feb 20 09:32:17 2014] [error] File "/usr/lib/python2.7/dist-packages/rpki/gui/app/views.py", line 934, in refresh [Thu Feb 20 09:32:17 2014] [error] request.session['handle']) [Thu Feb 20 09:32:17 2014] [error] File "/usr/lib/python2.7/dist-packages/rpki/gui/app/glue.py", line 83, in list_received_resources [Thu Feb 20 09:32:17 2014] [error] for pdu in pdus: [Thu Feb 20 09:32:17 2014] [error] TypeError: 'NoneType' object is not iterable }}}
In rpki.gui.app.glue.list_received_resources, the call to Zookeeper.call_rpkid() is apparently returning None, rather than a list of result pdus: {{{ z = Zookeeper(handle=conf.handle) pdus = z.call_rpkid(list_received_resources_elt.make_pdu(self_handle=conf.handle)) }}}
Zookeeper.call_rpkid() itself just a wrapper:
{{{
call_rpkid = rpki.async.sync_wrapper(rpki.http.caller(
proto = rpki.left_right,
client_key = irbe.private_key,
client_cert = irbe.certificate,
server_ta = self.server_ca.certificate,
server_cert = rpkid.certificate,
url = url,
debug = self.show_xml))
return call_rpkid(*pdus)
}}}
rpki.async.sync_wrapper will potentially raise an exception on error, otherwise it returns the result of the callback, which in this case must really be None: {{{ def call(self, _args, *_kwargs):
def thunk():
try:
self.func(self.cb, self.eb, *args, **kwargs)
except ExitNow:
raise
except Exception, e:
self.eb(e)
event_defer(thunk)
event_loop()
if self.err is None:
return self.res
elif isinstance(self.err, tuple):
raise self.err[0], self.err[1], self.err[2]
else:
raise self.err
}}}
I'm a little stumped as to why this is happening. Currently there is no error check in my code prior to removing the existing resources, which is definitely a bug: {{{ z = Zookeeper(handle=conf.handle) pdus = z.call_rpkid(list_received_resources_elt.make_pdu(self_handle=conf.handle))
models.ResourceCert.objects.filter(conf=conf).delete()
}}}
I need to ensure that pdus is not None and that there are no error pdus present.
Trac comment by melkins on 2014-03-10T17:09:47Z
An alternative would be to decorate rpki.gui.app.glue.list_received_resources() with @commit_on_success, so that any changes would be rolled back if an exception is raised. This may be the cleaner solution.
Trac comment by melkins on 2014-03-10T17:20:14Z
In [changeset:"5700"]: {{{
Rollback deletion of cached resource certs when there is an error communicating with rpkid. see #681 }}}
Trac comment by melkins on 2014-03-10T17:55:00Z
will not be able to test soon, still in transit. but thanks for working this one.
Trac comment by randy on 2014-03-10T19:15:12Z
rpki.async.sync_wrapper will potentially raise an exception on error, otherwise it returns the result of the callback, which in this case must really be None: ... I'm a little stumped as to why this is happening.
Likewise. Empty sequence would be plausible, but None is unexpected.
Trac comment by sra on 2014-03-10T19:30:23Z
ok, vmini is on net on bainbridge
Trac comment by randy on 2014-03-15T01:13:54Z
On Sat, Mar 15, 2014 at 01:13:54AM -0000, Trac Ticket System wrote:
ok, vmini is on net on bainbridge
I can ssh to ca0.vmini, but the https service does not seem to be forwarded to that virtual machine?
Trac comment by melkins on 2014-03-18T17:19:05Z
I can ssh to ca0.vmini, but the https service does not seem to be forwarded to that virtual machine?
have you tried https://vmini.rg.net?
Trac comment by randy on 2014-03-18T17:27:02Z
On Tue, Mar 18, 2014 at 05:27:02PM -0000, Trac Ticket System wrote:
have you tried https://vmini.rg.net?
Firefox can't find the server at vmini.rg.net.
Trac comment by melkins on 2014-03-18T17:31:56Z
On Tue, Mar 18, 2014 at 05:31:56PM -0000, Trac Ticket System wrote:
On Tue, Mar 18, 2014 at 05:27:02PM -0000, Trac Ticket System wrote:
have you tried https://vmini.rg.net?
Firefox can't find the server at vmini.rg.net.
$ host vmini.rpki.net vmini.rpki.net has address 50.135.235.34 $ host vmini.rg.net Host vmini.rg.net not found: 3(NXDOMAIN) $ host ca0.vmini.rpki.net ca0.vmini.rpki.net has address 50.135.235.34
Trac comment by melkins on 2014-03-18T17:35:59Z
Firefox can't find the server at vmini.rg.net.
can't find server, as in ip addy? or can't find 443 service?
xinitd on vmini would seem to redirect properly
service ca0-https { flags = NORETRY type = UNLISTED socket_type = stream protocol = tcp port = 443 wait = no user = root redirect = 10.0.0.2 443 }
i am stuck 'inside' so can not properly test. and rob is dealing with far more serious matters.
lemme see if i can proxy, tunnel, or something.
Trac comment by randy on 2014-03-18T17:36:09Z
oh sigh. so try https://vmini.rpki.net/
Trac comment by randy on 2014-03-18T17:37:22Z
On Tue, Mar 18, 2014 at 05:36:10PM -0000, Trac Ticket System wrote:
Comment (by randy):
Firefox can't find the server at vmini.rg.net.
can't find server, as in ip addy? or can't find 443 service?
vmini.rg.net gets a NXDOMAIN. vmini.rpki.net and ca0.vmini.rpki.net get connection refused:
$ telnet ca0.vmini.rpki.net 443 Trying 50.135.235.34... telnet: Unable to connect to remote host: Connection refused
i am stuck 'inside' so can not properly test. and rob is dealing with far more serious matters.
lemme see if i can proxy, tunnel, or something.
OK, not urgent.
Trac comment by melkins on 2014-03-18T17:38:14Z
and rob is dealing with far more serious matters.
Putting the rear wheel back on my mother's bicycle, yeah.
Trac comment by sra on 2014-03-18T18:22:46Z
tcptraceroute from ashburn says that the https port is blocked. At a guess, by your edge router in bainbridge or whatever.
{{{ Tracing the path to ca0.vmini.rpki.net (50.135.235.34) on TCP port 443, 30 hops max 1 r0.iad.rg.net (198.180.150.120) 9.825 ms 9.484 ms 9.979 ms 2 r1.iad.rg.net (198.180.150.121) 9.966 ms 9.611 ms 9.980 ms 3 144.232.18.81 (144.232.18.81) 9.983 ms 9.682 ms 9.981 ms 4 sl-crs1-dc-0-6-0-2.sprintlink.net (144.232.20.135) 9.985 ms 9.621 ms 9.982 ms 5 sl-st30-ash-0-10-0-0.sprintlink.net (144.232.25.224) 10.007 ms 9.126 ms 9.979 ms 6 144.232.11.92 (144.232.11.92) 9.986 ms 9.334 ms 9.983 ms 7 144.232.6.98 (144.232.6.98) 9.984 ms 9.685 ms 9.981 ms 8 be-26-cr01.ashburn.va.ibone.comcast.net (68.86.82.169) 9.984 ms 9.969 ms 9.973 ms 9 he-4-6-0-0-cr01.56marietta.ga.ibone.comcast.net (68.86.89.162) 19.403 ms 19.476 ms 19.947 ms 10 68.86.88.238 (68.86.88.238) 39.585 ms 39.584 ms 39.957 ms 11 he-0-13-0-0-cr01.losangeles.ca.ibone.comcast.net (68.86.85.145) 69.946 ms 71.555 ms 72.071 ms 12 he-2-12-0-0-cr01.sanjose.ca.ibone.comcast.net (68.86.89.217) 81.804 ms 83.513 ms 83.937 ms 13 68.86.91.78 (68.86.91.78) 78.684 ms 78.734 ms 78.689 ms 14 * * * 15 te-1-0-0-ten03.bremerton.wa.seattle.comcast.net (68.86.98.122) 80.206 ms 80.276 ms 80.191 ms 16 c-50-135-235-34.hsd1.wa.comcast.net (50.135.235.34) [closed] 91.662 ms 91.734 ms 93.920 ms }}}
If you ssh to vmini.rpki.net, w3m https://ca0.vmini.rpki.net/
works,
netstat --inet -a
shows that we're listening to the correct port,
and w3m https://localhost/
suggests that the xinetd connection
forwarding is active, although the hostname mismatch will leave things
hopelessly confused in the last case.
Trac comment by sra on 2014-03-18T18:31:23Z
please try again
Trac comment by randy on 2014-03-18T23:28:14Z
On Tue, Mar 18, 2014 at 11:28:14PM -0000, Trac Ticket System wrote:
please try again
Now port 443 is open, but I get a HTTP 301 redirect back to port 80, which gets me connection refused.
Trac comment by melkins on 2014-03-19T18:07:04Z
sigh. again, please
Trac comment by randy on 2014-03-19T18:09:41Z
sigh. again, please
no dice:
$ telnet cache0.vmini.rpki.net 80 Trying 50.135.235.34... telnet: Unable to connect to remote host: Connection refused
Trac comment by melkins on 2014-03-19T18:10:56Z
Now port 443 is open, but I get a HTTP 301 redirect back to port 80, which gets me connection refused.
I suspect wrong DNS name.
https://cache0.vmini.rpki.net/ redirects to http://cache0.vmini.rpki.net/
http://ca0.vmini.rpki.net/ redirects to https://ca0.vmini.rpki.net/
This is all much more complex than any of us would have liked, but the root cause is the need to make this all work on vmini's one and only external IP address when it's in a lab environment.
Trac comment by sra on 2014-03-19T18:45:50Z
On 03/19/2014 11:45 AM, Trac Ticket System wrote:
Now port 443 is open, but I get a HTTP 301 redirect back to port 80, which gets me connection refused.
I suspect wrong DNS name.
https://cache0.vmini.rpki.net/ redirects to http://cache0.vmini.rpki.net/
http://ca0.vmini.rpki.net/ redirects to https://ca0.vmini.rpki.net/
Good catch, I should have been hitting ca0.vmini and the redirection for that site does work.
Trac comment by melkins on 2014-03-19T18:47:52Z
vmini is soon to pack up to go from bainbridge to tokyo. back on your friday or so.
Trac comment by randy on 2014-03-20T00:12:45Z
vmini is up back home in tokyo
Trac comment by randy on 2014-03-22T12:13:05Z
i just realized that melkins may not know the hack to get into *.vmini when it is behind my home gateway
in ~/.ssh/conf {{{ Host vmini* Hostname vmini.rpki.net Port 42022 ControlMaster no
Host ca0.vmini* Hostname vmini.rpki.net Port 42023 ControlMaster no
Host cache0.vmini* Hostname vmini.rpki.net Port 42024 ControlMaster no }}}
Trac comment by randy on 2014-04-08T23:43:08Z
Replying to [comment:9 melkins]:
In rpki.gui.app.glue.list_received_resources, the call to Zookeeper.call_rpkid() is apparently returning None, rather than a list of result pdus: ... I'm a little stumped as to why this is happening.
Likewise. The wrapper code looks pretty tight, although of course I could be missing something.
The only path I could see that might cause this would be if your process were to receive a SIGHUP or SIGTERM while waiting for the answer from rpkid.
Guess I could add something to the sync_wrapper class to record whether either of the exit callback methods was ever invoked.
Trac comment by sra on 2014-07-11T04:29:31Z
In [changeset:"5890"]: {{{
Add check to log a warning if sync_wrapper returns without callback. See #681. }}}
Trac comment by sra on 2014-07-11T16:59:28Z
THIS PROBLEM PERSISTS AND BLEW A WORKSHOP LAB
labuser05 had no resources. tried to export workshop's prefixes, manually add labuser05, and re-import. internal server error.
Trac comment by randy on 2014-11-19T10:35:07Z
Replying to [comment:35 randy]:
THIS PROBLEM PERSISTS AND BLEW A WORKSHOP LAB
labuser05 had no resources. tried to export workshop's prefixes, manually add labuser05, and re-import. internal server error.
This looks like a combination of the bug for this ticket and other bugs (see #734 #735 #736).
These log messages indicate that the problem I reported earlier in this ticket are still present. Namely, Zookeeper.call_rpkid() with a list_received_resources
PDU sometimes returns None unexpectedly, rather than the list of resources. This no long causes an exception to be thrown, but it does mean the GUI has no idea what resources a holder actually has, and falls back on data cached in the django DB.
{{{
[Wed Nov 19 09:11:31.130046 2014] [:error] [pid 12838:tid 2962180928] [client 10.0.0.1:34363] error: call_rpkid() returned None for handle labuser08 when fetching received resources, referer: https://ca0.vmini.rpki.net/rpki/
[Wed Nov 19 09:12:04.399980 2014] [:error] [pid 12838:tid 2886646592] [client 10.0.0.1:34382] error: call_rpkid() returned None for handle labuser05 when fetching received resources, referer: https://ca0.vmini.rpki.net/rpki/
[Wed Nov 19 09:15:15.965334 2014] [:error] [pid 12837:tid 3046107968] [client 10.0.0.1:34490] error: call_rpkid() returned None for handle labuser03 when fetching received resources, referer: https://ca0.vmini.rpki.net/rpki/
[Wed Nov 19 09:16:10.760989 2014] [:error] [pid 12837:tid 2903432000] [client 10.0.0.1:34521] error: call_rpkid() returned None for handle labuser06 when fetching received resources, referer: https://ca0.vmini.rpki.net/rpki/
[Wed Nov 19 09:16:47.372733 2014] [:error] [pid 12837:tid 2970573632] [client 10.0.0.1:34540] error: call_rpkid() returned None for handle labuser11 when fetching received resources, referer: https://ca0.vmini.rpki.net/rpki/
[Wed Nov 19 10:22:07.714532 2014] [:error] [pid 12837:tid 3020929856] [client 10.0.0.1:34675] error: call_rpkid() returned None for handle labuser05 when fetching received resources, referer: https://ca0.vmini.rpki.net/rpki/
[Wed Nov 19 10:24:45.936055 2014] [:error] [pid 12837:tid 3012537152] [client 10.0.0.1:34691] error: call_rpkid() returned None for handle labuser13 when fetching received resources, referer: https://ca0.vmini.rpki.net/rpki/
[Wed Nov 19 10:31:43.798281 2014] [:error] [pid 12837:tid 2895039296] [client 10.0.0.1:34743] error: call_rpkid() returned None for handle labuser16 when fetching received resources, referer: https://ca0.vmini.rpki.net/rpki/
[Wed Nov 19 10:31:47.432379 2014] [:error] [pid 12837:tid 2895039296] [client 10.0.0.1:34743] error: call_rpkid() returned None for handle labuser16 when fetching received resources, referer: https://ca0.vmini.rpki.net/rpki/
[Wed Nov 19 10:47:31.325763 2014] [:error] [pid 12837:tid 2895039296] [client 10.0.0.1:34905] error: call_rpkid() returned None for handle labuser08 when fetching received resources, referer: https://ca0.vmini.rpki.net/rpki/
[Wed Nov 19 10:47:35.243230 2014] [:error] [pid 12837:tid 2895039296] [client 10.0.0.1:34905] error: call_rpkid() returned None for handle labuser08 when fetching received resources, referer: https://ca0.vmini.rpki.net/rpki/
[Wed Nov 19 10:47:37.189831 2014] [:error] [pid 12837:tid 2895039296] [client 10.0.0.1:34905] error: call_rpkid() returned None for handle labuser08 when fetching received resources, referer: https://ca0.vmini.rpki.net/rpki/
[Wed Nov 19 10:51:07.512430 2014] [:error] [pid 12837:tid 2945395520] [client 10.0.0.1:34941] error: call_rpkid() returned None for handle labuser04 when fetching received resources, referer: https://ca0.vmini.rpki.net/rpki/
[Wed Nov 19 10:52:24.503128 2014] [:error] [pid 12837:tid 2886646592] [client 10.0.0.1:34958] error: call_rpkid() returned None for handle labuser04 when fetching received resources, referer: https://ca0.vmini.rpki.net/rpki/
[Wed Nov 19 10:52:26.458218 2014] [:error] [pid 12837:tid 2886646592] [client 10.0.0.1:34958] error: call_rpkid() returned None for handle labuser04 when fetching received resources, referer: https://ca0.vmini.rpki.net/rpki/
[Wed Nov 19 11:06:00.472249 2014] [:error] [pid 12837:tid 3020929856] [client 10.0.0.1:34984] error: call_rpkid() returned None for handle labuser10 when fetching received resources, referer: https://ca0.vmini.rpki.net/rpki/
}}}
I am unable to trigger the problem with irbe_client
, as it seems to always return data.
Looking at rpki.async.sync_wrapper
, I should probably enable logging for rpki.async
to see if rpki.async.sync_wrapper.cb()
is ever invoked. Maybe that will give me some additional clue.
The only other error messages I'm seeing in the apache log are mod_wsgi reporting that it is ignoring rpki.async's request to trip some UNIX signals:
{{{ [Wed Nov 19 05:29:01.290630 2014] [:warn] [pid 12837:tid 3020929856] mod_wsgi (pid=12837): Callback registration for signal 2 ignored. [Wed Nov 19 05:29:01.313821 2014] [:warn] [pid 12837:tid 3020929856] File "/usr/lib/python2.7/dist-packages/django/core/handlers/wsgi.py", line 206, in call [Wed Nov 19 05:29:01.313861 2014] [:warn] [pid 12837:tid 3020929856] response = self.get_response(request) [Wed Nov 19 05:29:01.313880 2014] [:warn] [pid 12837:tid 3020929856] File "/usr/lib/python2.7/dist-packages/django/core/handlers/base.py", line 112, in get_response [Wed Nov 19 05:29:01.313895 2014] [:warn] [pid 12837:tid 3020929856] response = wrapped_callback(request, _callback_args, _callback_kwargs) [Wed Nov 19 05:29:01.313914 2014] [:warn] [pid 12837:tid 3020929856] File "/usr/lib/python2.7/dist-packages/django/contrib/auth/decorators.py", line 22, in _wrapped_view [Wed Nov 19 05:29:01.313930 2014] [:warn] [pid 12837:tid 3020929856] return view_func(request, _args, _kwargs) [Wed Nov 19 05:29:01.313948 2014] [:warn] [pid 12837:tid 3020929856] File "/usr/lib/python2.7/dist-packages/rpki/gui/decorators.py", line 30, in _tls_required [Wed Nov 19 05:29:01.313964 2014] [:warn] [pid 12837:tid 3020929856] return f(request, _args, _kwargs) [Wed Nov 19 05:29:01.313981 2014] [:warn] [pid 12837:tid 3020929856] File "/usr/lib/python2.7/dist-packages/rpki/gui/app/views.py", line 104, in wrapped_fn [Wed Nov 19 05:29:01.313996 2014] [:warn] [pid 12837:tid 3020929856] return f(request, _args, _kwargs) [Wed Nov 19 05:29:01.314013 2014] [:warn] [pid 12837:tid 3020929856] File "/usr/lib/python2.7/dist-packages/rpki/gui/app/views.py", line 930, in refresh [Wed Nov 19 05:29:01.314029 2014] [:warn] [pid 12837:tid 3020929856] glue.list_received_resources(request.META['wsgi.errors'], conf) [Wed Nov 19 05:29:01.314045 2014] [:warn] [pid 12837:tid 3020929856] File "/usr/lib/python2.7/dist-packages/django/db/transaction.py", line 399, in inner [Wed Nov 19 05:29:01.314060 2014] [:warn] [pid 12837:tid 3020929856] return func(_args, _kwargs) [Wed Nov 19 05:29:01.314077 2014] [:warn] [pid 12837:tid 3020929856] File "/usr/lib/python2.7/dist-packages/rpki/gui/app/glue.py", line 93, in list_received_resources [Wed Nov 19 05:29:01.314093 2014] [:warn] [pid 12837:tid 3020929856] pdus = z.call_rpkid(list_received_resources_elt.make_pdu(self_handle=conf.handle)) [Wed Nov 19 05:29:01.314110 2014] [:warn] [pid 12837:tid 3020929856] File "/usr/lib/python2.7/dist-packages/rpki/irdb/zookeeper.py", line 1071, in call_rpkid [Wed Nov 19 05:29:01.314125 2014] [:warn] [pid 12837:tid 3020929856] return call_rpkid(pdus) [Wed Nov 19 05:29:01.314142 2014] [:warn] [pid 12837:tid 3020929856] File "/usr/lib/python2.7/dist-packages/rpki/async.py", line 377, in call [Wed Nov 19 05:29:01.314157 2014] [:warn] [pid 12837:tid 3020929856] event_loop() [Wed Nov 19 05:29:01.314174 2014] [:warn] [pid 12837:tid 3020929856] File "/usr/lib/python2.7/dist-packages/rpki/async.py", line 290, in event_loop [Wed Nov 19 05:29:01.314189 2014] [:warn] [pid 12837:tid 3020929856] old = signal.signal(sig, _raiseExitNow) [Wed Nov 19 05:29:01.314217 2014] [:warn] [pid 12837:tid 3020929856] mod_wsgi (pid=12837): Callback registration for signal 15 ignored. [Wed Nov 19 05:29:01.314487 2014] [:warn] [pid 12837:tid 3020929856] File "/usr/lib/python2.7/dist-packages/django/core/handlers/wsgi.py", line 206, in call [Wed Nov 19 05:29:01.314543 2014] [:warn] [pid 12837:tid 3020929856] response = self.get_response(request) [Wed Nov 19 05:29:01.314560 2014] [:warn] [pid 12837:tid 3020929856] File "/usr/lib/python2.7/dist-packages/django/core/handlers/base.py", line 112, in get_response [Wed Nov 19 05:29:01.314575 2014] [:warn] [pid 12837:tid 3020929856] response = wrapped_callback(request, _callback_args, _callback_kwargs) [Wed Nov 19 05:29:01.314590 2014] [:warn] [pid 12837:tid 3020929856] File "/usr/lib/python2.7/dist-packages/django/contrib/auth/decorators.py", line 22, in _wrapped_view [Wed Nov 19 05:29:01.314605 2014] [:warn] [pid 12837:tid 3020929856] return view_func(request, _args, _kwargs) [Wed Nov 19 05:29:01.314621 2014] [:warn] [pid 12837:tid 3020929856] File "/usr/lib/python2.7/dist-packages/rpki/gui/decorators.py", line 30, in _tls_required [Wed Nov 19 05:29:01.314634 2014] [:warn] [pid 12837:tid 3020929856] return f(request, _args, _kwargs) [Wed Nov 19 05:29:01.314651 2014] [:warn] [pid 12837:tid 3020929856] File "/usr/lib/python2.7/dist-packages/rpki/gui/app/views.py", line 104, in wrapped_fn [Wed Nov 19 05:29:01.314666 2014] [:warn] [pid 12837:tid 3020929856] return f(request, _args, _kwargs) [Wed Nov 19 05:29:01.314684 2014] [:warn] [pid 12837:tid 3020929856] File "/usr/lib/python2.7/dist-packages/rpki/gui/app/views.py", line 930, in refresh [Wed Nov 19 05:29:01.314699 2014] [:warn] [pid 12837:tid 3020929856] glue.list_received_resources(request.META['wsgi.errors'], conf) [Wed Nov 19 05:29:01.314713 2014] [:warn] [pid 12837:tid 3020929856] File "/usr/lib/python2.7/dist-packages/django/db/transaction.py", line 399, in inner [Wed Nov 19 05:29:01.314725 2014] [:warn] [pid 12837:tid 3020929856] return func(_args, _kwargs) [Wed Nov 19 05:29:01.314738 2014] [:warn] [pid 12837:tid 3020929856] File "/usr/lib/python2.7/dist-packages/rpki/gui/app/glue.py", line 93, in list_received_resources [Wed Nov 19 05:29:01.314753 2014] [:warn] [pid 12837:tid 3020929856] pdus = z.call_rpkid(list_received_resources_elt.make_pdu(self_handle=conf.handle)) [Wed Nov 19 05:29:01.314770 2014] [:warn] [pid 12837:tid 3020929856] File "/usr/lib/python2.7/dist-packages/rpki/irdb/zookeeper.py", line 1071, in call_rpkid [Wed Nov 19 05:29:01.314783 2014] [:warn] [pid 12837:tid 3020929856] return call_rpkid(pdus) [Wed Nov 19 05:29:01.314799 2014] [:warn] [pid 12837:tid 3020929856] File "/usr/lib/python2.7/dist-packages/rpki/async.py", line 377, in call [Wed Nov 19 05:29:01.314813 2014] [:warn] [pid 12837:tid 3020929856] event_loop() [Wed Nov 19 05:29:01.314828 2014] [:warn] [pid 12837:tid 3020929856] File "/usr/lib/python2.7/dist-packages/rpki/async.py", line 290, in event_loop [Wed Nov 19 05:29:01.314843 2014] [:warn] [pid 12837:tid 3020929856] old = signal.signal(sig, _raiseExitNow) }}}
but I don't see any obvious sign that this is causing the problem reported here.
Trac comment by melkins on 2014-11-19T18:28:22Z
In [changeset:"6026"]: {{{
enable logging rpki.async to the apache log file. see #681 }}}
Trac comment by melkins on 2014-11-19T18:36:53Z
The signal.signal()
calls are setting up signal handlers for SIGTERM
etc so that signals sent to this process will cause a clean exit from the asynchronous I/O system. In theory nothing bad should happen if they aren't set and nobody sends those signals.
Looking at the context in which this is happening (rpki.async.event_loop()
), it looks like the exception you're seeing while trying to set the signal handler will cause event_loop()
to exit without ever running the event loop itself, which may be the real problem.
Things this theory does not explain:
logger.exception()
call towards the end of this function?If we think that mod_wsgi
really just does not want us to set signal handlers, it would be easy enough to add an option somewhere in the rpki.async.sync_wrapper()
class which passes an empty tuple of signal numbers to event_loop()
.
Does the mod_wsgi
doc have any comment on signal handlers?
Trac comment by sra on 2014-11-19T19:02:32Z
On 11/19/2014 11:02 AM, Trac Ticket System wrote:
The
signal.signal()
calls are setting up signal handlers forSIGTERM
etc so that signals sent to this process will cause a clean exit from the asynchronous I/O system. In theory nothing bad should happen if they aren't set and nobody sends those signals.Looking at the context in which this is happening (
rpki.async.event_loop()
), it looks like the exception you're seeing while trying to set the signal handler will causeevent_loop()
to exit without ever running the event loop itself, which may be the real problem.Things this theory does not explain:
- Why does this only happen sometimes rather than for all GUI instances?
- Why is there nothing in the log from the catch-all
logger.exception()
call towards the end of this function?
I've only selectively enabled logging in rpki.gui.default_settings.py. Just pushed a change to turn it on for rpki.async. Are there any other modules that should be enabled here as well?
https://trac.rpki.net/changeset/6026
If we think that
mod_wsgi
really just does not want us to set signal handlers, it would be easy enough to add an option somewhere in therpki.async.sync_wrapper()
class which passes an empty tuple of signal numbers toevent_loop()
.Does the
mod_wsgi
doc have any comment on signal handlers?
https://code.google.com/p/modwsgi/wiki/ApplicationIssues#Registration_Of_Signal_Handlers
Trac comment by melkins on 2014-11-19T19:18:27Z
In [changeset:"6027"]: {{{
Add disable_signal_handlers option to rpki.async.sync_wrapper and rpki.irdb.Zookeeper() to avoid problems when running in environments like mod_wsgi which have strong opinions about what code is and is not allowed to handle signals. This may result in some noise in log files, but there's not much we can do about that if we're not allowed to clean up when terminated, so be it. See #681. }}}
Trac comment by sra on 2014-11-19T19:52:55Z
Just committed [6027], which adds a disable_signal_handlers
option
to Zookeeper.__init__()
and sync_wrapper.__init__()
; the latter
does the real work, the former is just a pass-through, since nothing
but zookeeper methods ever use sync_wrapper
directly anymore.
The default behavior is not to disable signal handlers, that is, to behave as currently, because this is correct for everything but the GUI.
In theory, all you should need to do to use this in the GUI is set that option when creating your Zookeeper instance, after that it should be automatic.
Be warned that disabling these signal handlers may generate some noise
in the log file. I don't //think// it will do anything else bad,
since the purpose of these signal handlers is mostly just to translate
one kind of termination signal into another one. I don't recall
whether the asyncore
I/O loop will exit (cleanly or at all) without
these signal handlers, but since in the GUI case the entire Apache
environment is shutting down if you receive one of these, it probably
doesn't matter very much, it's not like anything is going to hang
about waiting for a clean shutdown.
Trac comment by sra on 2014-11-19T19:59:47Z
I've only selectively enabled logging in rpki.gui.default_settings.py. Just pushed a change to turn it on for rpki.async. Are there any other modules that should be enabled here as well?
The code kind of assumes that anything logged at severity warning
or
worse will be logged, regardless of module. Certainly anything
logging at severity error
should be logged.
See rpki.log
for how logging is configured these days in all the
non-GUI code.
Trac comment by sra on 2014-11-19T20:13:46Z
In [changeset:"6029"]: {{{
configure root logger to route WARNING or above to the appropriate handlers. see #681 }}}
Trac comment by melkins on 2014-11-19T23:11:28Z
bottom line: o vmini is not where you can reach it now. will be monday o should i update now and try again?
Trac comment by randy on 2014-11-20T00:55:14Z
bottom line: o vmini is not where you can reach it now. will be monday o should i update now and try again?
Not quite yet.
I added a new initialization option to the Zookeeper class to humor mod_wsgi, but that option is still off by default, because we don't want it enabled for anything except the GUI. So Michael (or somebody) has to go through the GUI code enabling that option in the cases were we need it.
Once that's done, Michael (or somebody) has had time to test the result, and the build robot has time to build new packages, it will be time for you to upgrade.
Trac comment by sra on 2014-11-20T01:05:07Z
ok. fwiw, vmini will be shut down in about 24 hours and be down for three days.
Trac comment by randy on 2014-11-20T03:00:18Z
As of now, the GUI will now:
The new code won't be available in the .deb packages until after tonight's build.
Trac comment by melkins on 2014-11-20T18:08:50Z
The new code won't be available in the .deb packages until after tonight's build.
Er, there is no "tonight's build".
The build robot checks for updates every ten minutes. It does take a little while to run through all the builds, after which it uploads them all to the server at once. As I type this it's about halfway through building all the 0.6032 packages.
Trac comment by sra on 2014-11-20T18:34:57Z
vmini is back up in tokyo. may take ten mins for dns to propagate
Trac comment by randy on 2014-11-23T10:18:14Z
vmini is currently at rpki-ca 0.6030. It should be upgraded to the current (0.6035).
I'm trying to figure out how to appropriately test this bug. We only see it during workshops, so I'm pondering if there is some concurrency issue. My test script (~root/tk681.ssh) iterates over all the labuser accounts and doesn't seem to tickle the bug when done serially.
Trac comment by melkins on 2014-11-25T20:34:10Z
during workshop, labuser05 had no resources. if i tried to add, server error. so i deleted labuer05 but there seemed to be no way to create a user from workshop account
Trac ticket #681 component gui priority blocker, owner melkins, created by randy on 2014-02-20T09:33:56Z, last modified 2016-07-18T10:45:01Z