Closed b-long closed 8 years ago
It's unclear why the work done by server.py
is suddenly not reported back to staging.
On zin2 seem to indicate messages aren't received at all ( no log output from on_before_message
and the log entry "Message received"
was not written).
On moscato2 messages are received, and response is sent :
DEBUG: 12/21/2015 12:31:21 PM server.py - on_send() SEND {'content-length': 221, 'destination': '/topic/builderevents', 'persistent': 'true'} {"body": "Got build request...", "first_message": true, "job_id": "psygenet2r_20151221153124", "bioc_version": "3.3", "builder_id": "moscato2", "client_id": "single_package_builder_autobuild:1329:psygenet2r_0.9.0.tar.gz"}.
DEBUG: 12/21/2015 12:31:21 PM transport.py - Sending frame ['SEND', '\n', 'content-length:221\n', 'destination:/topic/builderevents\n', 'persistent:true\n', '\n', '{"body": "Got build request...", "first_message": true, "job_id": "psygenet2r_20151221153124", "bioc_version": "3.3", "builder_id": "moscato2", "client_id": "single_package_builder_autobuild:1329:psygenet2r_0.9.0.tar.gz"}', '\x00']
INFO: 12/21/2015 12:31:21 PM server.py - Reply sent
However , track_build_completion.log
on staging does not contain this message. Other instances of first_message
do exist in track_build_completion.log
:
biocadmin@ip-172-30-0-180:~/packagebuilder (changes-on-staging)$ grep first_message track_build_completion.log |tail -n 4
DEBUG: 12/18/2015 07:58:59 PM transport.py - Received frame: 'MESSAGE', headers={'content-length': '217', 'expires': '0', 'timestamp': '1450486995561', 'destination': '/topic/builderevents', 'persistent': 'true', 'priority': '4', 'message-id': 'ID:broker-46292-1448469945488-2:512:-1:1:7', 'subscription': 'e85cb19c7b93443b956386c3e501d5d6'}, body='{"body": "Got build request...", "first_message": true, "job_id": "isomiRs_20151218165858", "bioc_version": "3.3", "builder_id": "moscato2", "client_id": "single_package_builder_autobuild:1108:isomiRs_0.99.11.tar.gz"}'
DEBUG: 12/19/2015 04:17:59 PM track_build_completion.py - on_before_message() {'content-length': '234', 'expires': '0', 'timestamp': '1450560136609', 'destination': '/topic/builderevents', 'persistent': 'true', 'priority': '4', 'message-id': 'ID:broker-46292-1448469945488-2:542:-1:1:1', 'subscription': 'e85cb19c7b93443b956386c3e501d5d6'} {"body": "Got build request...", "first_message": true, "job_id": "miRNAmeConverter_20151219131757", "bioc_version": "3.3", "builder_id": "moscato2", "client_id": "single_package_builder_autobuild:1345:miRNAmeConverter_0.99.4.tar.gz"}.
INFO: 12/19/2015 04:17:59 PM track_build_completion.py - Received stomp message: {"body": "Got build request...", "first_message": true, "job_id": "miRNAmeConverter_20151219131757", "bioc_version": "3.3", "builder_id": "moscato2", "client_id": "single_package_builder_autobuild:1345:miRNAmeConverter_0.99.4.tar.gz"}
DEBUG: 12/19/2015 04:17:59 PM transport.py - Received frame: 'MESSAGE', headers={'content-length': '234', 'expires': '0', 'timestamp': '1450560136609', 'destination': '/topic/builderevents', 'persistent': 'true', 'priority': '4', 'message-id': 'ID:broker-46292-1448469945488-2:542:-1:1:1', 'subscription': 'e85cb19c7b93443b956386c3e501d5d6'}, body='{"body": "Got build request...", "first_message": true, "job_id": "miRNAmeConverter_20151219131757", "bioc_version": "3.3", "builder_id": "moscato2", "client_id": "single_package_builder_autobuild:1345:miRNAmeConverter_0.99.4.tar.gz"}'
The branch patch/31
is deployed to production. A new build of "psygenet2r" has run, and can be viewed here: http://staging.bioconductor.org:8000/job/553/ . One assumes succeeding builds will continue to work properly, but we should monitor the situation.
FYI @mtmorgan @jimhester @vobencha @priscian @hpages
Additional builds kicked off immediately afterwards seem to propagate the SPB without issue :
Started more investigation this morning, navigating to : http://staging.bioconductor.org:8000/job/556/ . Initially this gave a 404, indicating that 555 was the last build to execute. Next, I logged into staging and started a build using tracker ID 1343
and package https://tracker.bioconductor.org/file6757/cellTree_0.99.3.tar.gz
. Similar to the symptoms seen yesterday a job page is created at http://staging.bioconductor.org:8000/job/556/ , however output from the build nodes is never displayed.
Again, I check the server.log
file for both build nodes. The results seen on zin2 are similar to results as observed yesterday. However, results seen on moscato2 are now in-line with zin2 .
The last message to build a job is from 12/21:
pkgbuild@zin2:~/packagebuilder$ grep -F "server.py" server.log |grep -v "main()" | tail -n 11
INFO: 12/21/2015 04:55:11 PM server.py - Reply sent
DEBUG: 12/21/2015 04:56:20 PM server.py - on_before_message() {'content-length': '330', 'expires': '0', 'timestamp': '1450746043759', 'destination': '/topic/buildjobs', 'priority': '4', 'message-id': 'ID:broker-46292-1448469945488-2:564:-1:1:1', 'subscription': '93160f48732c485f8173febf386e494b'} {"force": true, "job_id": "globalSeq_20151221195622", "repository": "scratch", "bioc_version": "3.3", "svn_url": "https://tracker.bioconductor.org/file6770/globalSeq_0.99.4.tar.gz", "r_version": "3.3", "client_id": "single_package_builder_autobuild:1343:globalSeq_0.99.4.tar.gz", "time": "Mon Dec 21 2015 19:56:22 GMT-0800 (PST)"}.
INFO: 12/21/2015 04:56:20 PM server.py - on_message() Message received
INFO: 12/21/2015 04:56:20 PM server.py - on_message() Message acknowledged
INFO: 12/21/2015 04:56:20 PM server.py - on_message() job_dir: '/home/pkgbuild/packagebuilder/workers/jobs/globalSeq_20151221195622'.
INFO: 12/21/2015 04:56:20 PM server.py - on_message() Attempting to run commands from directory: '/home/pkgbuild/packagebuilder'
INFO: 12/21/2015 04:56:20 PM server.py - on_message() shell_cmd: '['python', '-m', 'workers.builder', u'/home/pkgbuild/packagebuilder/workers/jobs/globalSeq_20151221195622/manifest.json', u'3.3']'
INFO: 12/21/2015 04:56:20 PM server.py - on_message() jobfilename: '/home/pkgbuild/packagebuilder/workers/jobs/globalSeq_20151221195622/manifest.json'
INFO: 12/21/2015 04:56:20 PM server.py - on_message() builder_log: '<open file u'/home/pkgbuild/packagebuilder/workers/jobs/globalSeq_20151221195622/builder.log', mode 'w' at 0x7fda2464a540>'
DEBUG: 12/21/2015 04:56:20 PM server.py - on_send() SEND {'content-length': 216, 'destination': '/topic/builderevents', 'persistent': 'true'} {"body": "Got build request...", "first_message": true, "job_id": "globalSeq_20151221195622", "bioc_version": "3.3", "builder_id": "zin2", "client_id": "single_package_builder_autobuild:1343:globalSeq_0.99.4.tar.gz"}.
INFO: 12/21/2015 04:56:20 PM server.py - Reply sent
pkgbuild@zin2:~/packagebuilder$
Viewing the currently running Python processes shows a defunct process:
pkgbuild@zin2:~/packagebuilder$ ps ux |grep python |grep -v grep
pkgbuild 25101 0.0 0.0 137360 1912 ? Sl Dec21 0:00 python -m workers.server
pkgbuild 27739 0.0 0.0 0 0 ? Z Dec21 0:00 [python] <defunct>
The last message to build a job is from 12/21:
$ grep -F "server.py" server.log |grep -v "main()" | tail -n 11
INFO: 12/21/2015 04:55:11 PM server.py - Reply sent
DEBUG: 12/21/2015 04:56:20 PM server.py - on_before_message() {'content-length': '330', 'expires': '0', 'timestamp': '1450746043759', 'destination': '/topic/buildjobs', 'priority': '4', 'mes
sage-id': 'ID:broker-46292-1448469945488-2:564:-1:1:1', 'subscription': '440222e574334ac1a9256e48b2e2c80a'} {"force": true, "job_id": "globalSeq_20151221195622", "repository": "scratch", "bi
oc_version": "3.3", "svn_url": "https://tracker.bioconductor.org/file6770/globalSeq_0.99.4.tar.gz", "r_version": "3.3", "client_id": "single_package_builder_autobuild:1343:globalSeq_0.99.4.t
ar.gz", "time": "Mon Dec 21 2015 19:56:22 GMT-0800 (PST)"}.
INFO: 12/21/2015 04:56:20 PM server.py - on_message() Message received
INFO: 12/21/2015 04:56:20 PM server.py - on_message() Message acknowledged
INFO: 12/21/2015 04:56:20 PM server.py - on_message() job_dir: 'E:\packagebuilder\workers\jobs\globalSeq_20151221195622'.
INFO: 12/21/2015 04:56:20 PM server.py - on_message() Attempting to run commands from directory: 'E:\packagebuilder'
INFO: 12/21/2015 04:56:20 PM server.py - on_message() shell_cmd: '['python', '-m', 'workers.builder', u'E:\\packagebuilder\\workers\\jobs\\globalSeq_20151221195622\\manifest.json', u'3.3']'
INFO: 12/21/2015 04:56:20 PM server.py - on_message() jobfilename: 'E:\packagebuilder\workers\jobs\globalSeq_20151221195622\manifest.json'
INFO: 12/21/2015 04:56:20 PM server.py - on_message() builder_log: '<open file u'E:\\packagebuilder\\workers\\jobs\\globalSeq_20151221195622\\builder.log', mode 'w' at 0x018FEB20>'
DEBUG: 12/21/2015 04:56:20 PM server.py - on_send() SEND {'content-length': 220, 'destination': '/topic/builderevents', 'persistent': 'true'} {"body": "Got build request...", "first_message"
: true, "job_id": "globalSeq_20151221195622", "bioc_version": "3.3", "builder_id": "moscato2", "client_id": "single_package_builder_autobuild:1343:globalSeq_0.99.4.tar.gz"}.
INFO: 12/21/2015 04:56:20 PM server.py - Reply sent
The build server is still running, and shows 3 threads:
After the recent deployments, it seems that the SPB is continuing to drop communication capability at some point.
It would appear that the builds actually complete (although not successfully) :
pkgbuild@zin2:~/packagebuilder$ grep -n build_package workers/jobs/JunctionSeq_20151222131237/builder.log workers/jobs/JunctionSeq_20151222133405/builder.log workers/jobs/IHW_20151222112042/builder.log
workers/jobs/JunctionSeq_20151222131237/builder.log:48:INFO: 12/22/2015 10:13:35 AM builder.py - build_package() finished with result 1
workers/jobs/JunctionSeq_20151222133405/builder.log:48:INFO: 12/22/2015 10:35:06 AM builder.py - build_package() finished with result 1
workers/jobs/IHW_20151222112042/builder.log:50:INFO: 12/22/2015 11:27:54 AM builder.py - build_package() finished with result 0
The same thing appears to be happening:
user@moscato2 /cygdrive/e/packagebuilder
$ grep -n build_package workers/jobs/JunctionSeq_20151222131237/builder.log workers/jobs/JunctionSeq_20151222133405/builder.log workers/jobs/IHW_20151222112042/builder.log
workers/jobs/JunctionSeq_20151222131237/builder.log:49:INFO: 12/22/2015 10:15:06 AM builder.py - build_package() finished with result 1
workers/jobs/JunctionSeq_20151222133405/builder.log:49:INFO: 12/22/2015 10:36:42 AM builder.py - build_package() finished with result 1
workers/jobs/IHW_20151222112042/builder.log:50:INFO: 12/22/2015 11:25:26 AM builder.py - build_package() finished with result 0
In both cases, server.py
is still running. There are no defunct processes, and new builds continue to be received and launched :
pkgbuild@zin2:~/packagebuilder$ grep JunctionSeq_2015122212050 server.log -B 3 -A 3
DEBUG: 12/22/2015 11:53:45 AM server.py - main() Waiting to do work.
DEBUG: 12/22/2015 11:58:45 AM server.py - main() Waiting to do work.
DEBUG: 12/22/2015 12:03:45 PM server.py - main() Waiting to do work.
DEBUG: 12/22/2015 12:05:02 PM server.py - on_before_message() {'expires': '0', 'timestamp': '1450814968390', 'destination': '/topic/buildjobs', 'persistent': 'true', 'priority': '4', 'x-client': '140.107.89.113', 'message-id': 'ID:broker-46292-1448469945488-2:614:-1:1:1', 'subscription': '892b7145a09c4b36b3b802addbc7b22d'} {"force": true, "job_id": "JunctionSeq_20151222120502", "repository": "scratch", "bioc_version": "3.3", "svn_url": "https://tracker.bioconductor.org/file6784/JunctionSeq_0.99.3.tar.gz", "r_version": "3.3", "client_id": "single_package_builder_autobuild:1341:JunctionSeq_0.99.3.tar.gz", "time": "Tue Dec 22 2015 12:05:02 GMT-0700 (PST)"}.
INFO: 12/22/2015 12:05:02 PM server.py - on_message() Message received
INFO: 12/22/2015 12:05:02 PM server.py - on_message() Message acknowledged
INFO: 12/22/2015 12:05:02 PM server.py - on_message() job_dir: '/home/pkgbuild/packagebuilder/workers/jobs/JunctionSeq_20151222120502'.
INFO: 12/22/2015 12:05:02 PM server.py - on_message() Attempting to run commands from directory: '/home/pkgbuild/packagebuilder'
INFO: 12/22/2015 12:05:02 PM server.py - on_message() shell_cmd: '['python', '-m', 'workers.builder', u'/home/pkgbuild/packagebuilder/workers/jobs/JunctionSeq_20151222120502/manifest.json', u'3.3']'
INFO: 12/22/2015 12:05:02 PM server.py - on_message() jobfilename: '/home/pkgbuild/packagebuilder/workers/jobs/JunctionSeq_20151222120502/manifest.json'
INFO: 12/22/2015 12:05:02 PM server.py - on_message() builder_log: '<open file u'/home/pkgbuild/packagebuilder/workers/jobs/JunctionSeq_20151222120502/builder.log', mode 'w' at 0x7f4663cce540>'
DEBUG: 12/22/2015 12:05:02 PM server.py - on_send() SEND {'content-length': 220, 'destination': '/topic/builderevents', 'persistent': 'true'} {"body": "Got build request...", "first_message": true, "job_id": "JunctionSeq_20151222120502", "bioc_version": "3.3", "builder_id": "zin2", "client_id": "single_package_builder_autobuild:1341:JunctionSeq_0.99.3.tar.gz"}.
DEBUG: 12/22/2015 12:05:02 PM transport.py - Sending frame ['SEND', '\n', 'content-length:220\n', 'destination:/topic/builderevents\n', 'persistent:true\n', '\n', '{"body": "Got build request...", "first_message": true, "job_id": "JunctionSeq_20151222120502", "bioc_version": "3.3", "builder_id": "zin2", "client_id": "single_package_builder_autobuild:1341:JunctionSeq_0.99.3.tar.gz"}', '\x00']
INFO: 12/22/2015 12:05:02 PM server.py - on_message() Reply sent
DEBUG: 12/22/2015 12:08:45 PM server.py - main() Waiting to do work.
INFO: 12/22/2015 12:12:53 PM server.py - on_message() blderProcess finished with status 0.
INFO: 12/22/2015 12:12:53 PM server.py - on_message() builder_log closed.
DEBUG: 12/22/2015 12:12:53 PM transport.py - Received frame: 'MESSAGE', headers={'expires': '0', 'timestamp': '1450814968390', 'destination': '/topic/buildjobs', 'persistent': 'true', 'priority': '4', 'x-client': '140.107.89.113', 'message-id': 'ID:broker-46292-1448469945488-2:614:-1:1:1', 'subscription': '892b7145a09c4b36b3b802addbc7b22d'}, body='{"force": true, "job_id": "JunctionSeq_20151222120502", "repository": "scratch", "bioc_version": "3.3", "svn_url": "https://tracker.bioconductor.org/file6784/JunctionSeq_0.99.3.tar.gz", "r_version": "3.3", "client_id": "single_package_builder_autobuild:1341:JunctionSeq_0.99.3.tar.gz", "time": "Tue Dec 22 2015 12:05:02 GMT-0700 (PST)"}'
DEBUG: 12/22/2015 12:13:45 PM server.py - main() Waiting to do work.
DEBUG: 12/22/2015 12:18:45 PM server.py - main() Waiting to do work.
Perhaps the issue is just in sending information back, I'm going to start looking at that code now.
Currently, both zin2 and moscato2 send a message like :
on_send SEND {'content-length': 316, 'destination': '/topic/builderevents', 'persistent': 'true'} {"body": "builder.py exited", "status": "autoexit", "job_id": "spbtest_20151223120932", "builder_id": "zin2", "svn_url": "https://tracker.bioconductor.org/file6786/spbtest_0.99.3.tar.gz", "client_id": "single_package_builder_autobuild:1356:spbtest_0.99.3.tar.gz", "time": "2015-12-23 09:10:23.293080", "retcode": -1}
The message sent from zin2 is received while the message sent from moscato2 is not.
Hmm, my commit did not close the issue, but that's ok, because I wanted to expand on what I did a little bit (for @mtmorgan and @b-long ).
Not sure how it evolved. but builder.py uses an exit handler (https://docs.python.org/2/library/atexit.html) which is sort of like a finalizer or on.exit()
in R, which allows you to register a 'cleanup' function that will run whenever the interpreter exits.
In builder.py, the sending of the final 'everything is done' message (the one that track_build_completion.py is waiting for before it decides that a build is done on a node) is done in this cleanup function (called onexit()
).
I added a sleep at the end of the function. Previously, the call to send_message()
was the last line in the function, so it was called right before exiting the interpreter. My theory is that this happened too fast somehow on windows and send_message() didn't get to complete.
Does not seem to be a issue on linux. Hard to tell if this is a real issue that should be filed somewhere. Not sure if it is a problem in python, windows, or stomp. Anyway, now those final messages are coming through on windows and builds are (so far) completing.
I've just used rerun_build.py
to execute the following tests, and all have passed:
After that, @priscian used rerun_build.py
to execute a build of https://tracker.bioconductor.org/issue1351 . Unfortunately, this build did not succeed . As you can see on the spb_history page (http://staging.bioconductor.org:8000/job/615/), output is missing for large portion of the zin2 build.
I've found an error in the builder.log file on zin2, which I believe is related :
on_send SEND {'content-length': 216, 'destination': '/topic/builderevents', 'persistent': 'true'} {"status": "starting_check", "body": "", "job_id": "DNAshapeR_20151230095500", "client_id": "s
ingle_package_builder_autobuild:1351:DNAshapeR_0.99.0.tar.gz", "time": "2015-12-30 06:55:50.903123", "builder_id": "zin2"}
on_send SEND {'content-length': 399, 'destination': '/topic/builderevenINFO: 12/30/2015 06:55:50 AM transport.py - Sending frame cmd='SEND' headers={'content-length': 399, 'destination': '/top
ic/builderevents', 'persistent': 'true'}
INFO: 12/30/2015 06:55:50 AM builder.py - send_message(): Message sent.
INFO: 12/30/2015 06:55:50 AM builder.py - Attempting to tail file Rcheck.out
INFO: 12/30/2015 06:55:51 AM builder.py - Attempting to send message: '{'status': 'checking', 'body': '* using log directory \xe2\x80\x98/home/pkgbuild/packagebuilder/workers/jobs/DNAshapeR_20
151230095500/DNAshapeR.Rcheck\xe2\x80\x99\n* using R Under development (unstable) (2015-12-14 r69775)\n* using platform: x86_64-pc-linux-gnu (64-bit)\n* using session charset: UTF-8\n* using o
ption \xe2\x80\x98--no-vignettes\xe2\x80\x99\n* checking for file \xe2\x80\x98DNAshapeR/DESCRIPTION\xe2\x80\x99 ... OK\n* checking extension type ... Package\n* this is package \xe2\x80\x98DNA
shapeR\xe2\x80\x99 version \xe2\x80\x980.99.0\xe2\x80\x99\n* checking package namespace information ... OK\n* checking package dependencies ...', 'sequence': 1}'
INFO: 12/30/2015 06:55:51 AM builder.py - msg is dict
INFO: 12/30/2015 06:55:51 AM builder.py - merged_dict: '{'status': 'checking', 'body': '* using log directory \xe2\x80\x98/home/pkgbuild/packagebuilder/workers/jobs/DNAshapeR_20151230095500/DN
AshapeR.Rcheck\xe2\x80\x99\n* using R Under development (unstable) (2015-12-14 r69775)\n* using platform: x86_64-pc-linux-gnu (64-bit)\n* using session charset: UTF-8\n* using option \xe2\x80\
x98--no-vignettes\xe2\x80\x99\n* checking for file \xe2\x80\x98DNAshapeR/DESCRIPTION\xe2\x80\x99 ... OK\n* checking extension type ... Package\n* this is package \xe2\x80\x98DNAshapeR\xe2\x80\
x99 version \xe2\x80\x980.99.0\xe2\x80\x99\n* checking package namespace information ... OK\n* checking package dependencies ...', 'job_id': u'DNAshapeR_20151230095500', 'client_id': u'single_
package_builder_autobuild:1351:DNAshapeR_0.99.0.tar.gz', 'time': '2015-12-30 06:55:51.305495', 'sequence': 1, 'builder_id': 'zin2'}'
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/home/pkgbuild/packagebuilder/workers/builder.py", line 110, in run
"body": bytes
File "/home/pkgbuild/packagebuilder/workers/builder.py", line 142, in send_message
logging.info("Final modified body: '{body}'".format(body=body))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 22-24: ordinal not in range(128)
INFO: 12/30/2015 06:58:22 AM builder.py - Attempting to send message: '{'status': 'check_complete', 'body': 'Check completed with status 1', 'elapsed_time': '0:02:31.834042', 'result_code': 1,
'warnings': True}'
INFO: 12/30/2015 06:58:22 AM builder.py - msg is dict
INFO: 12/30/2015 06:58:22 AM builder.py - merged_dict: '{'status': 'check_complete', 'body': 'Check completed with status 1', 'job_id': u'DNAshapeR_20151230095500', 'warnings': True, 'builder_
id': 'zin2', 'elapsed_time': '0:02:31.834042', 'client_id': u'single_package_builder_autobuild:1351:DNAshapeR_0.99.0.tar.gz', 'time': '2015-12-30 06:58:22.738753', 'result_code': 1}'
INFO: 12/30/2015 06:58:22 AM builder.py - Final modified body: 'Check completed with status 1'
INFO: 12/30/2015 06:58:22 AM builder.py - Ascii encoded body: 'Check completed with status 1'
I'm investigating this now.
Brian think you just need to make your format strings unicode.
@jimhester I think that's right, going to make that change and deploy for test.
Simple test case
"{}".format(u"\u0449")
# UnicodeEncodeError: 'ascii' codec can't encode character u'\u0449' in position 0: ordinal not in range(128)
u"{}".format(u"\u0449")
# u'\u0449'
Ok, that change fixed the Traceback mentioned earlier.
I've re-run 1351 and 1328 for Jim J., and both have successfully finished all stages of the SPB's work. I'm going to merge #33 and close this issue.
FYI, regarding the change I made earlier on this issue (adding a sleep to the atexit finalizer), I have a minimal reproducible example that does seem to indicate that this is a problem. It seems like it just started happening after converting builder.py to use the stomp (aka stomp.py) package (it used to use stompy). So I have reported this to the author of stomp:
https://github.com/jasonrbriggs/stomp.py/issues/70
The problem doesn't seem to be with the atexit stuff either. It happens anytime you exit the interpreter right after calling send()
, whether it's a normal exit (last statement in script) or facilitated by atexit.
@b-long @jimhester @mtmorgan @priscian
So, the bug above is resolved now. Turns out you need to disconnect from the stomp client if you are going to exit the interpreter right after sending a message:
stompClient.disconnect(receipt=None)
This obviates the need for sleeping after sending the mesage.
I've done this in 214b5371fad87570ed273068897c9f8fa4afc2ab .
The build nodes are not processing builds properly. For example, consider a recent build for @priscian . The package that needs to be built is "psygenet2r". I start the build using
rerun_build
, a part ofspb_history
(which lives here now).On the controller node (staging.bioconductor.org)
The build began normally, printing some information to the console :
A page was created to track the build. However, output from the build nodes is never displayed: http://staging.bioconductor.org:8000/job/552/ .
On zin2 (a build node)
Next, I began investigating the build machine. The service running on all build nodes is
server.py
, which prints it's output toserver.log
. This service regularly receives the command to start build. In the log file we see that the service is "Waiting to do work.":Interestingly, If I omit the
main()
loop in searching the log file, we see that the last time instruction was received to perform a build is days ago:I'm unsure what triggered this build, as I didn't kick it off. However, we see that it's page (in the Django application) is available and that a build completed for all nodes: http://staging.bioconductor.org:8000/job/545/ .
Looking at the running processes, we see that the server is still running. However, it's interesting that there's a defunct sub-process :
On moscato2 (another build node)
Dissimilarly, on moscato2, we see that the message to start the build is received :
There are no defunct processes (I'm not sure if this construct exists in Windows). Also,
server.py
is still running. The build seems to succeed, and we can see that by navigating to the directoryE:\packagebuilder\workers\jobs\psygenet2r_20151221153124
.A log file named
builder.log
contains the output below :Also, the file
E:\packagebuilder\workers\jobs\psygenet2r_2015122115312\psygenet2r_0.9.0.tar.gz
is created.I'm beginning investigation now. It may also be worth noting that the only sub-process started directly by
server.py
isbuilder.py
. However, many sub-processes are started bybuilder.py
.