google-code-export / yabi

Automatically exported from code.google.com/p/yabi

quickstart twistd using all cpu #185

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Using next_release

What steps will reproduce the problem?
1. fab runtests

What is the expected output? What do you see instead?
The tests should pass.

Instead, at some point twistd starts using 100% of one core and the tests either stop progressing or progress very slowly.

test_task_no_ready_tasks (yabiadmin.yabiengine.tests.TaskViewNoTasksTest) ... ok
test_task_no_tasktag (yabiadmin.yabiengine.tests.TaskViewNoTasksTest) ... ok

----------------------------------------------------------------------
Ran 11 tests in 1.052s

OK
[localhost] Executing task 'tests'
[localhost] local: nosetests -v

Done.
test_nothing (yabitests.a.other_test.ATestCase) ... ok
test_dd (yabitests.file_transfer_tests.FileUploadAndDownloadTest) ... ok
test_dd (yabitests.file_transfer_tests.FileUploadAndDownloadTestJustLCopy) ... ok
test_dd (yabitests.file_transfer_tests.FileUploadAndDownloadTestJustLink) ... ok
test_dd (yabitests.file_transfer_tests.FileUploadAndDownloadTestNoLinkAndLCopy) ... ok
test_tar_on_a_few_files (yabitests.file_transfer_tests.FileUploadSmallFilesTest) ... ok
test_cksum_of_large_file (yabitests.file_transfer_tests.FileUploadTest) ... ok
test_success (yabitests.no_setup_tests.FirstTest) ... ok
test_run_yabish_no_args (yabitests.no_setup_tests.NotLoggedInTest) ... ok
test_successful_login (yabitests.no_setup_tests.NotLoggedInTest) ... ok
test_unsuccessful_login (yabitests.no_setup_tests.NotLoggedInTest) ... ok
test_hostname_not_setup (yabitests.no_setup_tests.ToolNotSetupTest) ... ok

Noticed twistd running at full CPU after about 1 hr.

Strace of twistd process:

clock_gettime(CLOCK_MONOTONIC, {446597, 72252978}) = 0
epoll_wait(9, {}, 32, 0)                = 0
clock_gettime(CLOCK_MONOTONIC, {446597, 72832124}) = 0
epoll_wait(9, {}, 32, 0)                = 0
clock_gettime(CLOCK_MONOTONIC, {446597, 73323737}) = 0
epoll_wait(9, {}, 32, 0)                = 0
clock_gettime(CLOCK_MONOTONIC, {446597, 73819018}) = 0
epoll_wait(9, {}, 32, 0)                = 0
clock_gettime(CLOCK_MONOTONIC, {446597, 74400368}) = 0
epoll_wait(9, {}, 32, 0)                = 0
clock_gettime(CLOCK_MONOTONIC, {446597, 74936547}) = 0
epoll_wait(9, {}, 32, 0)                = 0
clock_gettime(CLOCK_MONOTONIC, {446597, 75512960}) = 0
epoll_wait(9, {}, 32, 0)                = 0
clock_gettime(CLOCK_MONOTONIC, {446597, 76073071}) = 0
epoll_wait(9, {}, 32, 0)                = 0
clock_gettime(CLOCK_MONOTONIC, {446597, 76631367}) = 0
epoll_wait(9, {}, 32, 0)                = 0
clock_gettime(CLOCK_MONOTONIC, {446597, 77170763}) = 0
epoll_wait(9, {}, 32, 0)                = 0
clock_gettime(CLOCK_MONOTONIC, {446597, 77700426}) = 0
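
For readers unfamiliar with the trace: every epoll_wait call above is made with a timeout of 0 and returns 0 events, so the process polls without ever sleeping. Below is a minimal Python sketch of that effect. It uses the standard selectors module rather than Twisted itself, and count_polls and the durations are made-up names and values for illustration only.

```python
import selectors
import socket
import time


def count_polls(timeout, duration=0.05):
    """Count how often an idle selector can be polled in `duration` seconds."""
    a, b = socket.socketpair()  # idle pair: nothing is ever written to it
    sel = selectors.DefaultSelector()
    sel.register(a, selectors.EVENT_READ)
    polls = 0
    deadline = time.monotonic() + duration
    try:
        while time.monotonic() < deadline:
            sel.select(timeout=timeout)  # returns [] because nothing is ready
            polls += 1
    finally:
        sel.unregister(a)
        sel.close()
        a.close()
        b.close()
    return polls


# timeout=0 mirrors epoll_wait(..., 0): the call returns at once, so the loop spins.
busy_polls = count_polls(timeout=0)
# A non-zero timeout lets the process sleep between polls (up to 10 ms here).
sleepy_polls = count_polls(timeout=0.01)
print(busy_polls, sleepy_polls)
```

The zero-timeout loop completes far more polls in the same wall-clock time, which is what shows up as one core pinned at 100%.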

Original issue reported on code.google.com by aahun...@gmail.com on 10 Apr 2012 at 6:26

GoogleCodeExporter commented 9 years ago
Test started at 1:16:44 PM

Last entry in be log at 1:42:46

2012-04-10 13:42:15+0800 [-] 127.0.0.1 - - [10/Apr/2012:13:42:15 +0800] "POST /fs/get?uri=localfs%3A//demo%40localhost/var/lib/jenkins/cksum%20%282012-04-10%2013%3A41%3A01%29/2%20-%20cksum/STDOUT.txt HTTP/1.1" 200 103 "-" "-"
2012-04-10 13:42:16+0800 [-] 127.0.0.1 - - [10/Apr/2012:13:42:16 +0800] "POST /fs/get?uri=localfs%3A//demo%40localhost/var/lib/jenkins/cksum%20%282012-04-10%2013%3A41%3A01%29/2%20-%20cksum/STDERR.txt HTTP/1.1" 200 0 "-" "-"
2012-04-10 13:42:45+0800 [-] starting task: 13
2012-04-10 13:42:45+0800 [-] 127.0.0.1 - - [10/Apr/2012:13:42:45 +0800] "GET /fs/mkdir?priority=100&yabiusername=demo&uri=localfs%3A%2F%2Fdemo%40localhost%2Fvar%2Flib%2Fjenkins%2F35ce8780-8d01-44f0-8f50-e4a8bea52fc4%2Foutput%2F HTTP/1.0" 200 3 "-" "YabiStackless/0.1"
2012-04-10 13:42:46+0800 [-] Exploding Connector: command hostname, remoteurl /engine/remote_info/13, delay_set [(0, 'Unsubmitted'), (0, 'Pending'), (0, 'Running'), (0, 'Error')]
2012-04-10 13:42:46+0800 [-] Exploding Connector: remoteurl /engine/remote_info/13, message Unsubmitted
2012-04-10 13:42:46+0800 [-] Exploding Connector: remoteurl /engine/remote_info/13, message Pending
2012-04-10 13:42:46+0800 [-] Exploding Connector: remoteurl /engine/remote_info/13, message Running
2012-04-10 13:42:46+0800 [-] Exploding Connector: remoteurl /engine/remote_info/13, message Error
2012-04-10 13:42:46+0800 [-] 127.0.0.1 - - [10/Apr/2012:13:42:46 +0800] "POST /exec/run HTTP/1.0" 200 38 "-" "YabiStackless/0.1"

Original comment by aahun...@gmail.com on 10 Apr 2012 at 6:30

GoogleCodeExporter commented 9 years ago
Test still 'running' at 2:31

Original comment by aahun...@gmail.com on 10 Apr 2012 at 6:31

GoogleCodeExporter commented 9 years ago
Looks like it has got stuck running hostname on exploding backend?

[ahunter@nowhere yabi_quickstart_tests]$ ps -Af | grep jenkins | grep yabi
jenkins  15547 15448  0 13:16 ?        00:00:00 /var/lib/jenkins/yabi_quickstart_tests/virt_yabi_quickstart_tests/bin/python /var/lib/jenkins/yabi_quickstart_tests/virt_yabi_quickstart_tests/bin/fab runtests
jenkins  15624     1  0 13:16 ?        00:00:01 /var/lib/jenkins/yabi_quickstart_tests/yabife/yabife/virt_yabife/bin/python /var/lib/jenkins/yabi_quickstart_tests/yabife/yabife/virt_yabife/bin/gunicorn_django -w 5 -b 127.0.0.1:8000 -t 300
jenkins  15634 15624  0 13:16 ?        00:00:05 /var/lib/jenkins/yabi_quickstart_tests/yabife/yabife/virt_yabife/bin/python /var/lib/jenkins/yabi_quickstart_tests/yabife/yabife/virt_yabife/bin/gunicorn_django -w 5 -b 127.0.0.1:8000 -t 300
jenkins  15638 15624  0 13:16 ?        00:00:20 /var/lib/jenkins/yabi_quickstart_tests/yabife/yabife/virt_yabife/bin/python /var/lib/jenkins/yabi_quickstart_tests/yabife/yabife/virt_yabife/bin/gunicorn_django -w 5 -b 127.0.0.1:8000 -t 300
jenkins  15640     1  0 13:16 ?        00:00:00 /var/lib/jenkins/yabi_quickstart_tests/yabiadmin/yabiadmin/virt_yabiadmin/bin/python /var/lib/jenkins/yabi_quickstart_tests/yabiadmin/yabiadmin/virt_yabiadmin/bin/gunicorn_django -w 5 -b 127.0.0.1:8001 -t 300
jenkins  15641 15624  2 13:16 ?        00:01:50 /var/lib/jenkins/yabi_quickstart_tests/yabife/yabife/virt_yabife/bin/python /var/lib/jenkins/yabi_quickstart_tests/yabife/yabife/virt_yabife/bin/gunicorn_django -w 5 -b 127.0.0.1:8000 -t 300
jenkins  15642 15624  1 13:16 ?        00:00:49 /var/lib/jenkins/yabi_quickstart_tests/yabife/yabife/virt_yabife/bin/python /var/lib/jenkins/yabi_quickstart_tests/yabife/yabife/virt_yabife/bin/gunicorn_django -w 5 -b 127.0.0.1:8000 -t 300
jenkins  15643 15624  1 13:16 ?        00:00:48 /var/lib/jenkins/yabi_quickstart_tests/yabife/yabife/virt_yabife/bin/python /var/lib/jenkins/yabi_quickstart_tests/yabife/yabife/virt_yabife/bin/gunicorn_django -w 5 -b 127.0.0.1:8000 -t 300
jenkins  15652 15640  1 13:16 ?        00:00:52 /var/lib/jenkins/yabi_quickstart_tests/yabiadmin/yabiadmin/virt_yabiadmin/bin/python /var/lib/jenkins/yabi_quickstart_tests/yabiadmin/yabiadmin/virt_yabiadmin/bin/gunicorn_django -w 5 -b 127.0.0.1:8001 -t 300
jenkins  15653 15640  1 13:16 ?        00:00:47 /var/lib/jenkins/yabi_quickstart_tests/yabiadmin/yabiadmin/virt_yabiadmin/bin/python /var/lib/jenkins/yabi_quickstart_tests/yabiadmin/yabiadmin/virt_yabiadmin/bin/gunicorn_django -w 5 -b 127.0.0.1:8001 -t 300
jenkins  15655 15640  0 13:16 ?        00:00:44 /var/lib/jenkins/yabi_quickstart_tests/yabiadmin/yabiadmin/virt_yabiadmin/bin/python /var/lib/jenkins/yabi_quickstart_tests/yabiadmin/yabiadmin/virt_yabiadmin/bin/gunicorn_django -w 5 -b 127.0.0.1:8001 -t 300
jenkins  15656 15640  1 13:16 ?        00:00:45 /var/lib/jenkins/yabi_quickstart_tests/yabiadmin/yabiadmin/virt_yabiadmin/bin/python /var/lib/jenkins/yabi_quickstart_tests/yabiadmin/yabiadmin/virt_yabiadmin/bin/gunicorn_django -w 5 -b 127.0.0.1:8001 -t 300
jenkins  15658 15640  1 13:16 ?        00:00:47 /var/lib/jenkins/yabi_quickstart_tests/yabiadmin/yabiadmin/virt_yabiadmin/bin/python /var/lib/jenkins/yabi_quickstart_tests/yabiadmin/yabiadmin/virt_yabiadmin/bin/gunicorn_django -w 5 -b 127.0.0.1:8001 -t 300
jenkins  15730     1 74 13:16 ?        00:56:31 /var/lib/jenkins/yabi_quickstart_tests/yabibe/yabibe/virt_yabibe/bin/python /var/lib/jenkins/yabi_quickstart_tests/yabibe/yabibe/virt_yabibe/bin/twistd -noy server.py --logfile=-
jenkins  15759 15547  0 13:16 ?        00:00:00 /bin/sh -c cd yabitests && . virt_yabitests/bin/activate && nosetests -v
jenkins  15764 15759  0 13:16 ?        00:00:32 /var/lib/jenkins/yabi_quickstart_tests/yabitests/virt_yabitests/bin/python /var/lib/jenkins/yabi_quickstart_tests/yabitests/virt_yabitests/bin/nosetests -v
jenkins  16648 15764  0 13:42 ?        00:00:00 /bin/sh -c . ../yabish/virt_yabish/bin/activate && ../yabish/yabish --yabi-url="http://localhost:8000/" hostname
jenkins  16653 16648  0 13:42 ?        00:00:04 python ../yabish/yabish --yabi-url=http://localhost:8000/ hostname

Original comment by aahun...@gmail.com on 10 Apr 2012 at 6:33

GoogleCodeExporter commented 9 years ago
fe, admin and celery running ok

Original comment by aahun...@gmail.com on 10 Apr 2012 at 6:35

GoogleCodeExporter commented 9 years ago
Logged into admin using lynx.

It is exploding backend. Tool is hostname. Admin says it is still running.

Original comment by aahun...@gmail.com on 10 Apr 2012 at 6:38

GoogleCodeExporter commented 9 years ago
Hopefully fixed in d43b2f6b6cd0 on next_release branch.

The problem was that Task execution keeps looping for as long as the status is one of:

Not set, pending, unsubmitted, running

Task status updates from the running BE are applied in tasklets, so one tasklet could set the status to done and then another could set it back to unsubmitted. This would prevent the loop from ever exiting.
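
One common guard against this kind of regression is to let a status only move forwards, so that a late update from another tasklet cannot undo a terminal state. The sketch below illustrates that idea only. It is not taken from d43b2f6b6cd0, and STATUS_RANK, Task, set_status and is_finished are hypothetical names, not Yabi code.

```python
# Hypothetical ordering of statuses; a higher rank means later in the lifecycle.
STATUS_RANK = {
    None: 0,           # "not set"
    "unsubmitted": 1,
    "pending": 2,
    "running": 3,
    "done": 4,
    "error": 4,
}


class Task:
    """Toy task whose status can only move forwards."""

    def __init__(self):
        self.status = None

    def set_status(self, new_status):
        """Apply new_status unless it would move the task backwards."""
        if STATUS_RANK[new_status] < STATUS_RANK[self.status]:
            return False  # stale update from a late tasklet: ignore it
        self.status = new_status
        return True

    def is_finished(self):
        """True once the task is in a terminal state, so a polling loop can exit."""
        return self.status in ("done", "error")


task = Task()
# The last update simulates a second tasklet writing an out-of-date status.
for update in ["unsubmitted", "pending", "running", "done", "unsubmitted"]:
    task.set_status(update)

print(task.status, task.is_finished())
```

With the guard in place the stale "unsubmitted" write is dropped, so a loop that waits on is_finished() can terminate.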

Original comment by szab...@gmail.com on 13 Apr 2012 at 5:42

GoogleCodeExporter commented 9 years ago
Still happening.

From Hudson tests on faramir, non Epic. Notes from Andrew:

1 x Yabi_Selected_Test failure

Cause: bug which causes twistd to use 100% cpu and do nothing
21352 yabi      25   0  243m  56m 3236 R 99.9  1.4   3796:24 twistd 

Original comment by aahun...@gmail.com on 14 Jun 2012 at 9:12

GoogleCodeExporter commented 9 years ago

Original comment by aahun...@gmail.com on 7 Jul 2012 at 3:56

GoogleCodeExporter commented 9 years ago
fixed in cwellington-3 rev 01fe299d73f2

Original comment by retrogra...@gmail.com on 3 Aug 2012 at 7:48