coryb / gearbox

Framework for asynchronous job dispatching
BSD 3-Clause "New" or "Revised" License

Not enumerating global handlers causes write requests to hang #5

Open jaybuff opened 11 years ago

jaybuff commented 11 years ago

If you enable unknown jobs with this config setting: "allow_unknown_jobs": 1,

and do not specify "do_create_global_status_v1" as a handler, gearbox will not be able to create status objects and write requests will hang.
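The condition described here can be written down as a small check. The following is only an illustrative Python sketch, not gearbox code: `status_handler_missing` is a hypothetical helper, and it assumes the caller has already collected the names of all configured handlers into a list.

```python
from typing import Collection, Mapping

# Handler name taken from the report above.
STATUS_HANDLER = "do_create_global_status_v1"


def status_handler_missing(config: Mapping[str, object],
                           handler_names: Collection[str]) -> bool:
    """Return True when the config matches the reported hang:
    unknown jobs are allowed, but the status-creation handler is not listed."""
    allow_unknown = bool(config.get("allow_unknown_jobs", 0))
    return allow_unknown and STATUS_HANDLER not in handler_names
```

Calling `status_handler_missing({"allow_unknown_jobs": 1}, [])` returns `True`, which is the configuration that triggers the hang described in this issue.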

Perhaps the known_job_name method in JobManager should require that a job with the given name has been registered, rather than just returning true whenever allow_unknown_jobs is true.

coryb commented 11 years ago

In general I don't want to couple registration with known-jobs. If a worker is restarting or is about to be started, mod_gearbox would reject its jobs (since the worker has not registered yet) even though we would be able to handle them within seconds.

For the specific case of allow_unknown_jobs, requiring a job to be registered makes sense; otherwise we should use the configured handlers.

At the moment I don't think we have a way to know if a job is registered though. mod_gearbox and peer workers do not know which workers are registered (workers only know about jobs they themselves have registered). We would have to ask gearman for a list of all job names that have at least one worker.
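One way to get such a list is to parse the text that `gearadmin --status` prints. The Python sketch below is an illustration only: `functions_with_workers` is a hypothetical helper, and it assumes each status line has four whitespace-separated columns (function name, queued jobs, running jobs, available workers) and that the listing ends with a line containing a single `.`.

```python
from typing import Set


def functions_with_workers(status_text: str) -> Set[str]:
    """Return the job names that report at least one available worker."""
    names: Set[str] = set()
    for raw_line in status_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and the "." terminator line.
        if not line or line == ".":
            continue
        # Split from the right so the three counters are always the last fields.
        parts = line.rsplit(None, 3)
        if len(parts) != 4 or not parts[3].isdigit():
            continue  # not a status line in the assumed layout
        name, _queued, _running, workers = parts
        if int(workers) >= 1:
            names.add(name)
    return names
```

The text could come from running `gearadmin --status` in a subprocess. How often to refresh it is a separate question, since the answer can go stale as workers start and stop.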

-Cory

jaybuff commented 11 years ago

We can use gearman_worker_function_exist to see if a worker is registered in gearman.

I agree we don't want mod_gearbox to reject jobs if the worker is temporarily offline. It would be handy to put warnings in the log when jobs are inserted that don't have a worker. During development or testing I often forget to start a worker and then have to scratch my head for a while when I insert a job and everything hangs.

We could change known_job_name to do something like this:

function known_job_name($name)
    if $name in handlers list
        if not gearman_worker_function_exist( $name )
            WARN "$name is listed in handlers.conf, but there are no workers to process it"
        return true

    if allow_unknown_jobs
        if gearman_worker_function_exist( $name )
            return true
        else
            WARN "allow_unknown_jobs is true, but $name isn't registered."
            return false

    # neither configured nor allowed as an unknown job
    return false

known_job_name is probably a bad name for this function. Maybe "can_insert_job_with_name" or something.
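For anyone who wants to experiment with the idea, here is a runnable Python rendering of the pseudocode above. It is a sketch under assumptions, not the gearbox implementation: the worker lookup is passed in as a `has_worker` callable (a stand-in for whatever ends up asking gearman), and the parameter names are made up for illustration.

```python
import logging
from typing import Callable, Collection

log = logging.getLogger("gearbox.sketch")  # hypothetical logger name


def can_insert_job_with_name(name: str,
                             configured_handlers: Collection[str],
                             allow_unknown_jobs: bool,
                             has_worker: Callable[[str], bool]) -> bool:
    """Decide whether a job may be inserted, warning when no worker exists."""
    if name in configured_handlers:
        # Configured jobs are always accepted; a missing worker is only logged,
        # because the worker may be restarting or about to start.
        if not has_worker(name):
            log.warning("%s is a configured handler, but there are no "
                        "workers to process it", name)
        return True

    if allow_unknown_jobs:
        if has_worker(name):
            return True
        log.warning("allow_unknown_jobs is true, but %s isn't registered", name)
        return False

    # Neither configured nor allowed as an unknown job.
    return False
```

Passing the lookup as a callable keeps the decision logic testable without a running gearman server, for example with `has_worker=lambda name: False`.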

The gearman-specific calls should be abstracted away in JobManager to allow for alternative backends besides gearman, like Amazon's SQS or AMQP.
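A rough Python sketch of what that abstraction could look like follows. Every name in it (`QueueBackend`, `InMemoryBackend`, `NoRegistryBackend`, `JobManagerSketch`) is hypothetical and chosen for illustration; none of it is existing gearbox API.

```python
from typing import Iterable, Protocol


class QueueBackend(Protocol):
    """What the job manager needs from any queueing backend."""

    def has_worker(self, job_name: str) -> bool:
        ...


class InMemoryBackend:
    """Stand-in for a backend that tracks worker registration, such as gearman."""

    def __init__(self, registered: Iterable[str]) -> None:
        self._registered = set(registered)

    def has_worker(self, job_name: str) -> bool:
        return job_name in self._registered


class NoRegistryBackend:
    """Stand-in for a backend with no notion of registered workers.

    It cannot answer the question, so it optimistically reports True."""

    def has_worker(self, job_name: str) -> bool:
        return True


class JobManagerSketch:
    """Depends only on the QueueBackend interface, not on gearman calls."""

    def __init__(self, backend: QueueBackend) -> None:
        self._backend = backend

    def job_has_worker(self, job_name: str) -> bool:
        return self._backend.has_worker(job_name)
```

The optimistic default in `NoRegistryBackend` is one possible design choice: a backend that cannot see its consumers would otherwise cause every unknown job to be rejected.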

We should consider the performance impacts of asking gearman if a worker exists every time we insert a job.
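One possible mitigation is to cache the answer for a few seconds so that a burst of inserts triggers a single lookup per job name. The Python sketch below assumes a short time-to-live is acceptable; `CachedWorkerCheck` and its parameters are hypothetical names, and the 5 second default is an arbitrary placeholder rather than a measured value.

```python
import time
from typing import Callable, Dict, Tuple


class CachedWorkerCheck:
    """Wrap a worker lookup and reuse each answer for ttl_seconds."""

    def __init__(self,
                 check: Callable[[str], bool],
                 ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._check = check
        self._ttl = ttl_seconds
        self._clock = clock
        # job name -> (time of lookup, result of lookup)
        self._cache: Dict[str, Tuple[float, bool]] = {}

    def __call__(self, name: str) -> bool:
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        # Cache miss or expired entry: ask the real backend and remember it.
        result = self._check(name)
        self._cache[name] = (now, result)
        return result
```

The trade-off is staleness: for up to `ttl_seconds` after a worker starts or stops, the cached answer can be wrong, which matters less if the check only drives log warnings than if it rejects jobs.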

roman-verchikov commented 10 years ago

Jay, I've tried to reproduce the issue, but was unable to do so. Could you provide detailed steps to reproduce?..

It also seems that your solution would be somewhat misleading for the user: the user says "Allow execution of all the jobs (even unknown)", but gets an error response for the unregistered job.

jaybuff commented 10 years ago

To reproduce, edit the GearboxConfigFile that your apache conf points to and set "allow_unknown_jobs": 1. Then make sure that there are no entries in handlers_file or `/etc/gearbox/*handlers.d/`. Then make a PUT/POST/DELETE request and it will hang. The mod_gearbox log file should say something to the effect of "no handlers found".

jaybuff commented 10 years ago

> It also seems that your solution would be somewhat misleading for the user: the user says "Allow execution of all the jobs (even unknown)", but gets an error response for the unregistered job.

Yes, I agree. Maybe the config setting should be renamed. allow_unregistered_jobs perhaps?

roman-verchikov commented 10 years ago

Hrmmm... I still cannot reproduce the issue... The only way a PUT/POST/DELETE curl request hangs for me is when workerGearbox is not started; otherwise I immediately get a JSON response. It actually works for me regardless of whether allow_unknown_jobs is set or not.

$ cat /etc/apache2/mods-enabled/gearbox.conf 
<Location /test-basic>
    GearboxLogConfig  /home/vagrant/workspace/gearbox/apache/mod_gearbox/httpd-logger.conf
    GearboxConfigFile /etc/gearbox/test-basic.conf
    SetEnv TestBasic TestValue
    SetHandler gearbox-handler
</Location>
$

allow_unknown_jobs is set to 1:

$ cat /etc/gearbox/test-basic.conf
{
   "log": {
       "config_file": "/home/vagrant/workspace/gearbox/common/conf/stdout-logger.conf"
   },
   "gearman": {
       "host": "localhost",
       "port": 4730
   },
   "handlers_file": "/etc/gearbox/handlers.conf",
   "component" : "testbasic",
   "schemadir": "/etc/gearbox/schemas",
   "allow_unknown_jobs": 1,

   "status": {
       "plugin_path": "/home/vagrant/workspace/gearbox/plugins/status/sql/.libs",
       "persistence_type": "sql",
       "db_name": "/var/run/gearbox/db/status.db",
       "db_type" : "sqlite3"
   },

   "daemons" : [{
       "name" : "worker",
       "logname": "%{component}",
       "command" : "python /home/vagrant/workspace/gearbox/workers/test-basic/workerTestBasic.py /etc/gearbox/test-basic.conf",
       "count" : 1,
       "user" : "%{gearbox.user}"
   }]
}

handlers.conf is empty:

$ cat /etc/gearbox/handlers.conf
{
}

workerGearbox is started with /etc/gearbox/test-basic.conf:

workerGearbox --config /etc/gearbox/test-basic.conf

The curl requests I've made:

$ curl -s -X PUT -s http://localhost/test-basic/v1/thing/myname -d'{"id":"test", "stuff":"some stuff"}' | python -mjson.tool
{
    "children": [], 
    "component": "testbasic", 
    "concurrency": 0, 
    "ctime": 1388673666, 
    "failures": 0, 
    "messages": [], 
    "mtime": 1388673666, 
    "operation": "create", 
    "progress": 0, 
    "state": "PENDING", 
    "status_uri": "http://localhost/test-basic/v1/status/s-75m4k572c7jn5sewfm64wfxk07", 
    "uri": "http://localhost/test-basic/v1/thing/myname"
}
$ curl -s -X POST http://localhost/test-basic/v1/thing -d'{"id":"test", "stuff":"some stuff"}' | python -mjson.tool
{
    "children": [], 
    "component": "testbasic", 
    "concurrency": 0, 
    "ctime": 1388673735, 
    "failures": 0, 
    "messages": [], 
    "mtime": 1388673735, 
    "operation": "create", 
    "progress": 0, 
    "state": "PENDING", 
    "status_uri": "http://localhost/test-basic/v1/status/s-27881qdb1vhmn3v4njb495ej2c", 
    "uri": "http://localhost/test-basic/v1/thing/t-f0c82sky2z8cdhxrzq6whz1jw1"
}

gearadmin status:

$ gearadmin --status
do_post_testbasic_thing_v1       1   0   0
do_put_testbasic_thing_v1        4   0   0
do_put_delay_job_v1              0   0   1
do_get_global_status_v1          0   0   1
do_create_global_status_v1       0   0   1
do_update_global_status_v1       0   0   1
do_decrement_global_counter_v1   0   0   1
do_post_global_status_v1         0   0   1
do_stop_global_status_v1         0   0   1
do_cancelwatch_global_status_v1  0   0   1
do_pollstate_global_status_v1    0   0   1
do_run_global_agents_v1          0   0   1
do_runlevel_global_agents_v1     0   0   1
.

jaybuff commented 10 years ago

Are you registering the handler via the handlers.d directory? What's this say:

$ grep -rI handler /var/log/gearbox/gearbox.log

roman-verchikov commented 10 years ago

> Are you registering the handler via the handlers.d directory?

No

> What's this say: $ grep -rI handler /var/log/gearbox/gearbox.log

I don't seem to have gearbox.log on the system:

vagrant@gearbox var $ sudo find / -name 'gearbox.log'
vagrant@gearbox var $

Have I messed up my gearbox configuration again?