Closed epelc closed 9 years ago
@epelc, thank you for reporting this bug and I'm sorry that this is causing you problems. I wanted to let you know that I saw this and I'll be working on helping you as soon as I can. Unfortunately because of work I won't be able to spend any real time on this until Thursday or Friday.
@epelc I am unable to reproduce this. There is a test for Destroy
that passes on my machine. If you are able to reproduce the bug consistently, would you mind telling me exactly how? If possible it would help tremendously to see some code.
To answer your questions:
Destroy
is supposed to remove all traces of the job from the database. It sounds like some bug is preventing Destroy
from finishing and that's why you're still seeing some fields. As an aside, it appears that the bug is occurring in the middle of a transaction. Redis is not ACID, so it is possible for a transaction to partially execute, leaving the database in an inconsistent state. If I understand correctly, the redis-check-aof tool will help remedy this and put the database back into a consistent state.
@albrow Just got back from lunch. I'll explain what I'm doing a bit more before I try to get it reproduced.
Heres some psuedo code for the job function
func updateTrackingJob(shipmentId bson.ObjectId) error {
// Lookup the shipment info in mongodb so we can get the job id
// When we register the job we save it's id to our shipment
s := findShipment(shipmentId)
// do some work
// if the work returns a certain case we need to destroy the job and remove it from our shipment
j,err := jobs.FindById(s.JobId)
if err != nil {
return err
}
err = j.Destroy()
if err != nil {
return err
}
//remove the job id from our shipment
return nil
}
@albrow I haven't been able to reliable reproduce it 100% but I think I found the cause.
The workers doJob
function doesn't handle jobs which destroy them self during execution.
https://github.com/albrow/jobs/blob/master/worker.go#L92
Note it always sets the time
and finished
fields. It also adds it back to the jobs:time
set which I think is causing it to run again without any info in the status
field which is causing the erorr in the destroy script.
hget
returns nil when the field doesn't exist which I think is evaluating as falsey in lua.
Return value Bulk string reply: the value associated with field, or nil when field is not present in the hash or key does not exist.
If you run this you can see that it does set the time
and finished
fields along with putting it in jobs:time
after you destroy the job in the handler.
https://gist.github.com/epelc/3512e1e40e6cfc3cefc3
Note use the -vjl
flag to see it find the id and destroy the jobs. Also it using db number 4 so SELECT 4
to get to it.
Let me know your thoughts are.
@epelc okay I see what's happening now. I didn't anticipate anyone destroying a job in a job hander while it was being executed. I have a fix in mind and will be creating a PR soon. When the PR is up, I would appreciate it if you could take a look and make sure it solves your problem.
I'll definitely test it out. Just let me know when it's ready. On Jul 4, 2015 4:50 PM, "Alex Browne" notifications@github.com wrote:
@epelc https://github.com/epelc okay I see what's happening now. I didn't anticipate anyone destroying a job in a job hander while it was being executed. I have a fix in mind, and will be creating a PR soon. When the PR is up, I would appreciate it if you could take a look and make sure it solves your problem.
— Reply to this email directly or view it on GitHub https://github.com/albrow/jobs/issues/26#issuecomment-118554950.
@epelc please check out #27. On that branch, I can run https://gist.github.com/epelc/3512e1e40e6cfc3cefc3 and get the expected results. That is, there are no remnants of the jobs if they are destroyed from inside a handler, and there are no errors or panics.
I've been seeing errors like the following lately. @albrow I was wondering if you could help me diagnose them.
ERR Error running script (call to f_7be0eed842fab7944ce1393b76bf8f46826c6656): @user_script:20: us er_script:20: attempt to concatenate local 'Status' (a boolean value)
I found the line it does this hereI'm trying to destroy the job from within my handler function once it reaches a certain state. Is it not safe to do this?
When I look up the job id I just get
finished
andtime
fields usinghgetall
.Is it supposed to leave the job in redis after you destroy it? From the
destroy_script.lua
docs it dosen't sound like it.