edgurgel / verk

A job processing system that just verks! 🧛‍
https://hex.pm/packages/verk
MIT License

Strange behaviour with integer argument, potential bug #165

Closed tlvenn closed 6 years ago

tlvenn commented 6 years ago

Hi,

It seems like integers are not properly decoded for some reason. For example, the following:

Verk.enqueue(%Verk.Job{
  queue: :algolia_indexer,
  class: "Jumpn.AlgoliaIndexUpdater",
  args: [%{"activity_id" => 393087288629462191}],
  max_retry_count: 2
})

is failing with:

[error] GenServer #PID<0.874.0> terminating
** (stop) :failed
Last message: {:"$gen_cast", {:perform, %Verk.Job{args: [%{"activity_id" => 3.9308728862946e17}], class: "MyWorker", enqueued_at: 1530103850, error_backtrace: "    :erlang.binary_to_integer(393087288629462191)\n    (backend) lib/jumpn/activity/algolia_index_updater.ex:20: Jumpn.AlgoliaIndexUpdater.perform/1\n    (verk) lib/verk/worker.ex:35: Verk.Worker.handle_cast/2\n    (stdlib) gen_server.erl:637: :gen_server.try_dispatch/4\n    (stdlib) gen_server.erl:711: :gen_server.handle_msg/6\n", error_message: "argument error", failed_at: 1530103850, finished_at: nil, jid: "4749953972661461251", max_retry_count: 2, original_json: "{\"error_message\":\"argument error\",\"max_retry_count\":2,\"jid\":\"4749953972661461251\",\"class\":\"Jumpn.AlgoliaIndexUpdater\",\"retried_at\":null,\"queue\":\"algolia_indexer\",\"finished_at\":null,\"enqueued_at\":1530103850,\"error_backtrace\":\"    :erlang.binary_to_integer(393087288629462191)\\n    (backend) lib\\/jumpn\\/activity\\/algolia_index_updater.ex:20: Jumpn.AlgoliaIndexUpdater.perform\\/1\\n    (verk) lib\\/verk\\/worker.ex:35: Verk.Worker.handle_cast\\/2\\n    (stdlib) gen_server.erl:637: :gen_server.try_dispatch\\/4\\n    (stdlib) gen_server.erl:711: :gen_server.handle_msg\\/6\\n\",\"args\":[{\"activity_id\":3.9308728862946e+17}],\"failed_at\":1530103850,\"retry_count\":1}", queue: "algolia_indexer", retried_at: nil, retry_count: 1}, #PID<0.756.0>}}
State: nil

Two things to notice from the error and stacktrace:

- the "activity_id" argument arrives as the float 3.9308728862946e17 instead of the integer 393087288629462191;
- the stacktrace shows :erlang.binary_to_integer/1 raising an argument error in Jumpn.AlgoliaIndexUpdater.perform/1.

Any idea what is going on?

tlvenn commented 6 years ago

@edgurgel any thoughts? Thanks in advance!

krasio commented 6 years ago

cjson strikes again?

127.0.0.1:6379> EVAL "return cjson.encode(cjson.decode('{\"args\":[{\"activity_id\":393087288629462191}]}'))" 0
"{\"args\":[{\"activity_id\":3.9308728862946e+17}]}"
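The root cause can be reproduced without Redis at all: cjson stores every JSON number as a C double, which has only a 53-bit mantissa. Here is a Python sketch (not verk code) that simulates cjson's behaviour by forcing every integer through a float during decoding:

```python
import json

# cjson represents every JSON number as a C double (53-bit mantissa),
# so integers above 2**53 lose precision on a decode/encode round trip.
big = 393087288629462191          # the activity_id from the report
assert big > 2**53

# Simulate cjson's behaviour: parse every integer as a float.
mangled = json.loads(json.dumps({"activity_id": big}), parse_int=float)

print(mangled["activity_id"])              # a float near 3.93e17, not the original integer
print(int(mangled["activity_id"]) == big)  # False: precision was lost
```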

krasio commented 6 years ago

Comparison with Poison:

iex(13)> Poison.decode!(Poison.encode!(%{"activity_id" => 393087288629462191}))
%{"activity_id" => 393087288629462191}

edgurgel commented 6 years ago

-.- @krasio haha cjson is basically the reason for all our problems :)

edgurgel commented 6 years ago

Yeah, it's because this is not an "integer" anymore. It's a "big number" that cjson doesn't support.

For numbers larger than 53 bits (the exact-integer range of a double) you will need to use a string, I think, and then convert.

edgurgel commented 6 years ago

More here: https://github.com/DaveGamble/cJSON/issues/162

I'm not sure if we can do much apart from documenting that we should avoid numbers greater than 2^53.
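For reference, the exact-integer range of an IEEE-754 double (which cjson uses for every JSON number) ends at 2^53. A quick Python check confirms the boundary:

```python
# 2**53 is the last integer a double represents exactly;
# the very next integer already collapses onto its neighbour.
print(float(2**53) == 2**53)          # True
print(float(2**53 + 1) == 2**53 + 1)  # False
```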

tlvenn commented 6 years ago

Ha, thanks for pointing out cJSON; I did not realise that the Lua scripts need to decode the job in order to move it around on the Redis side.

The thing is, the Lua side does not care about the job args at all, so we only need to wrap them in a way that cJSON treats them as a string; on the Elixir side, when we decode the job, we can unwrap them.

Simply doing a two-pass JSON encoding/decoding would do the trick.

What do you think?
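The double-encoding idea can be sketched in Python (an illustration of the proposal, not verk's implementation):

```python
import json

big = 393087288629462191

# Double-encode the args: the inner JSON document becomes an opaque string.
job = {"queue": "algolia_indexer", "args": json.dumps([{"activity_id": big}])}

# Simulate cjson's lossy round trip (every integer parsed as a float).
lossy = json.loads(json.dumps(job), parse_int=float)

# The args survive untouched, because cjson only ever sees a string.
restored = json.loads(lossy["args"])
print(restored[0]["activity_id"] == big)  # True
```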

edgurgel commented 6 years ago

Ok so the reason why we need to encode again is for this:

https://github.com/edgurgel/verk/blob/master/priv/enqueue_retriable_job.lua#L7-L12

We add enqueued_at to the JSON.

A really, really nasty solution?

Manually append the key "enqueued_at": ... to the original JSON string.

Then we won't suffer from cjson's encoding rules?
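The "nasty" splice could look roughly like this (a Python illustration, not verk code): patch the raw JSON string instead of decoding and re-encoding it, so cjson never touches the numbers.

```python
import json, time

original_json = '{"queue":"algolia_indexer","args":[{"activity_id":393087288629462191}]}'

# Splice "enqueued_at" into the raw string instead of decode/encode,
# so the numbers inside args are never re-serialised.
patched = original_json[:-1] + ',"enqueued_at":%d}' % int(time.time())

job = json.loads(patched)  # Python ints are arbitrary precision, so this is lossless
print(job["args"][0]["activity_id"] == 393087288629462191)  # True
```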

tlvenn commented 6 years ago

Ya, I saw that and it's fine. My point is that as far as Redis/Lua is concerned, the args can be opaque; the scripts never need them. So we can protect them from cJSON by encoding them as a JSON string using Poison, meaning that in Redis the args field is a JSON-encoded string.

We simply need a custom encoder/decoder with Poison. Might be a good opportunity to switch to Jason as well.

edgurgel commented 6 years ago

That's the problem. It needs them.

It needs to know which queue the job will be moved to, and it needs to populate enqueued_at. We can't remove the job from the sorted set before we add it to the queue, so that no job is ever outside of Redis at any point in time.

edgurgel commented 6 years ago

Or do you mean double-encoding just the args?

edgurgel commented 6 years ago

Now I understood your point! :)

edgurgel commented 6 years ago

Yeah I'm completely fine doing this approach + using Jason if it makes sense 👍

tlvenn commented 6 years ago

Ok great, let me submit a PR then

edgurgel commented 6 years ago

The only downside I can think of is that it loses the property of accepting "Sidekiq/Resque" clients sending jobs to Verk. They would need to double-encode args before enqueuing the job 🤔

tlvenn commented 6 years ago

Hmm, I did not think about that. We should be able to detect this easily enough: the args should never be a string to begin with, so I can unwrap them when I detect one (a job enqueued by Verk) or otherwise skip this step (a job enqueued by Sidekiq/Resque).
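That detection heuristic could look roughly like this (a hypothetical Python helper, not verk's actual decoder):

```python
import json

def decode_args(raw_args):
    # Jobs enqueued by Verk double-encode args as a JSON string;
    # jobs from Sidekiq/Resque send them as a plain list.
    if isinstance(raw_args, str):
        return json.loads(raw_args)   # unwrap Verk's double encoding
    return raw_args                   # already a list: pass through

print(decode_args('[{"activity_id": 393087288629462191}]'))
print(decode_args([{"activity_id": 1}]))
```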

tlvenn commented 6 years ago

I have pushed the commit to take care of that and fixed the two Credo issues.