contribsys / faktory

Language-agnostic persistent background job server
https://contribsys.com/faktory/
Other
5.73k stars 227 forks source link

Batches with success and complete callbacks #340

Closed pbrisbin closed 3 years ago

pbrisbin commented 3 years ago

Are you using an old version? No Have you checked the changelogs to see if your issue has been fixed in a later version? N/A


I'm working on adding BATCH support to our Haskell client for Faktory. I't's going smoothly, but I had a test failure, that I believe indicates a possible Faktory bug:

    it "runs success and complete if all Jobs were successful" $ do
      jobs <- workerTestCase $ \producer -> do
        -- Create two Jobs, "c" and "d"
        c <- buildJob @Text producer mempty "c"
        d <- buildJob @Text producer mempty "d"

        -- Prepare options for BATCH NEW, to run "c" as complete and "d" as success
        let options = description "foo" <> complete c <> success d

        -- Run two other jobs, "a" and "b" as a batch with these options
        void $ runBatchT options producer $ do
          void $ batchPerform @Text mempty producer "a"
          void $ batchPerform @Text mempty producer "b"

          -- Delay teardown a bit so things can happen
          liftIO $ threadDelay 1000000

      -- We should've seen all our jobs run (include the "stop" job).
      jobs `shouldMatchList` ["a", "b", "c", "d", "HALT"]

The failure is that "d" (the success callback) never runs.

Here is a log of communication with Faktory during this test:

[DEBUG]: < Right (Just "HI {\"v\":2}")
[DEBUG]: > "HELLO {\"wid\":null,\"hostname\":\"\\\"localhost\\\"\",\"pid\":1466441,\"labels\":[\"haskell\"],\"v\":2,\"pwdhash\":null}"
[DEBUG]: < Right (Just "OK")
[DEBUG]: > "FLUSH"
[DEBUG]: < Right (Just "OK")
[DEBUG]: > "END"
[DEBUG]: < Right (Just "HI {\"v\":2}")
[DEBUG]: > "HELLO {\"wid\":\"pydveios\",\"hostname\":\"\\\"localhost\\\"\",\"pid\":1466441,\"labels\":[\"haskell\"],\"v\":2,\"pwdhash\":null}"
[DEBUG]: < Right (Just "HI {\"v\":2}")
[DEBUG]: > "HELLO {\"wid\":null,\"hostname\":\"\\\"localhost\\\"\",\"pid\":1466441,\"labels\":[\"haskell\"],\"v\":2,\"pwdhash\":null}"
[DEBUG]: < Right (Just "OK")
[DEBUG]: < Right (Just "OK")
[DEBUG]: > "BATCH NEW {\"description\":\"foo\",\"success\":{\"jid\":\"xmvmostlmtlx\",\"jobtype\":\"Default\",\"args\":[\"d\"],\"retry\":null,\"queue\":null,\"at\":null,\"custom\":null},\"complete\":{\"jid\":\"dtxsivqukpad\",\"jobtype\":\"Default\",\"args\":[\"c\"],\"retry\":null,\"queue\":null,\"at\":null,\"custom\":null}}"
[DEBUG]: > "FETCH default"
[DEBUG]: < Right (Just "b-AfcqWSn0j39czw")
[DEBUG]: > "PUSH {\"jid\":\"vwwyulgwsnwx\",\"jobtype\":\"Default\",\"args\":[\"a\"],\"retry\":null,\"queue\":null,\"at\":null,\"custom\":{\"bid\":\"b-AfcqWSn0j39czw\"}}"
[DEBUG]: < Right (Just "OK")
[DEBUG]: > "PUSH {\"jid\":\"emgszsowirck\",\"jobtype\":\"Default\",\"args\":[\"b\"],\"retry\":null,\"queue\":null,\"at\":null,\"custom\":{\"bid\":\"b-AfcqWSn0j39czw\"}}"
[DEBUG]: < Right (Just "{\"jid\":\"vwwyulgwsnwx\",\"queue\":\"default\",\"jobtype\":\"Default\",\"args\":[\"a\"],\"created_at\":\"2021-03-01T20:45:43.759726438Z\",\"enqueued_at\":\"2021-03-01T20:45:43.759772293Z\",\"retry\":25,\"custom\":{\"bid\":\"b-AfcqWSn0j39czw\"}}")
[DEBUG]: > "ACK {\"jid\":\"vwwyulgwsnwx\"}"
[DEBUG]: < Right (Just "OK")
[DEBUG]: < Right (Just "OK")
[DEBUG]: > "FETCH default"
[DEBUG]: < Right (Just "{\"jid\":\"emgszsowirck\",\"queue\":\"default\",\"jobtype\":\"Default\",\"args\":[\"b\"],\"created_at\":\"2021-03-01T20:45:43.760421903Z\",\"enqueued_at\":\"2021-03-01T20:45:43.76046349Z\",\"retry\":25,\"custom\":{\"bid\":\"b-AfcqWSn0j39czw\"}}")
[DEBUG]: > "ACK {\"jid\":\"emgszsowirck\"}"
[DEBUG]: < Right (Just "OK")
[DEBUG]: > "FETCH default"
[DEBUG]: > "BATCH COMMIT b-AfcqWSn0j39czw"
[DEBUG]: < Right (Just "OK")
[DEBUG]: > "PUSH {\"jid\":\"fynipcufyrog\",\"jobtype\":\"Default\",\"args\":[\"HALT\"],\"retry\":null,\"queue\":null,\"at\":null,\"custom\":null}"
[DEBUG]: < Right (Just "{\"jid\":\"dtxsivqukpad\",\"queue\":\"default\",\"jobtype\":\"Default\",\"args\":[\"c\"],\"created_at\":\"2021-03-01T20:45:44.76439491Z\",\"enqueued_at\":\"2021-03-01T20:45:44.764398233Z\",\"retry\":0,\"custom\":{\"_bid\":\"b-AfcqWSn0j39czw\",\"_cb\":\"complete\"}}")
[DEBUG]: > "ACK {\"jid\":\"dtxsivqukpad\"}"
[DEBUG]: < Right (Just "OK")
[DEBUG]: > "END"
[DEBUG]: < Right (Just "OK")
[DEBUG]: > "FETCH default"
[DEBUG]: < Right (Just "{\"jid\":\"fynipcufyrog\",\"queue\":\"default\",\"jobtype\":\"Default\",\"args\":[\"HALT\"],\"created_at\":\"2021-03-01T20:45:44.765266688Z\",\"enqueued_at\":\"2021-03-01T20:45:44.765270194Z\",\"retry\":25}")
[DEBUG]: > "END

It confirms that we set up the BATCH NEW correctly (IMO) with success and complete:

BATCH NEW {\"description\":\"foo\",\"success\":{\"jid\":\"xmvmostlmtlx\",\"jobtype\":\"Default\",\"args\":[\"d\"],\"retry\":null,\"queue\":null,\"at\":null,\"custom\":null},\"complete\":{\"jid\":\"dtxsivqukpad\",\"jobtype\":\"Default\",\"args\":[\"c\"],\"retry\":null,\"queue\":null,\"at\":null,\"custom\":null}}

The docs say we can use complete "and/or" success. That implies the behavior I'm testing above to me: complete should always run, and success should run (also) if all Jobs were successful. In actuality, it seems that you can use complete or success, and if you pass complete and success, only complete will run.

The functionality isn't particularly important to me, but I would like to know if it's intended, mainly so I can keep hunting for a bug in my client if you're confident Faktory should run success and complete in this case.

mperham commented 3 years ago

1.4.0 isn’t the latest version. Does the change log have anything related? I can’t recall.

On Mon, Mar 1, 2021 at 13:06 patrick brisbin notifications@github.com wrote:

-

Which Faktory package and version?

Faktory Enterprise 1.4.0

Which Faktory worker package and version?

https://hackage.haskell.org/package/faktory

Please include any relevant worker configuration

N/A

Please include any relevant error messages or stacktraces

N/A

Are you using an old version? No Have you checked the changelogs to see if your issue has been fixed in a later version? N/A

I'm working on adding BATCH support to our Haskell client for Faktory. I't's going smoothly, but I had a test failure, that I believe indicates a possible Faktory bug:

it "runs success and complete if all Jobs were successful" $ do
  jobs <- workerTestCase $ \producer -> do
    -- Create two Jobs, "c" and "d"
    c <- buildJob @Text producer mempty "c"
    d <- buildJob @Text producer mempty "d"

    -- Prepare options for BATCH NEW, to run "c" as complete and "d" as success
    let options = description "foo" <> complete c <> success d

    -- Run two other jobs, "a" and "b" as a batch with these options
    void $ runBatchT options producer $ do
      void $ batchPerform @Text mempty producer "a"
      void $ batchPerform @Text mempty producer "b"

      -- Delay teardown a bit so things can happen
      liftIO $ threadDelay 1000000

  -- We should've seen all our jobs run (include the "stop" job).
  jobs `shouldMatchList` ["a", "b", "c", "d", "HALT"]

The failure is that "d" (the success callback) never runs.

Here is a log of communication with Faktory during this test:

DEBUG: < Right (Just "HI {\"v\":2}")

DEBUG: < Right (Just "OK")

DEBUG: < Right (Just "OK")

DEBUG: < Right (Just "HI {\"v\":2}")

DEBUG: < Right (Just "HI {\"v\":2}")

DEBUG: < Right (Just "OK") DEBUG: < Right (Just "OK")

DEBUG: < Right (Just "b-AfcqWSn0j39czw")

DEBUG: < Right (Just "OK")

DEBUG: < Right (Just "{\"jid\":\"vwwyulgwsnwx\",\"queue\":\"default\",\"jobtype\":\"Default\",\"args\":[\"a\"],\"created_at\":\"2021-03-01T20:45:43.759726438Z\",\"enqueued_at\":\"2021-03-01T20:45:43.759772293Z\",\"retry\":25,\"custom\":{\"bid\":\"b-AfcqWSn0j39czw\"}}")

DEBUG: < Right (Just "OK") DEBUG: < Right (Just "OK")

DEBUG: < Right (Just "{\"jid\":\"emgszsowirck\",\"queue\":\"default\",\"jobtype\":\"Default\",\"args\":[\"b\"],\"created_at\":\"2021-03-01T20:45:43.760421903Z\",\"enqueued_at\":\"2021-03-01T20:45:43.76046349Z\",\"retry\":25,\"custom\":{\"bid\":\"b-AfcqWSn0j39czw\"}}")

DEBUG: < Right (Just "OK")

DEBUG: < Right (Just "OK")

DEBUG: < Right (Just "{\"jid\":\"dtxsivqukpad\",\"queue\":\"default\",\"jobtype\":\"Default\",\"args\":[\"c\"],\"created_at\":\"2021-03-01T20:45:44.76439491Z\",\"enqueued_at\":\"2021-03-01T20:45:44.764398233Z\",\"retry\":0,\"custom\":{\"_bid\":\"b-AfcqWSn0j39czw\",\"_cb\":\"complete\"}}")

DEBUG: < Right (Just "OK")

DEBUG: < Right (Just "OK")

DEBUG: < Right (Just "{\"jid\":\"fynipcufyrog\",\"queue\":\"default\",\"jobtype\":\"Default\",\"args\":[\"HALT\"],\"created_at\":\"2021-03-01T20:45:44.765266688Z\",\"enqueued_at\":\"2021-03-01T20:45:44.765270194Z\",\"retry\":25}") DEBUG: > "END

It confirms that we set up the BATCH NEW correctly (IMO) with success and complete:

BATCH NEW {\"description\":\"foo\",\"success\":{\"jid\":\"xmvmostlmtlx\",\"jobtype\":\"Default\",\"args\":[\"d\"],\"retry\":null,\"queue\":null,\"at\":null,\"custom\":null},\"complete\":{\"jid\":\"dtxsivqukpad\",\"jobtype\":\"Default\",\"args\":[\"c\"],\"retry\":null,\"queue\":null,\"at\":null,\"custom\":null}}

The docs say we can use complete "and/or" success. That implies the behavior I'm testing above to me: complete should always run, and success should run (also) if all Jobs were successful. In actuality, it seems that you can use complete or success, and if you pass complete and success, only complete will run.

The functionality isn't particularly important to me, but I would like to know if it's intended, mainly so I can keep hunting for a bug in my client if you're confident Faktory should run success and complete in this case.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/contribsys/faktory/issues/340, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAAWX74MRUWFY5GGMTXHO3TBP6VPANCNFSM4YNK47YA .

pbrisbin commented 3 years ago

Damn, I thought we were on latest. Things move fast.

Sorry for not doing that due diligence before, but no, there's nothing directly related in the CHANGELOG that I can see.

mperham commented 3 years ago

Ok I’ll take a look when I get home.

On Mon, Mar 1, 2021 at 13:28 patrick brisbin notifications@github.com wrote:

Damn, I thought we were on latest. Things move fast.

Sorry for not doing that due diligence before, but no, there's nothing directly related in the CHANGELOG that I can see.

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/contribsys/faktory/issues/340#issuecomment-788307071, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAAWX5WIYNN6NKCAWIND5LTBQBITANCNFSM4YNK47YA .

pbrisbin commented 3 years ago

Thank you, no rush! As I said, this is more about clarifying what behavior I should expect to see as I test my client. I can live without being able to support a complete and success callback.

mperham commented 3 years ago

The expected behavior is that both callbacks can be defined and both will fire at most once.

On Mon, Mar 1, 2021 at 13:47 patrick brisbin notifications@github.com wrote:

Thank you, no rush! As I said, this is more about clarifying what behavior I should expect to see as I test my client. I can live without being able to support a complete and success callback.

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/contribsys/faktory/issues/340#issuecomment-788320534, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAAWXZBQEYMTUMXDCLAJZ3TBQDNZANCNFSM4YNK47YA .

pbrisbin commented 3 years ago

Cool, that matches with how I read the docs. I believe there is a bug then, as I don't see that happening.

mperham commented 3 years ago

I found a typo in faktory_worker_ruby that broke the complete callback. Once fixed it is working as expected. Note the two defined callbacks in BATCH NEW.

D 2021-03-03T17:59:45.256Z BATCH NEW {"success":{"jid":"ef3243d5c9406435470df153","args":[],"queue":"default","jobtype":"BatchCallbackJob"},"complete":{"jid":"ba98231a06b16678b2e6e036","args":[],"queue":"default","jobtype":"BatchCallbackJob"}}
D 2021-03-03T17:59:45.287Z BATCH COMMIT b-r6Tit4Vg6HXmLA
D 2021-03-03T17:59:46.528Z Enqueueing b-r6Tit4Vg6HXmLA/complete ba98231a06b16678b2e6e036
D 2021-03-03T17:59:46.554Z Batch: ack complete callback for b-r6Tit4Vg6HXmLA
D 2021-03-03T17:59:46.555Z Enqueueing b-r6Tit4Vg6HXmLA/success ef3243d5c9406435470df153
D 2021-03-03T17:59:46.571Z Batch: ack success callback for b-r6Tit4Vg6HXmLA
D 2021-03-03T17:59:46.574Z Batch b-r6Tit4Vg6HXmLA success!

I would definitely recommend upgrading to 1.4.2 or 1.5.0. I don't see any batch fixes in git log but it's possible something snuck in.

pbrisbin commented 3 years ago

OK, thank you for checking.

Note the two defined callbacks in BATCH NEW.

Yup, looks just like mine, right?

BATCH NEW {\"description\":\"foo\",\"success\":{\"jid\":\"xmvmostlmtlx\",\"jobtype\":\"Default\",\"args\":[\"d\"],\"retry\":null,\"queue\":null,\"at\":null,\"custom\":null},\"complete\":{\"jid\":\"dtxsivqukpad\",\"jobtype\":\"Default\",\"args\":[\"c\"],\"retry\":null,\"queue\":null,\"at\":null,\"custom\":null}}

The presence of fields with null vs not having fields can't matter right?

I'll re-visit when we can update and see if it fixes things. Feel free to close this Issue if you don't want to hold it open until I can confirm things.

pbrisbin commented 3 years ago

Ah, I found it!

It was a problem with my test cases, that I only noticed once I started watching the Faktory logs while running it.

I was intending to add a small delay to give Faktory time to enqueue the callbacks before I tore down my worker for the assertion phase. But I accidentally had the delay such that it was before BATCH COMMIT. It seems the two-callback test case was just enough to consistently lose the race.

Sorry for the noise.