Best practices for Faktory worker library command retry behavior

As I mentioned on the Faktory Gitter, I'm in the process of implementing a JVM worker library after finding the existing ones abandoned and/or insufficient for our needs. My hope is to eventually open source it, but it's not quite ready for prime time.

I have much of the basic functionality working and am starting to harden the error handling against things like transient network or Faktory server outages. I wanted to get your thoughts on best practices for whether, and how, worker libraries should retry Faktory commands. Consider the following scenario:

1. The worker pulls a job off a queue and starts working.
2. Either the Faktory server dies unexpectedly (unlikely, I know, but possible) or the route to the server becomes unavailable.
3. The worker finishes the job and tries to ACK it (or the job fails and the worker tries to FAIL it).

With the worker unable to reach the Faktory server, the ACK/FAIL command obviously cannot be sent successfully. Should the worker library automatically retry the command up to N times before giving up? Or should that decision be left to the application, assuming the library lets the application know the command has failed (e.g., via an exception or return code)? Or should we just leave it to the Faktory server to eventually time out the job and release it for re-execution?
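
To make the first two options above concrete, here is a minimal sketch of what I have in mind, not a finished design. All of the names (`CommandRetrier`, `Command`, `RetriesExhaustedException`) and the exponential backoff policy are hypothetical and not taken from any existing Faktory client. The library retries a command up to N times on I/O errors, then throws so the application can decide what happens next.

```java
import java.io.IOException;

/** Hypothetical helper for illustration; not part of any existing Faktory client. */
public final class CommandRetrier {

    /** A single Faktory command (e.g., ACK or FAIL) that may fail with an I/O error. */
    @FunctionalInterface
    public interface Command<T> {
        T send() throws IOException;
    }

    /** Thrown once every attempt has failed, so the application can decide what to do. */
    public static final class RetriesExhaustedException extends Exception {
        public RetriesExhaustedException(int attempts, Throwable lastFailure) {
            super("command failed after " + attempts + " attempts", lastFailure);
        }
    }

    private CommandRetrier() {}

    /**
     * Sends the command, retrying on IOException up to maxAttempts times in total.
     * The delay doubles after each failed attempt, starting at baseDelayMillis
     * (an assumed policy; a real library would likely make this configurable).
     */
    public static <T> T sendWithRetry(Command<T> command, int maxAttempts, long baseDelayMillis)
            throws RetriesExhaustedException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return command.send();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Cap the shift so the delay cannot overflow for large attempt counts.
                    long delay = baseDelayMillis << Math.min(attempt - 1, 10);
                    Thread.sleep(delay);
                }
            }
        }
        // Give up and hand the decision to the application.
        throw new RetriesExhaustedException(maxAttempts, last);
    }
}
```

The application would wrap its ACK call in something like `CommandRetrier.sendWithRetry(() -> client.ack(jobId), 5, 200)`, where `client.ack` stands in for whatever the library ends up exposing. If it catches `RetriesExhaustedException`, it can log and move on, which amounts to the third option: the server eventually times out the job and releases it.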

Assuming the job is idempotent, it seems to come down mostly to a question of how long we're willing to wait for the job to be retried. Is there anything I'm missing?

contribsys/faktory issue #431