framed-data / overseer

Overseer is a library for building and running data pipelines in Clojure.
Eclipse Public License 1.0
97 stars · 10 forks

Remove all notion of a "system" #41

Closed · andrewberls closed this 9 years ago

andrewberls commented 9 years ago

This removes all references to "systems", which is actually an overloaded term in Overseer. The term was used to refer to both:

1) The overseer.system namespace, which confusingly combined a CLI startup utility and an entry point to start a worker (Q: What's the difference between a "system" and a "worker"? A: Nothing, really)

2) A map containing both :config and a :conn (Q: What's the difference between a "system" map and a "config" map? A: Nothing, really)

The CLI startup code has been moved to overseer.runner, and the old system/start function is removed in favor of the direct worker/start! function (or the existing overseer.api/start alias, if you like). Additionally, the signature has been changed to simply accept a config and a set of job handlers. The reasoning: since the config already defines a Datomic URI, we can establish the connection ourselves, without requiring the user to do it and then passing around a strange "system" map.
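Roughly, the new shape looks like this. This is a hedged sketch, not Overseer's actual internals: `connect` is a stub standing in for datomic.api/connect, and the config key names (:datomic, :uri) are illustrative assumptions.

```clojure
;; Stub standing in for datomic.api/connect -- illustrative only.
(defn connect [uri] {:conn-for uri})

;; Sketch of the new entry-point shape: a plain config map plus job
;; handlers, no "system" map. The connection is derived from the
;; config's Datomic URI internally rather than passed in by the user.
(defn start! [config job-handlers]
  (let [conn (connect (get-in config [:datomic :uri]))]
    ;; ... would hand conn + handlers to the worker loop ...
    {:conn conn :job-types (set (keys job-handlers))}))

(start! {:datomic {:uri "datomic:free://localhost:4334/overseer"}}
        {:ingest (fn [job] job)})
;; => {:conn {:conn-for "datomic:free://localhost:4334/overseer"}
;;     :job-types #{:ingest}}
```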

Note that this also removes all of the old atom-based signalling capabilities, which were totally unused, and means that start is now a blocking call, i.e. it does not return a future that can be dereferenced. This simplifies the code nicely, and it can be removed safely since no production workflow actually relies on this behavior.

Future commits may re-add more sophisticated signalling/control capabilities if deemed necessary.

danevans commented 9 years ago

I don't have any complaints, so :+1:, but I'm curious: are you saying that the pattern of pulling a conn off of a config, like we see in framed.config, is something you want to move away from?

andrewberls commented 9 years ago

Well, the code snippet you posted is not really 'pulling a conn off a config' in my mind - it's just a helper function that reads the Datomic URI from config and returns a conn, exactly the same as we're doing (inline) here. Nesting a conn inside config and reading it back out with (:conn config) is what I'm against.
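The distinction can be sketched like this (names are illustrative; `connect` is a stub standing in for datomic.api/connect):

```clojure
;; Stub standing in for datomic.api/connect -- illustrative only.
(defn connect [uri] {:conn-for uri})

;; The helper style: config stays a pure immutable value, and the
;; live conn is derived from it wherever it's needed.
(defn ->conn [config]
  (connect (get-in config [:datomic :uri])))

;; The rejected style nests a stateful conn inside otherwise-pure
;; configuration data and reads it back with (:conn config).

(def config {:datomic {:uri "datomic:free://localhost:4334/overseer"}})
(->conn config)
;; => {:conn-for "datomic:free://localhost:4334/overseer"}
```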

andrewberls commented 9 years ago

Oh, I see what you're getting at, though; I'll look again, but I don't think the helper is warranted here

andrewberls commented 9 years ago

Even better - I don't think we need the config at all in some places, just the conn.

elliot42 commented 9 years ago

I haven't had a chance to review this yet but just wanted to say:

  1. The system vs. config distinction was intentional: the config was intended to be a pure immutable value loaded from configuration data, whereas the system is a map of live stateful components, i.e. https://github.com/stuartsierra/component
  2. The atom-based control was intentional, because it allowed the worker, like a web server, to run one or more processes asynchronously alongside a different process. For example, you can start a web server without having it monopolize your REPL--this is what allows the REPL to control the web server live. Similarly, with the atom-based control, you could start a worker from your REPL and use the REPL to start/stop/configure the worker process, which could live alongside the REPL. This design would allow concurrent meta-access to the worker and worker state in the future, e.g. it would let other concurrent processes monitor and/or provide control APIs to the worker, because the worker wouldn't be solely blocking the top-level JVM process.

So, I'm not totally opposed to removing things, but I do want it to be clear that they were there for reasons, some of which I think are still desirable.
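The atom-based control described in point 2 can be sketched roughly like this (a minimal illustration, not Overseer's actual API; worker-state and run-worker! are made-up names):

```clojure
;; A flag the REPL can flip to control the worker.
(def worker-state (atom :running))

;; The worker runs in a future, so starting it does not block the REPL.
(defn run-worker! []
  (future
    (loop []
      (when (= :running @worker-state)
        ;; (do-job (get-job ...)) would go here
        (Thread/sleep 10)
        (recur)))))

;; From the REPL: start the worker without monopolizing the session...
(def w (run-worker!))
;; ...and stop it later by flipping the atom; the future completes
;; once the loop observes the new state.
(reset! worker-state :stopped)
@w
```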

andrewberls commented 9 years ago

1) I realize this, but we aren't using component, so I found it unnecessarily confusing. We still have the immutable config separated from the dynamic/stateful conn, just without the wrapper on top.

2) I recognize the intent of the atom control, but as mentioned, this wasn't part of any actual production workflow (nor my development workflow - when I'm doing local testing I just spin up a few workers in tabs), and thus I thought we may as well simplify the code at no real cost. What I'm more interested in exploring down the road is external control of the worker process, e.g. through Unix signals (Sidekiq was awesome at this), as well as monitoring/query tools.

elliot42 commented 9 years ago

For item 2, I don't think we're going to be able to accomplish control or monitoring well without a separate control process:

2a. I don't really understand how UNIX signals are an expressive enough interface; e.g. Sidekiq only supports about 3 rudimentary operations: https://github.com/mperham/sidekiq/wiki/Signals

The way you actually get information about the status of the system is through the Ruby API: https://github.com/mperham/sidekiq/wiki/API

Compare to Redis or Beanstalk or Memcached--you need an interface to send the thing queries and get responses back about the health of the system.

2b. Even if you had an external control/monitoring/logging interface, from an internal design point of view this means you would need to make any commands or queries part of the job loop. That is, the job loop, instead of being:

(loop []
  (do-job (get-job ...) ...)
  (recur))
Becomes:

(loop []
  (check-for-control-concern0)
  (check-for-logging-concern1)
  (check-for-monitoring-concern2)
  (do-job (get-job ...) ...)
  (recur))

Note that, in this scheme, you can't do any control or logging or monitoring concerns while you're in the middle of a job, because the job is blocking the whole worker/loop.

Whereas if you have multiple concurrent processes, the separate concerns operate concurrently: you can send/receive control messages, emit logging/monitoring events, and do jobs all at the same time, without blocking each other or polluting each other's code, and the processes can talk to each other directly in Clojure.
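One way to sketch the concurrent-processes idea, using only JVM-standard pieces (a LinkedBlockingQueue as a made-up control channel; none of this is Overseer code):

```clojure
(import '(java.util.concurrent LinkedBlockingQueue TimeUnit))

;; Control messages flow on their own channel, so the job loop and the
;; control concern run concurrently instead of being interleaved in one
;; loop body.
(def control-queue (LinkedBlockingQueue.))

(def worker
  (future
    (loop [jobs-done 0]
      (if (= :shutdown (.poll control-queue 5 TimeUnit/MILLISECONDS))
        jobs-done
        ;; (do-job (get-job ...) ...) would run here, without blocking
        ;; any logging/monitoring/control processes
        (recur (inc jobs-done))))))

;; A separate process can send a control message at any time:
(.put control-queue :shutdown)
@worker ;; the count of iterations completed before shutdown
```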

andrewberls commented 9 years ago

I haven't put a whole lot of thought into what I might want out of item 2. With that said:

Sidekiq doesn't support a whole lot of operations, but the graceful-restart stuff is super useful / met my needs in practice for quite some time (and of course there's always kill -9 and friends). I mean, a controlled stop/restart is really just what the atom stuff was doing, albeit from within a REPL (which, as mentioned, is a non-requirement for me personally). We had Capistrano set up to gracefully restart Sidekiq during deploys, which was nice, although I guess we're using Upstart for a similar purpose here.

Unix signal handlers are independent of the job loop. As for everything else, this could involve writing metadata/info to files on disk, existing JVM facilities (JMX?), or something else (fork off a control listener thread and a job executor thread?). It all depends on what you need out of it, which I haven't thought much about. Like I said, if this becomes a really pressing thing we can revisit, but I'd prefer to simplify for now.
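For the graceful-stop case specifically, a JVM shutdown hook is one signal-independent-of-the-job-loop option; it fires on SIGTERM or normal exit. A minimal sketch (shutting-down? and run-loop are illustrative names, not Overseer API):

```clojure
;; Flag flipped by the shutdown hook; the hook runs on its own thread,
;; outside the job loop.
(def shutting-down? (atom false))

(.addShutdownHook (Runtime/getRuntime)
                  (Thread. (fn [] (reset! shutting-down? true))))

;; The job loop only consults the flag between jobs, so in-flight work
;; finishes before the worker stops -- a controlled stop, like
;; Sidekiq's graceful restart.
(defn run-loop [get-job do-job]
  (loop []
    (when-not @shutting-down?
      (do-job (get-job))
      (recur))))
```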