framed-data / overseer

Overseer is a library for building and running data pipelines in Clojure.
Eclipse Public License 1.0

Don't pass output of `pre-process` into `process` #64

Closed: andrewberls closed this 8 years ago

andrewberls commented 8 years ago

The original thinking was that pre-process could be a place to modify the context and do a kind of dependency injection if desired. However, it leads to unnecessary API confusion ("even if you're only using it to set up prerequisite state, make sure your pre-process function returns a job, otherwise things will break mysteriously"). Harnesses are also the preferred mechanism for transforming any job stage, not just pre-processors.

This is a breaking change for code that uses pre-processors to transform jobs; however, changing those harnesses to modify the process stage instead should be trivial.
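A minimal sketch of the behavior change and the migration, assuming a job handler is modeled as a plain map of `:pre-process`, `:process` and `:post-process` functions and a harness is modeled as a function that wraps a stage function. `run-stages-old`, `run-stages-new`, `with-db-conn` and `:fake-conn` are hypothetical names for illustration only; this is not Overseer's actual API or implementation.

```clojure
;; Hypothetical model of job stages; NOT Overseer's real implementation.

(defn run-stages-old
  "Old behavior: whatever :pre-process returns becomes the job given to
  :process, so a pre-processor that returns nil silently breaks :process."
  [{:keys [pre-process process post-process]} job]
  (post-process (process (pre-process job))))

(defn run-stages-new
  "New behavior: :pre-process runs only for its side effects; its return
  value is ignored and :process receives the original job."
  [{:keys [pre-process process post-process]} job]
  (pre-process job)
  (post-process (process job)))

(defn with-db-conn
  "Hypothetical harness: wraps a stage fn so it sees an extra :db key."
  [stage-fn]
  (fn [job] (stage-fn (assoc job :db :fake-conn))))

(def base-handler
  {:pre-process  (fn [_job] nil)      ; only sets up state, returns nil
   :process      (fn [job] (:db job)) ; expects :db to be on the job
   :post-process identity})

;; Migration: apply the harness to :process instead of :pre-process.
(def migrated (update base-handler :process with-db-conn))
```

Under the old semantics, `(run-stages-old base-handler {:id 1})` returns `nil` because the pre-processor's `nil` is passed to `:process`; with the harness moved onto `:process`, `(run-stages-new migrated {:id 1})` returns `:fake-conn` regardless of what the pre-processor returns.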

elliot42 commented 8 years ago

:ok: Though, is "pre-process" still the right name? I feel like the name implies you're going to do something to the input before feeding it to process, just as the processor's output is fed to the post-processor.

elliot42 commented 8 years ago

Does this PR pair with any required changes in intake-mixpanel?

andrewberls commented 8 years ago

Good point. It could be argued both ways: it still comes before process, hence "pre-process". And yes, I have a commit for intake-mixpanel that I'll push before this one, since that change should work either way.