TheClimateCorporation / lemur

Lemur is a tool to launch hadoop jobs locally or on EMR, based on a configuration file, referred to as a jobdef. The jobdef file describes your EMR cluster, local environment, pre- and post-actions and zero or more "steps".
Apache License 2.0

Convert Lemur from a "CLI-only" application to a "library with CLI wrapper" #23

Open mlimotte opened 10 years ago

mlimotte commented 10 years ago

Requested by Kyle Burton, Ryan Michael and Andrew Montalenti.

There have been some requests to allow lemur to be used as a library instead of just a CLI tool.

Roughly, this is the work effort:

  1. CLI wrapper.
    Solution: Move the -main fn from lemur.core into a new namespace (e.g. lemur.tool) and change the shell script bin/lemur to use the new namespace.
  2. The lemur.command-line/quit function does an actual System/exit.
    Solution: Make it throw an Exception instead; and have the lemur.tool wrapper catch the Exception and do System/exit with the error message.
  3. And the key problem: lemur.core manages a bunch of state in a global atom (not a great practice, but I didn't know any better Clojure patterns at the time). It's not really a problem for command-line use, but if you start to use it as a lib, there will be conflicts when multiple jobs are submitted.
    Solution: Create a context object and change functions that use the global context to accept an extra arg instead. This should only impact the lemur.core namespace.
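
The three steps above could look roughly like the following. This is a minimal sketch under stated assumptions, not lemur's actual code: the namespace `lemur.sketch` and the functions `fail`, `make-context`, `add-step`, `run-job` and `run-cli` are hypothetical names chosen for illustration, and the "job" here is just a map with a list of steps.

```clojure
(ns lemur.sketch)

;; Step 2 (hypothetical): library code signals failure by throwing,
;; rather than calling System/exit itself.
(defn fail
  "Throws an ex-info carrying the exit code the CLI wrapper should use."
  [msg exit-code]
  (throw (ex-info msg {:type ::exit :exit-code exit-code})))

;; Step 3 (hypothetical): state lives in an explicit context map that is
;; passed as an argument, not in a global atom.
(defn make-context
  [jobdef-path]
  {:jobdef-path jobdef-path
   :steps       []})

(defn add-step
  "Returns a new context with the step appended; the input is not mutated."
  [context step]
  (update-in context [:steps] conj step))

(defn run-job
  "Stand-in for job submission; fails when the context has no steps."
  [context]
  (when (empty? (:steps context))
    (fail "No steps defined" 2))
  {:submitted (count (:steps context))})

;; Steps 1 and 2 (hypothetical): the CLI wrapper is the only place that
;; turns a failure into an exit code.
(defn run-cli
  "Runs a job from raw args and returns an exit code instead of exiting."
  [args]
  (try
    (let [context (reduce add-step (make-context (first args)) (rest args))]
      (run-job context)
      0)
    (catch clojure.lang.ExceptionInfo e
      (if (= ::exit (:type (ex-data e)))
        (do (binding [*out* *err*]
              (println (.getMessage e)))
            (:exit-code (ex-data e)))
        (throw e)))))

(defn -main
  [& args]
  (System/exit (run-cli args)))
```

Because each call builds its own context, two jobs submitted from the same process would not share state, and a library caller can catch the exception instead of having the JVM exit.
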
jimdowning commented 10 years ago

Is there still interest in this? I don't have much experience in lemur, but I'm willing to give it a crack. Have any of these already been done?

amontalenti commented 10 years ago

I was interested in this before, but I don't think I'm that interested in it any longer. I have since found alternatives for doing batch runs in EMR. But I still think it's an interesting idea!

mlimotte commented 10 years ago

No one is working on it, to my knowledge. If you want to give it a try, I'm happy to provide feedback.

jimdowning commented 10 years ago

For stage 1, is a lemur.tool ns still a good idea, or should I look to move -main into lemur.command-line?

mlimotte commented 10 years ago

Combining lemur.tool and lemur.command-line into a single namespace as lemur.command-line seems reasonable. lemur.command-line/quit should probably be re-named, as it won't actually quit anymore.

jimdowning commented 10 years ago

I've created a new lemur.tool, but trying to merge it with lemur.command-line has revealed some coupling between core and command-line, mostly caused by the context map carrying raw args as well as everything else.
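
One way to loosen that coupling, sketched under assumptions: the command-line side turns raw args into a plain map, and the core side builds its context from that map only, so raw args never enter the context. The namespace `lemur.sketch.args`, the functions `parse-args` and `make-context`, and the `run` command and `--dry-run` flag are all hypothetical and not taken from lemur's code.

```clojure
(ns lemur.sketch.args)

(defn- flag?
  "True when the argument looks like a long option, e.g. \"--dry-run\"."
  [^String arg]
  (.startsWith arg "--"))

(defn parse-args
  "CLI side: converts raw args into a plain map of parsed values."
  [raw-args]
  (let [flags      (filter flag? raw-args)
        positional (remove flag? raw-args)]
    {:command (first positional)
     :jobdef  (second positional)
     ;; "--dry-run" becomes :dry-run
     :options (set (map #(keyword (subs % 2)) flags))}))

(defn make-context
  "Library side: builds a context from parsed values; it never sees raw args."
  [{:keys [jobdef options]}]
  {:jobdef-path jobdef
   :dry-run?    (contains? options :dry-run)})
```

With this split, a library caller can build the same context by passing a map directly, without going through argument parsing at all.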

jimdowning commented 10 years ago

Is there an IRC channel for this project?

mlimotte commented 10 years ago

No IRC channel. But you can reach out to me over google chat (marc at climate dot com). We can discuss the core/command-line problem.

jeroenvandijk commented 10 years ago

I'm interested in this too. Currently I have a separate Lemur project that holds launch configurations for other projects. Instead, I would like to have the lemur dependency in my project.clj and use it via a Leiningen plugin. This requires Lemur to be usable as a library, but it doesn't require the CLI part. However, I'm assuming the CLI part shouldn't be hard once the internal API is clean enough for a Leiningen plugin.

@jimdowning If I can be of assistance here please let me know.

mlimotte commented 10 years ago

Hi Jeroen, Jim,

Let me know if I can answer any questions. I don't have a particular need for this, but I think it makes sense to do. And I apologize for some of the anti-patterns in the original code base; I now know better.

marc

jeroenvandijk commented 10 years ago

Hi Marc,

I've done some work here: https://github.com/TheClimateCorporation/lemur/pull/31. I need to test it with some projects before I'm certain I didn't break anything. But feel free to comment on the approach.

Jeroen

mlimotte commented 10 years ago

Thanks, Jeroen. I'll try and take a look over the weekend.

marc

mlimotte commented 10 years ago

Jeroen, this seems reasonable. I want to look at it a little more closely and try it out, but the approach looks ok for a first iteration. I hope to eventually replace context with an explicit fn arg. I made some minor comments on the pull request.

marc

jeroenvandijk commented 10 years ago

Hi Marc,

Thanks for having a look. Your suggestions make sense. I definitely see it as a first iteration to get going. This seemed like the easiest first step. I hope to get to the explicit context soon.

Jeroen

Btw, do you run the test suite somewhere publicly? It would be nice if the AWS tests would pass somewhere visible :-)

jeroenvandijk commented 10 years ago

I've updated https://github.com/TheClimateCorporation/lemur/pull/31. If people want to follow what's going on, I'll post everything there.