damballa / parkour

Hadoop MapReduce in idiomatic Clojure.
Apache License 2.0
257 stars 19 forks source link

Ensure dseq supports being passed a list of inputs #3

Closed josephwilk closed 10 years ago

josephwilk commented 10 years ago

I was struggling to get dseq to work with a list of input sources from a glob:

(seqf/seq (fs/path-glob "/*.blah"))

But this does not work since you have to call seq with apply:

(apply seqf/dseq (fs/path-glob "/*.blah")))

Since the & paths is a list of lists in the first example: https://github.com/damballa/parkour/blob/master/src/clojure/parkour/io/seqf.clj#L12

This felt a bit odd so I thought I would suggest a patch to allow both forms to work. I've tried to maintain the existing api and avoid any breaking changes.

Warning: written without :coffee:

llasram commented 10 years ago

In general I'm not a fan of APIs which accept either an individual value or a sequence of values for each argument. I also somewhat feel that Hadoop already supports this case by allowing you to pass in globs instead of files for input paths. Is there a particular reason you'd prefer to expand the glob in-line?

josephwilk commented 10 years ago

In this case there was some filtering of the glob results.

If this does not fit perhaps an example of multiple via apply is worth adding to the docs to avoid this confusion for anyone else.

llasram commented 10 years ago

"Vision" was a bit too grandiose :-). It's just that in my experience, Clojure APIs which automatically flatten or add levels of collection tend to make code harder to reason about, because it's less clear what the arguments to such functions necessarily are (or are supposed to be).

An example of providing a collection of paths instead would be awesome.

josephwilk commented 10 years ago

@llasram I've added a little example in the docs dealing with multiple input files, hopefully that will help future users avoid my confusion.