damballa / parkour

Hadoop MapReduce in idiomatic Clojure.
Apache License 2.0

avro/dsink determine output path from input? #16

Closed stanfea closed 9 years ago

stanfea commented 9 years ago

Hi,

Is there any way to have something like ::mr/sink-as dux/prefix-keys that, instead of adding a prefix, creates a subdirectory? Or a way to have an Avro dseq as input that filters by filename prefix?

Thanks!

Stefan

llasram commented 9 years ago

The prefixes used with dux/prefix-keys can contain "/" (any number of them, even), which creates files in subdirectories of the output directory. Note that this currently breaks the return-value dseq that the parkour.graph API creates for the job results; a PR fixing the issue would be welcome. I personally don't use this feature much, and the question of exactly what to return is a bit tricky -- one dseq over all the sunk content vs. inferring a structured division into multiple outputs.

"Filters by prefix in filename" -- you can always use globs over the prefix as input: (mra/dseq [:default] "previous-output-dir/some-prefix-*").

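To illustrate which files such a glob would select, here is a minimal sketch in plain Clojure using java.nio; it is not parkour code. The file names are hypothetical, and Hadoop's glob syntax differs from java.nio's in some details, so treat it only as an approximation of what a pattern like some-prefix-* picks up.

```clojure
(import '(java.nio.file FileSystems Paths))

;; Hypothetical file names, roughly as a prefixed sink might leave them
;; in a single output directory (the real names may differ).
(def file-names
  ["some-prefix-r-00000.avro"
   "some-prefix-r-00001.avro"
   "other-prefix-r-00000.avro"
   "_SUCCESS"])

(defn glob-filter
  "Return, as a vector, the names in `names` that match the glob `pattern`."
  [pattern names]
  (let [matcher (.getPathMatcher (FileSystems/getDefault)
                                 (str "glob:" pattern))]
    (filterv #(.matches matcher (Paths/get % (into-array String [])))
             names)))

(glob-filter "some-prefix-*" file-names)
;; => ["some-prefix-r-00000.avro" "some-prefix-r-00001.avro"]
```

Only the files sharing the prefix match, so output written under other prefixes in the same directory is ignored.
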
stanfea commented 9 years ago

Wow, thanks! This is really genius work you've done here!