Hi, when chaining an output to an input, and the output has 0 records to write (in the case of multiplexing), the file won't be created and the chained input will fail the whole job.
In my case:
When there are no items to write to :err, the folder won't be created and the input chained right after it will fail with a missing-input-path error.
Definitely a bug. This might be somewhat tricky to fix while still preserving lazy file-creation. Any ideas or proposals @sroee before I start exploring potential solutions?
Well, I thought the output might be better off staying lazy, but is it possible that the input will never be consumed? For example, with two outputs:
(let [[err-node data-node] (-> node
                               (pg/output :err (seqf/dsink [BytesWritable BytesWritable])
                                          :data (text/dsink)))]
  [(-> (pg/input data-node)
       (pg/map ..)  ; will happen...
       ...)
   (-> (pg/input err-node)
       (pg/map ..)  ; will not be executed
       ...)])
Though for my needs, solution 1 sounds good enough.
BTW, my version of Parkour is 0.6.1.
It turns out this is pretty difficult to handle. We want to be able to detect and error on missing input paths, but (a) Hadoop's handling of 0-split inputs skips directory creation, and (b) Parkour's multi-job graph support doesn't currently allow any state between jobs. This is still solvable, but it's going to take a bit of design (re)thinking.
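In the meantime, one possible user-side workaround is to eagerly create the output directory before wiring it up as an input, so a 0-record multiplex sink doesn't leave the path missing. A minimal sketch using the plain Hadoop filesystem API (the helper ensure-output-path! and its conf/path arguments are hypothetical, not part of Parkour):

(import '[org.apache.hadoop.conf Configuration]
        '[org.apache.hadoop.fs FileSystem Path])

(defn ensure-output-path!
  "Create `path` as an empty directory if the upstream job wrote zero
  records and so never created it, letting the chained input job run."
  [^Configuration conf ^String path]
  (let [fs (FileSystem/get conf)
        p  (Path. path)]
    (when-not (.exists fs p)
      (.mkdirs fs p))))

An existing-but-empty directory should satisfy Hadoop's input-path existence check; the downstream job then just gets zero splits for that input.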
It turns out I had a separate bug, plus some Hadoop version differences obscuring this issue. Fixed in the develop branch; I'll push a new release with the fix in the next few days.
Parkour 0.6.2 is now released and includes the fix for this issue.
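For reference, picking up the fix should just be a dependency bump in project.clj; assuming Parkour's usual com.damballa group id on Clojars, something like:

[com.damballa/parkour "0.6.2"]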
thanks!