Factual / drake

Data workflow tool, like a "Make for data"

S3 functionality doesn't work properly with directories #163

Open justmytwospence opened 9 years ago

justmytwospence commented 9 years ago

Say I'm pulling all the files in an S3 directory, s3://foo/bar/heatmap/, to a local directory, ./data/heatmap/. If this step has already been performed (i.e., ./data/heatmap/ already exists), then this drake step:

heatmap <- s3://foo/bar/heatmap/
    mkdir -p data/heatmap
    s3cmd get --recursive $INPUT data/heatmap

results in:

The following steps will be run, in order:
  1: data/heatmap <- s3://foo/bar/heatmap/ [timestamped]
Confirm? [y/n]

I would expect it to tell me that the step doesn't need to be run, which is the behavior when I specify a file as the input rather than a directory, e.g.:

heatmap <- s3://foo/bar/heatmap/heatmap_2015-02-09_2015-02-12.csv
    mkdir -p data/heatmap
    s3cmd get --recursive $INPUT data/heatmap

Is this a fixable bug? I know there's some weirdness around the idea of directories in S3.
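
To spell out the weirdness I mean: S3 has no real directory objects, only keys that share a prefix, so a timestamp for s3://foo/bar/heatmap/ presumably has to be derived from the objects listed under that prefix. Below is a minimal Python sketch of that idea. It is not drake's actual implementation; the function names, the sample listing, and the dates in it are all made up for illustration.

```python
from datetime import datetime, timezone


def prefix_timestamp(listing, prefix):
    """Return the newest last-modified time among keys under `prefix`.

    `listing` is an iterable of (key, last_modified) pairs. Returns None
    when no key starts with the prefix, i.e. the "directory" is empty or
    does not exist.
    """
    times = [modified for key, modified in listing if key.startswith(prefix)]
    return max(times) if times else None


def step_is_up_to_date(listing, prefix, output_mtime):
    """True when the local output is at least as new as every object under `prefix`."""
    newest = prefix_timestamp(listing, prefix)
    return newest is not None and output_mtime >= newest


# Hypothetical listing, shaped like what an S3 client might return.
listing = [
    ("bar/heatmap/heatmap_2015-02-09_2015-02-12.csv",
     datetime(2015, 2, 12, tzinfo=timezone.utc)),
    ("bar/other/unrelated.csv",
     datetime(2015, 3, 1, tzinfo=timezone.utc)),
]
```

With a rule like this, a directory input would be treated the same way as a file input: the step is skipped when the local output is newer than the newest object under the prefix.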