Factual / drake

Data workflow tool, like a "Make for data"

Support wildcard inputs & outputs (globbing) #49

Open raronson opened 11 years ago

raronson commented 11 years ago

An example: if you had a directory structure like logs/year/month/part-files and wanted to process only Jan through Mar from every year, the pattern would be logs/*/0[1-3].
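To illustrate what that pattern selects, here is a minimal Python sketch using the standard-library `glob` module. It is not Drake code; the function name `matching_month_dirs` and the `root` argument are made up for the example, and it assumes the years sit directly under `root`.

```python
import glob
import os


def matching_month_dirs(root):
    """Return the root/<year>/<month> directories for months 01 through 03."""
    # '*' matches exactly one path component (the year directory);
    # '0[1-3]' matches the month directories 01, 02 and 03.
    pattern = os.path.join(root, "*", "0[1-3]")
    return sorted(glob.glob(pattern))
```

Calling `matching_month_dirs("logs")` would return the January to March directories of every year under `logs`, and nothing from the other months.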

aboytsov commented 11 years ago

Russel, would that be a duplicate of https://github.com/Factual/drake/issues/41 or https://github.com/Factual/drake/issues/50?

aboytsov commented 11 years ago

Ping.

raronson commented 11 years ago

Sorry for the delayed response.

I'm not sure I understand the difference between #41 and #50. Based on the proposal you wrote in #41, aren't wildcard inputs supported?

I would like to specify a glob pattern, which is similar to (as far as I can tell, a subset of) regex. I don't want the pattern to be expanded in drake, other than for a file recency check, as this could expand to thousands of files. It seems that #41 would do this expansion, so I think this is different.
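A recency check of that kind could look roughly like the Python sketch below. This is only an assumption about one way to do it, not how Drake works internally; `newest_mtime` and `output_is_stale` are hypothetical names. The pattern is used only to find the newest matching input, and the matched paths are never passed on anywhere else.

```python
import glob
import os


def newest_mtime(pattern):
    """Newest modification time among paths matching pattern, or None if nothing matches."""
    # iglob returns an iterator instead of one complete list of matches.
    return max((os.path.getmtime(p) for p in glob.iglob(pattern)), default=None)


def output_is_stale(input_pattern, output_path):
    """True if the output is missing or older than at least one matching input."""
    if not os.path.exists(output_path):
        return True
    newest_input = newest_mtime(input_pattern)
    return newest_input is not None and newest_input > os.path.getmtime(output_path)
```

The step's command would still receive the unexpanded pattern (for example `logs/*/0[1-3]`) and leave expansion to the shell or to the tool being run.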

aboytsov commented 11 years ago

I think it's a duplicate of #50 then. Since this issue has more info, I'll close #50.

amalloy commented 10 years ago

I kinda think all of these pattern/glob feature requests are duplicates of #41, which is immensely difficult. I'd love to close this one, but will leave it up to @aboytsov.

rbb commented 8 years ago

Since #41 is potentially difficult to do, perhaps being able to use a list variable would be helpful?

listvar=['foo', 'bar', 'other']
output <- $listvar
    tail -n +2 ${listvar}.ext1 > ${listvar}.ext2     ; Copy everything but the first line

The idea is that drake would be smart enough to know that listvar is a list and would run the tail -n +2 command on each item in the list.

Pursuant to the desire for globbing and/or regex for inputs/outputs, maybe list variables could be populated with a drake step?
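The per-item expansion described above can be sketched in Python as follows. This only models the proposed behaviour and is not existing Drake functionality; `expand_step` is a hypothetical helper, and the `${listvar}` placeholder is handled here by Python's `string.Template`, not by Drake's own variable substitution.

```python
from string import Template


def expand_step(command_template, var_name, items):
    """Return one concrete shell command per list item."""
    template = Template(command_template)
    # Substitute the list variable once per item, producing one command each.
    return [template.substitute({var_name: item}) for item in items]


commands = expand_step(
    "tail -n +2 ${listvar}.ext1 > ${listvar}.ext2",
    "listvar",
    ["foo", "bar", "other"],
)
```

Each entry in `commands` is the tail command with one list item filled in, so a runner could execute them one after another or treat each as its own step.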

listvar <-
    listvar = `ls *.tgz`

I'm not convinced I have proposed a good syntax in the above examples, but hopefully they are enough to give others ideas on ways to make globbing or regex easier to implement.