chop-dbhi / origins-generators

Repository of fact generators for Origins.

Change delimited generator --identifier option to be variadic #10

Open bruth opened 9 years ago

bruth commented 9 years ago

The current implementation splits the value of --identifier by comma, which breaks when a column name itself contains a comma. For example, a REDCap data dictionary has a column named "Choices, Calculations, OR Slider Labels", which would be incorrectly split into three separate columns.

To fix this, change [--identifier=<identifier>] to [--identifier=<identifier>...], which allows specifying multiple --identifier flags, e.g.

delimited --identifier=foo --identifier="verbose, name"

The docopt value will now be a list:

opts['--identifier']  # ['foo', 'verbose, name']
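
A minimal sketch of the same repeated-flag behavior. It uses the standard library's argparse with action="append" as a stand-in for docopt so that it runs without third-party packages; the parser setup is an assumption for illustration, not the generator's actual code.

```python
import argparse

# Stand-in for the docopt pattern [--identifier=<identifier>...]:
# action="append" collects each occurrence of the flag into a list.
parser = argparse.ArgumentParser(prog="delimited")
parser.add_argument("--identifier", action="append", default=[])

# Equivalent to: delimited --identifier=foo --identifier="verbose, name"
opts = parser.parse_args(["--identifier=foo", "--identifier=verbose, name"])

# The comma inside the second value is preserved; nothing is split.
print(opts.identifier)
```

Because each flag carries exactly one column name, no delimiter has to be reserved inside the value.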

swanijam commented 9 years ago

Multiple sheets may have identical column names, and the full list of identifiers may not be available in any single sheet. This means that there must be a way of specifying the identifier per sheet. It might be a little clunkier, but I think something like --identifier=Sheet1:Title --identifier=Sheet1:Artist --identifier=Sheet2:Release_Date would work. What do you think?

bruth commented 9 years ago

Yes, that is what I was thinking too.
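
A rough sketch of how the per-sheet form proposed above might be turned into a mapping of sheet name to identifier columns. The Sheet:Column syntax is only a proposal in this thread, and group_identifiers is a hypothetical helper name, not part of the existing generator.

```python
from collections import defaultdict


def group_identifiers(values):
    """Group hypothetical 'Sheet:Column' strings into {sheet: [columns]}."""
    by_sheet = defaultdict(list)
    for value in values:
        # Split on the first colon only, so a column name may contain colons.
        sheet, sep, column = value.partition(":")
        if not sep or not sheet or not column:
            raise ValueError(f"expected Sheet:Column, got {value!r}")
        by_sheet[sheet].append(column)
    return dict(by_sheet)


# The list a variadic --identifier option would produce for the example above.
identifiers = group_identifiers(
    ["Sheet1:Title", "Sheet1:Artist", "Sheet2:Release_Date"]
)
print(identifiers)
```

Splitting on the first colon keeps column names with colons intact, but a sheet name that itself contains a colon would be ambiguous under this scheme.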