datacleaner / DataCleaner

The premier open source Data Quality solution
GNU Lesser General Public License v3.0

Custom column names on (xml) datastore. #1341

Open markansink opened 8 years ago

markansink commented 8 years ago

For one of our customers I'm using DataCleaner to parse a huge XML file. In conf.xml I specified all the relevant XPath expressions, which works quite nicely, and the performance is also very good.

However, I can't specify the name of the actual column. DataCleaner uses the XPath to resolve the column name, but when using @ in the path or referring to another 'table' with the index(path) function, the column names are hard to read.

I would like to have the ability to give a custom/functional name to the column, so it's easier to use.

kaspersorensen commented 8 years ago

One follow-up question: Would you like this "column naming" saved in the individual job, or at the datastore level? (Or both? But please prioritize in that case)

markansink commented 8 years ago

I would like to save/configure this at the datastore level, so every job uses the same names. The XPath is also stored/configured at the datastore level, so it makes sense to set the name at this level as well.
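
To make the request concrete, here is a minimal sketch of datastore-level naming. It is only an illustration in Python, not DataCleaner's or MetaModel's actual API: the function `column_names`, the mapping `custom_names`, and the example XPaths are all hypothetical. It also assumes the default column name is simply the XPath itself.

```python
from typing import Dict, List


def column_names(xpaths: List[str], custom_names: Dict[str, str]) -> List[str]:
    """Return one column name per XPath.

    A custom name configured on the datastore wins; otherwise the XPath
    itself is used as the name (assumed default behaviour).
    """
    return [custom_names.get(xpath, xpath) for xpath in xpaths]


# Hypothetical XPaths and custom names, for illustration only.
xpaths = ["/orders/order/@id", "/orders/order/customer/name"]
custom_names = {"/orders/order/@id": "order_id"}

names = column_names(xpaths, custom_names)
```

Because the mapping lives next to the XPaths, every job reading from the datastore would see the same readable names.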

LosD commented 8 years ago

Isn't this mostly a matter of getting the custom-naming support into MetaModel, like we're doing for some datastore types already? If we could somehow generalize that, we could start tapping into it from DC.

Regarding the datastore/job part: as we'll soon try to decouple jobs more from the datastores, there will probably be an alias in jobs (which will of course take precedence).
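
The precedence described here could be sketched roughly as follows. This is a hypothetical Python illustration, not existing DataCleaner or MetaModel code: `resolve_column_name`, `datastore_names`, and `job_aliases` are made-up names, and falling back to the XPath itself is an assumption.

```python
from typing import Dict, Optional


def resolve_column_name(
    xpath: str,
    datastore_names: Dict[str, str],
    job_aliases: Optional[Dict[str, str]] = None,
) -> str:
    """Pick the name to show for a column.

    Order of precedence (as assumed in this sketch):
    1. alias defined in the job,
    2. custom name defined on the datastore,
    3. the XPath itself.
    """
    if job_aliases and xpath in job_aliases:
        return job_aliases[xpath]
    return datastore_names.get(xpath, xpath)


# Hypothetical configuration, for illustration only.
datastore_names = {"/orders/order/@id": "order_id"}
job_aliases = {"/orders/order/@id": "OrderNumber"}

with_alias = resolve_column_name("/orders/order/@id", datastore_names, job_aliases)
without_alias = resolve_column_name("/orders/order/@id", datastore_names)
unnamed = resolve_column_name("/orders/order/total", datastore_names, job_aliases)
```

With this ordering, a job without aliases still gets the shared datastore names, and a column with no configuration at all keeps its XPath-based name.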