Swirrl / table2qb

A generic pipeline for converting tabular data into rdf data cubes
Eclipse Public License 1.0

Introduce whitespace trimming under a feature flag #122

Open Robsteranium opened 4 years ago

Robsteranium commented 4 years ago

We have discussed the possibility of stripping leading/trailing whitespace from pipeline inputs (raised in #111, #113 and discussed in #115). This would make table2qb more forgiving of data submitted by users who might have prepared a CSV in Excel and so will not have noticed the whitespace (or even expected that it could cause values not to be matched).

We resolved to improve validation rather than relax the input requirements (implemented in #102). Along the way we decided that it might be possible to get the best of both worlds by introducing this functionality (whitespace trimming) behind a feature flag. The pipeline could then be run in relaxed or strict mode as required. Users with machine-written inputs might prefer the safety of strict mode, while users preparing inputs by hand might prefer the convenience of relaxed mode. If problems occur under relaxed mode, switching to strict mode might help with debugging.
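To make the idea concrete, here is a minimal sketch of what a trimming flag could look like. It is written in Python purely for illustration and is not table2qb code: the `read_table` function, the `trim_whitespace` parameter and the sample data are all hypothetical.

```python
import csv
import io


def read_table(text, trim_whitespace=False):
    """Parse CSV text into a list of dicts keyed by header.

    trim_whitespace=False stands in for "strict" mode: values pass through untouched.
    trim_whitespace=True stands in for "relaxed" mode: headers and cells are stripped.
    (Hypothetical names, not part of table2qb.)
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    if trim_whitespace:
        # Strip leading/trailing whitespace from every header and cell.
        rows = [[cell.strip() for cell in row] for row in rows]
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Made-up input with the kind of stray whitespace a spreadsheet export can leave behind.
sample = "Geography ,Value\nS12000033 , 42\n"

strict = read_table(sample)
relaxed = read_table(sample, trim_whitespace=True)
```

Under these assumptions `strict` keeps the key `"Geography "` with its trailing space, while `relaxed` yields `"Geography"`, so the same input either surfaces the problem or quietly absorbs it depending on the flag.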

Note that we would want to do this on headers as well as cell values, for two reasons: 1) We match headers from the cube with values from the columns configuration, so stray whitespace in a header would stop it from matching. 2) We'd also likely want to trim labels to avoid having e.g. multiple labels per code URI. Because the slugise transformation strips trailing non-alphanumerics, labels that differ only in trailing whitespace end up on the same URI, and they will be impossible to distinguish in the UI, where trailing whitespace isn't always apparent.
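Both reasons can be illustrated with a small Python sketch. The `slugise` function below is only a rough stand-in for the real transformation (an assumption about its behaviour based on the description above), and `labels_by_slug`, `columns_config` and the example values are hypothetical.

```python
import re
from collections import defaultdict


def slugise(label):
    # Rough stand-in for the slugise transformation (assumed behaviour):
    # lowercase, collapse runs of non-alphanumerics to "-", drop leading/trailing "-".
    slug = re.sub(r"[^a-z0-9]+", "-", label.lower())
    return slug.strip("-")


def labels_by_slug(labels, trim_whitespace=False):
    """Group labels by the slug they would be given (hypothetical helper)."""
    grouped = defaultdict(set)
    for label in labels:
        if trim_whitespace:
            label = label.strip()
        grouped[slugise(label)].add(label)
    return dict(grouped)


# Reason 2: two labels that differ only by a trailing space share one slug.
labels = ["Aberdeen City", "Aberdeen City "]
untrimmed = labels_by_slug(labels)
trimmed = labels_by_slug(labels, trim_whitespace=True)

# Reason 1: a header with trailing whitespace misses the columns configuration.
columns_config = {"Geography": "refArea"}  # hypothetical configuration entry
header = "Geography "
match_untrimmed = columns_config.get(header)
match_trimmed = columns_config.get(header.strip())
```

Without trimming, the single slug `aberdeen-city` collects two distinct labels and the header lookup returns nothing; with trimming there is one label per slug and the header matches.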