Swirrl / table2qb

A generic pipeline for converting tabular data into rdf data cubes
Eclipse Public License 1.0

DOS line endings in codelist CSV misses labels #68

Closed ajtucker closed 5 years ago

ajtucker commented 6 years ago

We came across an odd issue running the table2qb codelist-pipeline: the results differ when the input CSV file has DOS-style line endings.

These two files differ by line endings:

DOS: https://raw.githubusercontent.com/ONS-OpenData/ref_migration/41e267a3ee2fb5660429facfdf5c24385f82951e/codelists/residence.csv

Unix: https://raw.githubusercontent.com/ONS-OpenData/ref_migration/e107e466f09bd135bedb0e0f13ab0e3596b60f1e/codelists/residence.csv

Running them through table2qb, the former misses the rdfs:label and skos:prefLabel statements that are included in the latter.

Robsteranium commented 6 years ago

This is probably because the BOM gets interpreted as part of the column header (so the header no longer matches "Label"). We can change the reader to strip it.

Since clojure/data.csv doesn't support doing anything with the mark, we'd effectively be silently ignoring it, although I suspect this is fine for now.
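
To illustrate the suspected mechanism, here is a minimal sketch in Python (not table2qb's actual Clojure reader). The sample row, the column names, and the `read_rows` helper are assumptions made up for this example. It shows a leading BOM attaching itself to the first header, so a lookup by "Label" fails until the mark is stripped.

```python
import csv
import io

BOM = "\ufeff"  # U+FEFF, as it appears after decoding a UTF-8 file that starts with a BOM


def read_rows(text, strip_bom):
    """Parse CSV text into dicts keyed by header (hypothetical helper)."""
    if strip_bom and text.startswith(BOM):
        text = text[len(BOM):]  # silently drop the mark before parsing
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Hypothetical codelist content with a BOM and DOS (CRLF) line endings.
sample = BOM + "Label,Notation,Parent Notation\r\nEngland,E92,\r\n"

raw = read_rows(sample, strip_bom=False)
clean = read_rows(sample, strip_bom=True)

# Without stripping, the first header is "\ufeffLabel", so "Label" is missing.
print("Label" in raw[0])
# After stripping, the label can be looked up as expected.
print(clean[0]["Label"])
```

In Python specifically, decoding the file with the `utf-8-sig` codec has the same effect as the manual strip above. The CRLF endings on their own do not change the parsed headers in this sketch.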

Robsteranium commented 5 years ago

Should be resolved by 01f24612. Please re-open if this is still a problem.