OpenDataServices / flatten-tool

Tools for generating CSV and other flat versions of the structured data
http://flatten-tool.readthedocs.io/en/latest/
MIT License

Non breaking space (NBSP) in spreadsheet headers #392

Open odscjames opened 2 years ago

odscjames commented 2 years ago

We have seen cases where Excel spreadsheets have header columns like:

Recipient Org:Location:Name

where the space in "Recipient Org" is actually a non-breaking space (NBSP, U+00A0).

This causes issues when flattentool is given an Excel spreadsheet and a schema. In the schema, the name "Recipient Org" has a normal space, so flattentool doesn't recognise that header as one of the core fields in the standard. Instead, it treats it as an additional field.
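To illustrate why the lookup fails, here is a minimal sketch in plain Python. It does not use flattentool's own code; `schema_titles` and `header_from_sheet` are made-up names and values for illustration only.

```python
# Hypothetical schema titles, written with ordinary spaces (U+0020).
schema_titles = {"Recipient Org:Location:Name"}

# Hypothetical header as read from the spreadsheet, with an NBSP (U+00A0)
# where the ordinary space should be. The two look identical on screen.
header_from_sheet = "Recipient\u00a0Org:Location:Name"

# A plain string comparison treats the two characters as different,
# so the header is not found among the schema titles.
print(header_from_sheet in schema_titles)  # False
```

Any exact-match lookup of column titles would behave the same way, which is consistent with the header being treated as an additional field.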

Found when doing https://github.com/ThreeSixtyGiving/dataquality/issues/14

odscjames commented 2 years ago

Can we just search and replace "NBSP" to " " [ a real space ] before looking up column titles in the schema?

Maybe in both:

Is there any case where someone might deliberately be relying on the difference between NBSPs and real spaces in schema field titles?

(If there is, do we really want to be encouraging that?!)
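A rough sketch of what that search and replace could look like. The function names here (`normalise_title`, `build_title_lookup`, `find_schema_title`) are hypothetical and are not part of flattentool. Applying the same normalisation to the schema titles and to the spreadsheet headers is an assumption of this sketch, made so that the comparison is symmetric.

```python
NBSP = "\u00a0"  # non-breaking space


def normalise_title(title):
    """Replace every NBSP with an ordinary space (hypothetical helper)."""
    return title.replace(NBSP, " ")


def build_title_lookup(schema_titles):
    """Map each normalised title back to the title as written in the schema."""
    return {normalise_title(title): title for title in schema_titles}


def find_schema_title(header, title_lookup):
    """Return the matching schema title, or None if the header is unknown."""
    return title_lookup.get(normalise_title(header))


# Illustrative values only.
title_lookup = build_title_lookup(["Recipient Org:Location:Name"])
print(find_schema_title("Recipient\u00a0Org:Location:Name", title_lookup))
print(find_schema_title("Something Else", title_lookup))
```

With this approach a schema that deliberately used an NBSP in a title would be folded into the ordinary-space version, which is the trade-off raised in the question above.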

robredpath commented 2 years ago

Do we have any theories as to how an NBSP got in there in the first place?

I've checked the 360 standard and the spaces in the table there are real spaces, so it's not a copy-paste error from there. I tried emailing it, and our email, at least, doesn't turn it into an NBSP even though it's an HTML email.

Was there a Recipient Org:Location:Identifier field? I'd expect that to come first, so the fact that it errored on Name is interesting.
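One way to check which headers are affected would be to scan the header row for NBSPs. This is only a sketch: `headers_with_nbsp` is a hypothetical helper, and the header values below are invented for illustration, not taken from the spreadsheet in question.

```python
def headers_with_nbsp(headers):
    """Return the headers that contain at least one NBSP (U+00A0)."""
    return [header for header in headers if "\u00a0" in header]


# Invented header row: only the second header contains an NBSP.
example_headers = [
    "Recipient Org:Location:Identifier",
    "Recipient\u00a0Org:Location:Name",
]

for header in headers_with_nbsp(example_headers):
    # repr() makes the otherwise invisible character show up as an escape.
    print(repr(header))
```

Running something like this over the real header row would show whether the Identifier column has the same problem or whether only Name does.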