HXLStandard / libhxl-python

Python support library for the Humanitarian Exchange Language (HXL) data standard.
The Unlicense

Attributes starting with numbers are not valid #355

Open danmihaila opened 1 year ago

danmihaila commented 1 year ago

We had a file with hashtags like "#affected+injured+dineo+2017", where an attribute starts with a digit and is therefore not recognized as a valid tag. Because of this, the file is marked as having no HXL tags, even though it has other valid tags and attributes. Q: do we need to change the pattern to support attributes that start with a digit? Note: the tag "#affected+injured+dineo+y2017" is valid.

davidmegginson commented 1 year ago

We can talk about changing the syntax. HXL was originally designed to use the same identifier syntax as Twitter hashtags or programming-language variables, which (at least at the time, in Twitter's case) don't allow a digit in first position; I've always used "+y2017" in a case like that.
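For illustration, the identifier rule described above can be written as a regular expression. This is a minimal sketch, not the pattern libhxl-python actually uses: the names `TOKEN`, `HASHTAG_PATTERN`, and `is_valid_hashtag` are hypothetical, and the rule (a letter first, then letters, digits, or underscores) is assumed from the description in this thread.

```python
import re

# Assumed identifier rule (hypothetical, based on this thread):
# a letter in first position, then letters, digits, or underscores.
TOKEN = r"[A-Za-z][A-Za-z0-9_]*"

# A hashtag is "#" plus a token, followed by zero or more "+token" attributes.
HASHTAG_PATTERN = re.compile(rf"^\s*#{TOKEN}(?:\s*\+{TOKEN})*\s*$")


def is_valid_hashtag(cell: str) -> bool:
    """Return True if the cell matches the assumed hashtag syntax."""
    return HASHTAG_PATTERN.match(cell) is not None


# "+2017" starts with a digit, so the whole hashtag fails to match.
print(is_valid_hashtag("#affected+injured+dineo+2017"))   # False
# "+y2017" starts with a letter, so it matches.
print(is_valid_hashtag("#affected+injured+dineo+y2017"))  # True
```

Under this assumed rule, allowing a digit in first position would only require widening the first character class of the attribute token, which is the syntax change being discussed.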

In the meantime, libhxl-python shouldn't be rejecting the whole thing as invalid HXL because of that one error; instead, it should just leave that column untagged. I'll look into it.
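A rough sketch of that tolerant behaviour, assuming the same identifier rule as above. The function `parse_hashtag_row` and the sample row are hypothetical and do not reflect libhxl-python's real API: a cell that fails to parse is left untagged (`None`), and the row still counts as a hashtag row as long as at least one cell is valid.

```python
import re

# Assumed identifier rule (hypothetical): letter first, then letters, digits, underscores.
TOKEN = r"[A-Za-z][A-Za-z0-9_]*"
HASHTAG_PATTERN = re.compile(rf"^\s*#{TOKEN}(?:\s*\+{TOKEN})*\s*$")


def parse_hashtag_row(cells):
    """Return one entry per column: the hashtag text, or None if untagged.

    Returns None for the whole row only when no cell is a valid hashtag.
    """
    tags = []
    for cell in cells:
        text = (cell or "").strip()
        # An invalid cell leaves its column untagged instead of failing the row.
        tags.append(text if HASHTAG_PATTERN.match(text) else None)
    return tags if any(tags) else None


# Made-up sample row: the middle column has an attribute starting with a digit.
row = ["#adm1+name", "#affected+injured+dineo+2017", "#affected+killed+dineo+y2017"]
print(parse_hashtag_row(row))
# ['#adm1+name', None, '#affected+killed+dineo+y2017']
```

A real implementation would probably also want a threshold on how many cells must be valid before a row is accepted as the hashtag row; this sketch accepts any row with a single valid tag.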

Test data: https://proxy.hxlstandard.org/hxl-test.json?url=https://data.humdata.org/dataset/67d69410-1f2f-43eb-a522-1fe9633828ea/resource/b487b498-c23f-4fdc-b92a-83c0fd2da73b/download/cyclone-2016-2022_hxl.xlsx

danmihaila commented 1 year ago

@davidmegginson the test data no longer reproduces the problem, as the data source was updated to use "+y2017".