HXLStandard / libhxl-python

Python support library for the Humanitarian Exchange Language (HXL) data standard.
The Unlicense

Tagger failing #276

Closed by davidmegginson 3 years ago

davidmegginson commented 3 years ago

The tagger is failing on the following dataset:

https://data.humdata.org/dataset/f18e9a28-68e5-45f3-8e88-478747555a0c/resource/28beeb70-56b4-4fa5-adaf-54846bbea8f9/download/3wdata_countrywide_allsectors_allorgs_24sep2020.xlsx

Reproducible from the command line, with $URL set to the dataset URL above:

$ hxltag -m "organization#org+name" "$URL"

Reported by @mcarans in https://humanitarian.atlassian.net/browse/HDX-7423

davidmegginson commented 3 years ago

It also fails with a local copy of the file, so the problem is not a network timeout. The run never gets to the point of presenting the tagger with options.

davidmegginson commented 3 years ago

An empty row before the text headers was stopping the tagger's scan. Fixed.
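To illustrate the kind of failure described above, here is a minimal sketch of a header scan that skips blank rows instead of stopping at them. It is not the libhxl code: `find_header_row`, `normalize`, the `max_scan` limit, and the sample rows are all hypothetical, and the substring matching is an assumption made for the example.

```python
def normalize(value):
    """Lower-case a cell and collapse whitespace; None becomes an empty string."""
    text = "" if value is None else str(value)
    return " ".join(text.lower().split())


def find_header_row(rows, wanted_headers, max_scan=25):
    """Return the index of the first row with a cell containing a wanted header.

    Blank rows are skipped rather than treated as the end of the scan.
    Returns None if no match is found within the first max_scan rows.
    """
    wanted = [normalize(h) for h in wanted_headers]
    for index, row in enumerate(rows):
        if index >= max_scan:
            break
        cells = [normalize(cell) for cell in row]
        if not any(cells):
            # Entirely empty row: keep scanning instead of giving up here.
            continue
        if any(w in cell for cell in cells for w in wanted):
            return index
    return None


# Hypothetical sheet: a title row, two empty rows, then the text headers.
sample_rows = [
    ["3W data", "", ""],
    ["", "", ""],
    [None, None, None],
    ["Organization", "Organization Acronym", "Sector"],
    ["ACF", "ACF", "Health"],
]
header_index = find_header_row(sample_rows, ["organization"])
```

A scan that returned as soon as it met the empty row at index 1 would never reach the headers at index 3, which matches the symptom of the tagger never getting as far as presenting options.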

Confirmation link on beta Proxy: https://beta.proxy.hxlstandard.org/data.csv?tagger-match-all=on&tagger-01-header=organization&tagger-01-tag=%23org%2Bname&tagger-02-header=organization+acronym&tagger-02-tag=%23org%2Bcode&tagger-03-header=organization+type&tagger-03-tag=%23org%2Btype&tagger-04-header=implementing+partners&tagger-04-tag=%23org%2Bimpl%2Bname&tagger-05-header=sector&tagger-05-tag=%23sector%2Bname&tagger-07-header=state%2F+region&tagger-07-tag=%23adm1%2Bname&tagger-08-header=township&tagger-08-tag=%23adm2%2Bname&tagger-12-header=project+title&tagger-12-tag=%23activity%2Bproject%2Btitle&tagger-19-header=project+status&tagger-19-tag=%23status&url=https%3A%2F%2Fdata.humdata.org%2Fdataset%2Ff18e9a28-68e5-45f3-8e88-478747555a0c%2Fresource%2F28beeb70-56b4-4fa5-adaf-54846bbea8f9%2Fdownload%2F3wdata_countrywide_allsectors_allorgs_24sep2020.xlsx&header-row=3&dest=data_view
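The confirmation link above passes the tagging rules as query parameters. The sketch below shows how a link of that shape could be assembled with the standard library. The parameter names (`tagger-match-all`, `tagger-NN-header`, `tagger-NN-tag`, `url`, `header-row`) are copied from the link; the helper `build_tagger_url`, its arguments, and the example source URL are hypothetical, and the meaning of the two-digit numbering is not stated in the thread, so it is treated as an opaque index here.

```python
from urllib.parse import urlencode

# Endpoint taken from the confirmation link in the thread.
PROXY_ENDPOINT = "https://beta.proxy.hxlstandard.org/data.csv"


def build_tagger_url(source_url, specs, header_row=None, match_all=True):
    """Build a proxy link from tagging rules.

    specs maps an integer index to a (header text, HXL hashtag) pair,
    mirroring the tagger-NN-header / tagger-NN-tag pairs in the link.
    """
    params = []
    if match_all:
        params.append(("tagger-match-all", "on"))
    for index, (header, tag) in sorted(specs.items()):
        params.append((f"tagger-{index:02d}-header", header))
        params.append((f"tagger-{index:02d}-tag", tag))
    params.append(("url", source_url))
    if header_row is not None:
        params.append(("header-row", str(header_row)))
    # urlencode percent-encodes '#', '+', '/' and ':' and turns spaces into '+'.
    return PROXY_ENDPOINT + "?" + urlencode(params)


example_url = build_tagger_url(
    "https://example.org/data.xlsx",  # placeholder, not the real dataset
    {1: ("organization", "#org+name"), 7: ("state/ region", "#adm1+name")},
    header_row=3,
)
```

The encoded pairs come out in the same form as in the original link, for example `tagger-07-header=state%2F+region`. The `dest=data_view` parameter from the original link is left out of this sketch.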