HXLStandard / hxl-proxy

Web-based data proxy for transforming, filtering, and validating humanitarian datasets. Supports Python 3.x **only.**
http://proxy.hxlstandard.org
The Unlicense

Tagger function fails if headers are the same #331

Closed SimonbJohnson closed 1 year ago

SimonbJohnson commented 4 years ago

This link shows that when several columns share the same header, the tagger pulls through the incorrect tags.

Thanks to @Zibethin for finding this.

davidmegginson commented 4 years ago

Hi, @SimonbJohnson -- thanks for the report. I think the link didn't make it through.

SimonbJohnson commented 4 years ago

Oops, here it is: https://proxy.hxlstandard.org/data/edit?tagger-match-all=on&tagger-01-header=%28a-z%29&tagger-01-tag=%23country&tagger-02-header=%28a-z%29&tagger-02-tag=%23country%2Bcode%2Bv_iso3&tagger-03-header=%280-10%29&tagger-03-tag=%23indicator%2Bperson2person&tagger-04-header=%280-10%29&tagger-04-tag=%23indicator%2Bcovid19%2Bhazard%2Bexposure&tagger-05-header=%280-10%29&tagger-05-tag=%23indicator%2Bdevelopment%2Bdeprivation&tagger-06-header=%280-10%29&tagger-06-tag=%23indicator%2Binequality&tagger-07-header=%280-10%29&tagger-07-tag=%23indicator%2Baid%2Bdependency&tagger-08-header=%280-10%29&tagger-08-tag=%23indicator%2Bsocioeconomic%2Bvulnerability&tagger-09-header=%280-10%29&tagger-09-tag=%23indicator%2Buprooted%2Bpeople&tagger-10-header=%280-10%29&tagger-10-tag=%23indicator%2Bhealth%2Bconditions&tagger-11-header=%280-10%29&tagger-11-tag=%23indicator%2Bfood%2Bsecurity&tagger-12-header=%280-10%29&tagger-12-tag=%23indicator%2Bgbv&tagger-13-header=%280-10%29&tagger-13-tag=%23indicator%2Bvulnerable%2Bgroups&tagger-14-header=%280-10%29&tagger-14-tag=%23indicator%2Bvulnerability%2Bhazard%2Bindependent&tagger-15-header=%280-10%29&tagger-15-tag=%23indicator%2Bmovements&tagger-16-header=%280-10%29&tagger-16-tag=%23indicator%2Bbehaviour&tagger-17-header=%280-10%29&tagger-17-tag=%23indicator%2Bdemographic%2Bcomorbidity&tagger-18-header=%280-10%29&tagger-18-tag=%23indicator%2Bcovid19%2Bvulnerability&tagger-19-header=%280-10%29&tagger-19-tag=%23indicator%2Bvulnerability&tagger-20-header=%280-10%29&tagger-20-tag=%23indicator%2Bdrr&tagger-21-header=%280-10%29&tagger-21-tag=%23indicator%2Bgovernance&tagger-22-header=%280-10%29&tagger-22-tag=%23indicator%2Binstitutional&tagger-23-header=%280-10%29&tagger-23-tag=%23indicator%2Baccess%2Bhealthcare&tagger-24-header=%280-10%29&tagger-24-tag=%23indicator%2Binfrastructure&tagger-25-header=%280-10%29&tagger-25-tag=%23indicator%2Black%2Bcoping%2Bcapacity%2Bhazard%2Bindependent&tagger-26-header=%280-10%29&tagger-26-tag=%23indicator&tagger-27-header=%280-10%29&tagger-27-tag=%23indicator&tagger-28-header=%280-10%29&tagger-28-tag=%23indicator&tagger-29-header=%280-10%29&tagger-29-tag=%23indicator&tagger-30-header=%280-10%29&tagger-30-tag=%23indicator&tagger-31-header=%280-10%29&tagger-31-tag=%23indicator&tagger-32-header=%280-10%29&tagger-32-tag=%23indicator&tagger-33-header=%28very+low-very+high%29&tagger-33-tag=%23indicator&tagger-34-header=%281-191%29&tagger-34-tag=%23indicator&tagger-35-header=%280-10%29&tagger-35-tag=%23indicator&tagger-36-header=%280-50%29&tagger-36-tag=%23indicator&tagger-37-header=%280-100%25%29&tagger-37-tag=%23indicator&url=https%3A%2F%2Fdata.humdata.org%2Fdataset%2Finform-covid-19-risk-index-version-0-1-2&sheet=2&header-row=3&dest=data_view

davidmegginson commented 4 years ago

The challenge is that the tagger operates (for now) purely by string matching, though that's not obvious from the HXL Proxy UI (which prepopulates the form positionally based on the input data). There is no way currently to autotag two columns with different hashtags when they have the same header.
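To make the limitation concrete, here is a minimal pure-Python sketch of header-based tagging. It is not the libhxl implementation: the function names `normalise` and `tag_by_header` are hypothetical, and the "first matching spec wins" rule is an assumption made for illustration. It only shows why a lookup keyed on header text cannot give two identically labelled columns different hashtags.

```python
def normalise(header):
    """Lower-case and collapse whitespace so the comparison is forgiving."""
    return " ".join(str(header).lower().split())


def tag_by_header(headers, specs):
    """Return one hashtag per column, chosen purely by header text.

    Assumption for this sketch: the first spec whose header matches wins.
    Column position is never consulted.
    """
    tags = []
    for header in headers:
        tag = ""
        for spec_header, spec_tag in specs:
            if normalise(spec_header) == normalise(header):
                tag = spec_tag
                break
        tags.append(tag)
    return tags


# Header row as selected in the report: the comment row, with repeated values.
headers = ["(a-z)", "(a-z)", "(0-10)", "(0-10)"]

# First four tagger specs from the reported URL.
specs = [
    ("(a-z)", "#country"),
    ("(a-z)", "#country+code+v_iso3"),
    ("(0-10)", "#indicator+person2person"),
    ("(0-10)", "#indicator+covid19+hazard+exposure"),
]

tags = tag_by_header(headers, specs)
print(tags)
```

Under these assumptions both "(a-z)" columns come out as `#country` and both "(0-10)" columns as `#indicator+person2person`; the second spec for each header can never be reached.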

With the new UI cleanup from #329, it should be at least somewhat less confusing, because the tagger won't give the option to tag the same header twice.

Any substantive fix will require refactoring in libhxl via HXLStandard/libhxl-python#241.

See also #315

davidmegginson commented 4 years ago

I won't be able to include positional support for this release, but there's a simple workaround: use the actual header row for the tagger mapping, then remove the comment row beneath it ("(0-10)", etc.) using a row filter.

https://beta.proxy.hxlstandard.org/data/edit?dest=data_edit&filter01=select&filter-label01=Remove+comment+row+under+headers&select-query01-01=country%3D%28a-z%29&select-reverse01=on&tagger-match-all=on&tagger-01-header=country&tagger-01-tag=%23country&tagger-02-header=iso3&tagger-02-tag=%23country%2Bcode%2Bv_iso3&tagger-03-header=p2p&tagger-03-tag=%23indicator%2Bperson2person&tagger-04-header=covid-19+hazard+%26+exposure&tagger-04-tag=%23indicator%2Bcovid19%2Bhazard%2Bexposure&tagger-05-header=development+%26+deprivation&tagger-05-tag=%23indicator%2Bdevelopment%2Bdeprivation&tagger-06-header=inequality&tagger-06-tag=%23indicator%2Binequality&tagger-07-header=aid+dependency&tagger-07-tag=%23indicator%2Baid%2Bdependency&tagger-08-header=socio-economic+vulnerability&tagger-08-tag=%23indicator%2Bsocioeconomic%2Bvulnerability&tagger-09-header=uprooted+people&tagger-09-tag=%23indicator%2Buprooted%2Bpeople&tagger-10-header=health+conditions&tagger-10-tag=%23indicator%2Bhealth%2Bconditions&tagger-13-header=food+security&tagger-13-tag=%23indicator%2Bfood%2Bsecurity&tagger-14-header=gbv&tagger-14-tag=%23indicator%2Bgbv&tagger-15-header=vulnerable+groups&tagger-15-tag=%23indicator%2Bvulnerable%2Bgroups&tagger-16-header=vulnerability+%28hazard-independent%29&tagger-16-tag=%23indicator%2Bvulnerability%2Bhazard%2Bindependent&tagger-17-header=movements&tagger-17-tag=%23indicator%2Bmovements&tagger-18-header=behaviour&tagger-18-tag=%23indicator%2Bbehaviour&tagger-19-header=demographic+and+co-morbidity&tagger-19-tag=%23indicator%2Bdemographic%2Bcomorbidity&tagger-20-header=covid-19+vulnerability&tagger-20-tag=%23indicator%2Bcovid19%2Bvulnerability&tagger-21-header=vulnerability&tagger-21-tag=%23indicator%2Bvulnerability&tagger-22-header=drr&tagger-22-tag=%23indicator%2Bdrr&tagger-23-header=governance&tagger-23-tag=%23indicator%2Bgovernance&tagger-24-header=institutional&tagger-24-tag=%23indicator%2Binstitutional&tagger-27-header=access+to+health+care&tagger-27-tag=%23indicator%2Baccess%2Bhealthcare&tagger-28-header=infrastructure&tagger-28-tag=%23indicator%2Binfrastructure&tagger-31-header=lack+of+coping+capacity&tagger-31-tag=%23indicator%2Black%2Bcoping%2Bcapacity%2Bhazard%2Bindependent&header-row=2&url=https%3A%2F%2Fdata.humdata.org%2Fdataset%2Finform-covid-19-risk-index-version-0-1-2&sheet=2
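
For readers who want to see the workaround outside the Proxy UI, the two steps can be sketched in plain Python. This is an illustration only, not the HXL Proxy or libhxl code: `apply_workaround` is a hypothetical helper, the leading title row is an assumption about the sheet layout, and the data values are made-up placeholders. The headers and hashtags are taken from the URL above.

```python
def apply_workaround(rows, header_index, specs, comment_marker):
    """Tag columns from the real header row, then drop the comment row.

    rows           -- list of rows, each a list of cell strings
    header_index   -- 0-based index of the real header row
    specs          -- (header, hashtag) pairs, matched case-insensitively
    comment_marker -- first-column value that identifies the comment row
    """
    headers = rows[header_index]
    lookup = {h.strip().lower(): tag for h, tag in specs}
    # Unique headers mean each column gets its own hashtag.
    tags = [lookup.get(h.strip().lower(), "") for h in headers]
    # Equivalent in spirit to the reversed select filter: keep every row
    # whose first column is NOT the comment marker.
    data = [r for r in rows[header_index + 1:] if r[0] != comment_marker]
    return [headers, tags] + data


rows = [
    ["(title row)", "", ""],              # hypothetical row above the headers
    ["country", "iso3", "inequality"],    # real header row (header-row=2)
    ["(a-z)", "(a-z)", "(0-10)"],         # comment row to be removed
    ["Country A", "AAA", "1.0"],          # placeholder data, not real values
]

specs = [
    ("country", "#country"),
    ("iso3", "#country+code+v_iso3"),
    ("inequality", "#indicator+inequality"),
]

result = apply_workaround(rows, header_index=1, specs=specs, comment_marker="(a-z)")
for row in result:
    print(row)
```

Because the real headers are distinct, string matching is enough here; positional support is only needed when the chosen header row contains repeated values.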

davidmegginson commented 1 year ago

Moved to Jira