NRGI / resource-projects-etl

ETL processes for rp.org
GNU General Public License v2.0

One companypayment ID has many companies/dates associated with it #25

Closed: Bjwebb closed this issue 8 years ago

Bjwebb commented 9 years ago

http://lodspeakr.nrgi-dev.default.opendataservices.uk0.bigv.io/companypayment/tl/f1344401b51b31d1.html

This seems to only happen when there is no value.

My assumption is that each company/date should have its own companypayment ID. Is that correct? Should the triples for the cases without values exist at all?

timgdavies commented 9 years ago

Ok. I can see why this is happening. Just looking into the best fix to TagLifter.

timgdavies commented 9 years ago

This is caused by a combination of the way TagLifter caches identifiers and the way the file was formatted.

Removing the column labelled '#companyPayment' (which duplicates the '#companyPayment+value' field) and renaming '#governmentReceipt' to '#governmentReceipt+value' should deal with this.
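For anyone who wants to script that header change rather than edit the file by hand, here is a minimal sketch in Python 3. It assumes the tags sit in the first row of a CSV file; the function name `fix_tag_row`, the `#company` column, and the sample values are hypothetical and not taken from the actual source file or from TagLifter.

```python
import csv
import io


def fix_tag_row(rows,
                drop="#companyPayment",
                rename=None):
    """Yield rows with the `drop` column removed and header tags renamed.

    Assumes the first row holds the tags (an assumption about the file layout).
    """
    if rename is None:
        rename = {"#governmentReceipt": "#governmentReceipt+value"}
    rows = iter(rows)
    header = next(rows)
    # Keep every column whose tag is not exactly the duplicate tag.
    keep = [i for i, tag in enumerate(header) if tag.strip() != drop]
    # Rename tags in the header row only; other tags pass through unchanged.
    yield [rename.get(header[i].strip(), header[i]) for i in keep]
    for row in rows:
        # Skip indexes beyond the end of short rows.
        yield [row[i] for i in keep if i < len(row)]


# Hypothetical sample data standing in for the real spreadsheet.
sample = (
    "#company,#companyPayment,#companyPayment+value,#governmentReceipt\n"
    "Acme,,100,5\n"
)
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(
    fix_tag_row(csv.reader(io.StringIO(sample)))
)
fixed = out.getvalue()
```

When working with real files, opening them with `open(path, encoding="utf-8", newline="")` lets the `csv` module handle non-ASCII text and line endings consistently, assuming the source file is UTF-8.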

I'm having some Unicode issues doing that locally, but I will review this further later.