OpenAddressesUK / companies_house

A Turbot-ready ETL for the Companies House dataset
MIT License

Blew up horribly on bad CSV #1

Open pikesley opened 9 years ago

pikesley commented 9 years ago

To be fair, the offending line is horrendous:

"TOTS2TEENS LTD","08966156","","","8-10 CHANCELLOR ROAD","","SOUTHEND ON SEA","ESSEX","ENGLAND","SS!"AS","Private Limited Company","Active","United Kingdom","","28/03/2014","31","3","28/12/2015","","NO ACCOUNTS FILED","25/04/2015","","0","0","0","0","None Supplied","","","","0","0","http://business.data.gov.uk/id/company/08966156","","","","","","","","","","","","","","","","","","","",""

But still, we must be defensive.

JeniT commented 9 years ago

Is Companies House using ! as an escape character for double quotes within fields?
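JeniT's hypothesis can be checked mechanically. Here is a minimal sketch using Python's standard `csv` module (the thread doesn't say which language or parser the ETL uses). Standard CSV (RFC 4180) writes a literal quote inside a quoted field by doubling it. The module's `escapechar` option instead makes one character escape the next. The helper name `parse_with_escape` is hypothetical.

```python
import csv
from typing import List, Optional


def parse_with_escape(line: str, escapechar: Optional[str]) -> List[str]:
    """Parse one CSV line, optionally treating `escapechar` as an escape character."""
    reader = csv.reader([line], escapechar=escapechar)
    return next(reader)


# RFC 4180 style: a literal quote inside a quoted field is written as "".
print(parse_with_escape('"say ""hi""",x', None))  # ['say "hi"', 'x']

# The hypothesis: ! escapes the following character.
print(parse_with_escape('"SS!"AS",x', "!"))  # ['SS"AS', 'x']
```

Even under the escape-character reading, the postcode field would decode to `SS"AS`, which is still not a valid postcode.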

pikesley commented 9 years ago

No, somebody has held down shift while entering their data...

Floppy commented 9 years ago

That's in a postcode as well. I suspect someone had the shift key held down, and it's SS1 2AS.
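Floppy's diagnosis can be expressed as a small repair heuristic. The sketch below makes three assumptions. It assumes a UK QWERTY layout, where Shift+1 gives `!` and Shift+2 gives `"`. It uses a simplified postcode pattern rather than the full official format. The helpers `unshift_uk` and `guess_postcode` are hypothetical and are not part of the companies_house ETL. The inward code of a UK postcode is always three characters, so the sketch re-inserts the space before the last three characters.

```python
import re
from typing import Optional

# Assumed UK QWERTY layout: the symbol produced by Shift + each digit key.
UK_SHIFTED_DIGITS = {"!": "1", '"': "2", "£": "3", "$": "4", "%": "5",
                     "^": "6", "&": "7", "*": "8", "(": "9", ")": "0"}

# Simplified postcode shape (outward code, space, inward code).
# This is a loose check, not the complete official specification.
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][A-Z]{2}$")


def unshift_uk(text: str) -> str:
    """Map shifted symbols back to the digits on the same keys."""
    return "".join(UK_SHIFTED_DIGITS.get(ch, ch) for ch in text)


def guess_postcode(raw: str) -> Optional[str]:
    """Return a plausible repaired postcode, or None if the repair doesn't fit."""
    compact = unshift_uk(raw).replace(" ", "").upper()
    # The inward code is always three characters, so split before the last three.
    candidate = f"{compact[:-3]} {compact[-3:]}"
    return candidate if POSTCODE_RE.match(candidate) else None


print(guess_postcode('SS!"AS'))  # SS1 2AS
```

A repair like this is probably better suited to flagging suggestions for human review than to rewriting records silently.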

Floppy commented 9 years ago

amazing.

Floppy commented 9 years ago

So CH aren't escaping the data at all; we just have to swallow that error and move on.
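Here is one way to swallow the error and move on, sketched in Python. The thread doesn't state the ETL's actual language or parser. The reader runs with `strict=True`, so a stray quote raises `csv.Error` instead of being guessed at. The sketch then logs the offending line and continues with the next record. It also skips rows with an unexpected field count. The function `read_rows_defensively` and the logger name are hypothetical.

```python
import csv
import io
import logging
from typing import Iterable, Iterator, List

log = logging.getLogger("companies_house_etl")  # hypothetical logger name


def read_rows_defensively(lines: Iterable[str], expected_fields: int) -> Iterator[List[str]]:
    """Yield well-formed rows; log and skip rows that fail to parse."""
    # With the default strict=False, the reader would not raise on "SS!"AS";
    # it would quietly return a mangled field instead. strict=True makes it raise.
    reader = csv.reader(lines, strict=True)
    while True:
        try:
            row = next(reader)
        except StopIteration:
            return
        except csv.Error as exc:
            # CPython's reader resets its parse state on each next() call,
            # so iteration can continue with the following line.
            log.warning("skipping malformed CSV at line %d: %s", reader.line_num, exc)
            continue
        if len(row) != expected_fields:
            log.warning("skipping line %d: expected %d fields, got %d",
                        reader.line_num, expected_fields, len(row))
            continue
        yield row


# For real files, open with newline="" as the csv module documentation recommends.
sample = io.StringIO('"A","1"\n"TOTS2TEENS LTD","SS!"AS"\n"B","2"\n')
print(list(read_rows_defensively(sample, expected_fields=2)))  # [['A', '1'], ['B', '2']]
```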

peterkwells commented 9 years ago

While we move on, I'll point a contact at Companies House at this, in case there's anything they can do to improve it over time, or anything OA can do to help them improve it. That data probably came in through an online form.