google-code-export / google-refine

Automatically exported from code.google.com/p/google-refine

Slow URL-fetch for large amounts of data #373

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I have a 45000-row dataset with postcodes in one column, and I'm trying to 
fetch constituency data etc using uk-postcodes.com. I've been able to do this 
with a 180-row dataset and it took around 40 mins (so I know the method works) 
but I can't get past 0% complete with the current data, and eventually Refine 
seems to give up and forget what I've asked it to do.

I'm using Refine 2.0 and am not a coder by trade - just an accountant who 
thought he'd found a great way of adding value to his data.

I have tried with both IE8 and Chrome but no performance differences seen. I've 
also tried with the source data on the C: drive and on the office network - 
again, no difference.

I thought this was the type of task Refine was supposed to be perfect for. Any 
suggestions as to how to speed up, actions to take or alternative data tools to 
try?

Thanks

Phil

Original issue reported on code.google.com by phi...@yahoo.com on 3 May 2011 at 10:51

GoogleCodeExporter commented 9 years ago
It looks like the site itself is very slow to respond, with a query such as 
http://www.uk-postcodes.com/postcode/CA143YJ.json taking about 4-5 secs to 
return data. Given that, it would take about 50 hours to retrieve 45000 rows of 
constituency data by my calculations: 4*45000/60/60 = 50. Perhaps the data can 
be scraped more easily from government sources here?
http://www.direct.gov.uk/en/Dl1/Directories/Localcouncils/AToZOfLocalCouncils/DG_A-Z_LG
http://www.direct.gov.uk/en/Dl1/Directories/index.htm
Alternatively, perhaps you could ask on the ScraperWiki email list whether 
someone could finish and extend their scraper, which seems to be a start on 
some of the data you're looking for: 
http://scraperwiki.com/scrapers/council_postcodes/
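
For anyone wanting to redo the estimate with their own numbers, here is a rough sketch in Python. It assumes requests are issued strictly one at a time, so total time is simply rows times seconds per request. The function name `estimated_hours` is made up for this illustration and is not part of Refine. The last figure uses the 180 rows in about 40 minutes reported at the top of this issue.

```python
def estimated_hours(rows: int, seconds_per_request: float) -> float:
    """Total hours for sequential fetches (assumes no parallelism, no retries)."""
    return rows * seconds_per_request / 3600


ROWS = 45_000  # size of the dataset described in this issue

# Per-request latency of roughly 4-5 seconds, as measured above.
low = estimated_hours(ROWS, 4)
high = estimated_hours(ROWS, 5)

# Rate implied by the earlier run: 180 rows in about 40 minutes.
observed_seconds_per_row = 40 * 60 / 180
observed = estimated_hours(ROWS, observed_seconds_per_row)

print(f"at 4 s/request: {low:.1f} hours")
print(f"at 5 s/request: {high:.1f} hours")
print(f"at the observed rate: {observed:.1f} hours")
```

Either way the job is measured in days rather than minutes, which would explain the progress indicator sitting at 0% for an hour.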

Original comment by thadguidry on 3 May 2011 at 1:36

GoogleCodeExporter commented 9 years ago
Thanks for your help. It looks like the ScraperWiki scraper is a direct scrape 
of the government link you posted, but unfortunately it only seems to have the 
postcodes of the councils' physical offices, and not all the postcodes covered 
by each council.

The only other source I could find which could give a local authority name for 
any given postcode is http://mapit.mysociety.org/postcode/CA143YJ.html and this 
gives the same type of data, but again I had a 1-hour wait before it gave up on 
0% yesterday.

It's good to know that the hold-up isn't with Refine, it's with the third-party 
websites. My next attempt at a solution will be downloading the open data from 
Ordnance Survey and looking the info up directly in Excel 
(http://www.ordnancesurvey.co.uk/oswebsite/products/os-opendata.html) - thanks 
again and wish me luck.
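
The same local-lookup idea can be sketched in Python instead of Excel. This is only an illustration under assumptions: the column names `postcode` and `local_authority` and the sample rows are hypothetical, not the actual layout of the Ordnance Survey download, so they would need adjusting to the real file. The point is that once the table is loaded into a dictionary, each lookup is local and effectively instant, with no web request per row.

```python
import csv
import io


def normalise(postcode: str) -> str:
    """Strip all whitespace and upper-case, so 'ca14 3yj' matches 'CA143YJ'."""
    return "".join(postcode.split()).upper()


def load_lookup(csv_file) -> dict:
    """Build a postcode -> local authority dictionary from an open CSV file.

    Column names are assumptions made for this sketch.
    """
    reader = csv.DictReader(csv_file)
    return {normalise(row["postcode"]): row["local_authority"] for row in reader}


# Hypothetical stand-in for the downloaded lookup table.
SAMPLE = """postcode,local_authority
CA14 3YJ,Example District A
AB1 2CD,Example District B
"""

lookup = load_lookup(io.StringIO(SAMPLE))

# Postcodes as they might appear in the 45000-row dataset.
my_postcodes = ["ca143yj", "AB1 2CD", "ZZ9 9ZZ"]

# An empty string marks a postcode that is missing from the lookup table.
matched = [(p, lookup.get(normalise(p), "")) for p in my_postcodes]

for postcode, authority in matched:
    print(postcode, "->", authority or "(not found)")
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by `open(path, newline="")`, where `path` points at the downloaded CSV.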

Original comment by phi...@yahoo.com on 4 May 2011 at 9:51

GoogleCodeExporter commented 9 years ago
Closing, not a Refine problem.

Original comment by tfmorris on 8 Oct 2011 at 7:33