DigitalCommons / open-data-and-maps

Deprecated: Implementation of Linked Open Data by the Solidarity Economy Association

Auto find geo positions outside UK #113

Closed ColmMassey closed 4 years ago

ColmMassey commented 5 years ago

If we don't have UK postcodes or lat/long values, what options are there for finding latitudes and longitudes automatically?

ColmMassey commented 5 years ago

Short term options are to use a batch conversion tool like https://csv2geo.com to generate lats and longs. They charge about $20 for 5000 records. Cost would be in preparing the data, but probably very similar preparation to what's needed anyway.

Long term, is there an LOD method that will find lats and longs for records that don't provide them and don't have UK postcodes?

ColmMassey commented 5 years ago

https://stackoverflow.com/questions/16737653/get-latitude-and-longitude-of-a-place-dbpedia https://wiki1.hbz-nrw.de/display/SEM/SPARQL+Examples

Some links discussing possible LOD options.
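As a rough sketch of the LOD route those links discuss: DBpedia publishes WGS84 coordinates through the `geo:lat` and `geo:long` properties, so a lookup can be expressed as a SPARQL query. The helper names below are made up for illustration, and the HTTP call to the endpoint is deliberately left out. Note that this only helps for entities DBpedia itself describes (typically places), not arbitrary street addresses.

```python
import json
from typing import Optional, Tuple

# Public DBpedia endpoint; availability and rate limits are not guaranteed.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_coordinate_query(resource_uri: str) -> str:
    """Return a SPARQL query asking for the WGS84 lat/long of one resource."""
    return (
        "PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>\n"
        "SELECT ?lat ?long WHERE {\n"
        f"  <{resource_uri}> geo:lat ?lat ; geo:long ?long .\n"
        "}\n"
        "LIMIT 1"
    )


def parse_coordinates(sparql_json: str) -> Optional[Tuple[float, float]]:
    """Extract (lat, long) from a SPARQL JSON results document, or None."""
    bindings = json.loads(sparql_json)["results"]["bindings"]
    if not bindings:
        return None  # the resource has no coordinates in the dataset
    row = bindings[0]
    return float(row["lat"]["value"]), float(row["long"]["value"])
```

The query string would be sent to the endpoint with a JSON results format requested; `parse_coordinates` then reads the standard `results.bindings` structure.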

matt-wallis commented 5 years ago

This is an interesting resource: https://index.okfn.org/dataset/postcodes/ You can see which countries provide postcode data mapped to geolocation, and what the license is, and various other useful things. The bad news is that, according to this resource, this information is not readily available for most of the world.

Then there's GeoNames: https://www.geonames.org/

"The GeoNames geographical database covers all countries and contains over eleven million placenames that are available for download free of charge."

which is worth further investigation ... e.g. https://www.geonames.org/postal-codes/

matt-wallis commented 5 years ago

Changed label, as this will be done when creating the data, not by the mapping software.

ColmMassey commented 5 years ago

Geo-search using GeoNames

GeoNames provide a web service, and there is a Leaflet client library. This is potentially brilliant to ease the implementation of a geo-search feature, and to do so in a way that works anywhere in the world.

This should be the first thing investigated if we implement geo-search.

Note that GeoNames also provide global postcode data 👍
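For reference, a minimal sketch of what a postcode lookup against the GeoNames web service might look like. It assumes the `postalCodeSearchJSON` endpoint and a registered GeoNames username (`demo_user` below is a placeholder); the function names are hypothetical, and the request itself is not made here.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode

GEONAMES_BASE = "http://api.geonames.org"


def postal_code_search_url(postalcode: str, country: str, username: str) -> str:
    """Build a GeoNames postal code search URL for one postcode in one country."""
    params = {
        "postalcode": postalcode,
        "country": country,    # ISO 3166 two-letter country code
        "maxRows": 1,
        "username": username,  # GeoNames requires a registered account name
    }
    return f"{GEONAMES_BASE}/postalCodeSearchJSON?{urlencode(params)}"


def parse_postal_hit(response: dict) -> Optional[Tuple[float, float]]:
    """Return (lat, lng) of the first match in a decoded response, or None."""
    hits = response.get("postalCodes") or []
    if not hits:
        return None
    return float(hits[0]["lat"]), float(hits[0]["lng"])
```

For example, `postal_code_search_url("08010", "ES", "demo_user")` builds the lookup for the Barcelona postcode mentioned later in this thread.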

sunnydean commented 4 years ago

From my research and from reading the previous comments, we should stick either to Nominatim (https://wiki.openstreetmap.org/wiki/Nominatim)

or, for more accurate results, to some service that serves Nominatim: https://opencagedata.com/pricing https://business.mapquest.com/pricing-plans/

Thoughts?

ColmMassey commented 4 years ago

Without further exploration I'd say let's just see how hard using Nominatim is.

sunnydean commented 4 years ago

Hi, already using it.

Some of the addresses cannot be located though. E.g. SostreCivic in our entries has an address of: c/casp 43,barcelona,ES

The actual address is:

Barcelona (city) 43 (housenumber) 08010 (postcode) Carrer de Casp (street)

It's a question of should we switch to another service that can locate these or stick to Nominatim for now?
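To make the two query styles concrete, here is a hedged sketch of how the Nominatim search endpoint can be addressed either with a free-form `q` string or with structured fields (`street`, `city`, `postalcode`, `country`). The function names are illustrative, no request is sent, and whether the structured form actually resolves this particular address has not been checked here.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode

# Public instance; its usage policy asks for an identifying User-Agent
# and at most one request per second.
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def free_form_url(address: str) -> str:
    """Search URL that passes the raw address string as-is."""
    params = {"q": address, "format": "json", "limit": 1}
    return f"{NOMINATIM_SEARCH}?{urlencode(params)}"


def structured_url(street: str, city: str, postalcode: str, country: str) -> str:
    """Search URL that passes the address split into separate fields."""
    params = {
        "street": street,  # "<housenumber> <streetname>"
        "city": city,
        "postalcode": postalcode,
        "country": country,
        "format": "json",
        "limit": 1,
    }
    return f"{NOMINATIM_SEARCH}?{urlencode(params)}"


def first_hit(results: List[dict]) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) from a decoded Nominatim JSON result list, or None."""
    if not results:
        return None  # Nominatim returns an empty list when nothing matches
    return float(results[0]["lat"]), float(results[0]["lon"])
```

With these helpers, `free_form_url("c/casp 43,barcelona,ES")` corresponds to the failing entry, while `structured_url("43 Carrer de Casp", "Barcelona", "08010", "ES")` corresponds to the cleaned-up address.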

ColmMassey commented 4 years ago

What's the failure rate? Can we tell how it compares to the one we used in batch mode?

sunnydean commented 4 years ago

The one we used in batch mode matches all of the requests (though I am not sure with what accuracy), which makes it easy to locate duplicate entries. E.g. SostreCivic,c/casp 43,barcelona,ES and SostreCivic,Casp 43,barcelona,8010,ES will both point to the same address, hence they will be found as duplicates.
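A minimal sketch of that duplicate check, assuming each entry has already been geocoded to a (lat, lon) pair. Grouping on the organisation name plus rounded coordinates is an assumption made here for illustration, not necessarily how the project does it, and the function name is made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Entry = Tuple[str, str, float, float]  # (name, raw address, lat, lon)


def find_duplicates(entries: Iterable[Entry], precision: int = 5) -> Dict[tuple, List[str]]:
    """Group entries that share a name and (rounded) geocoded position.

    Returns only the groups containing more than one raw address.
    """
    groups: Dict[tuple, List[str]] = defaultdict(list)
    for name, address, lat, lon in entries:
        # 5 decimal places is roughly metre-level; tune to the geocoder's accuracy.
        key = (name.strip().lower(), round(lat, precision), round(lon, precision))
        groups[key].append(address)
    return {key: addrs for key, addrs in groups.items() if len(addrs) > 1}
```

If both SostreCivic spellings geocode to the same point, they land in one group and can be reviewed or merged; entries the geocoder cannot match at all never get a key, which is why a service that matches everything makes this easier.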

The service we are using currently (or any other service that matches all of them) would be a better option.

Nominatim usually fails on badly written names e.g. c/casp 43

ColmMassey commented 4 years ago

"The service we are using currently (or any other service that matches all of them) would be a better option."

Ok, let's go with that then. I imagine their API will be easy enough to use. Point to where we may need to set up a payment account. I'll share the API token via WhatsApp.

sunnydean commented 4 years ago

So... it is easy to use, but their costs are insane for our requirements. The minimum is $99 per month. I asked their staff whether they have an API available for pay-as-you-go; they are replying tomorrow.

Our requirements are a service that (1) has an API, (2) offers some free requests, (3) allows caching, (4) does fuzzy searching, and (5) has a reasonable pay-as-you-go option.

We need (5) and (2) because what we are doing now, and will keep doing, is caching a big chunk of requests for a whole dataset from time to time (e.g. Dotcoop2019version1 is 10,000+ requests) and then using that cache when generating newer versions of the same or a similar dataset (e.g. Dotcoop2019version two has the same data plus 200 entries that have been changed or added). If we do this then, having (2) and (5), we will presumably need to pay only for the big chunk of requests (the 10,000+ ones), and the smaller ones will be free. This will substantially reduce costs and make this a scalable solution for the future. We need caching (3) to achieve the above.

We need (4) because some of the entries have dirty data (e.g. c/casp 43,barcelona,ES as an address for Barcelona, Carrer de Casp 43, 08010). Instead of cleaning it ourselves, we can leave it to the service. This will ensure that our data is as uniform as it can be.
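The caching idea can be sketched roughly as follows. This is an illustration only: `geocode` stands in for whichever provider is chosen, the cache is a plain in-memory dict (a real one would be persisted between runs), and the normalisation rule is a simple assumption.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Coords = Optional[Tuple[float, float]]


def normalise(address: str) -> str:
    """Cache key: lower-cased address with runs of whitespace collapsed."""
    return " ".join(address.lower().split())


def geocode_with_cache(
    addresses: Iterable[str],
    cache: Dict[str, Coords],
    geocode: Callable[[str], Coords],
) -> int:
    """Geocode only addresses missing from the cache.

    Updates `cache` in place and returns the number of provider
    requests made, i.e. the lookups that would count against a quota.
    """
    requests_made = 0
    for address in addresses:
        key = normalise(address)
        if key not in cache:
            # Failed lookups (None) are cached too, so they are not retried.
            cache[key] = geocode(address)
            requests_made += 1
    return requests_made
```

Run against a 10,000-entry dataset with an empty cache, this makes 10,000 requests; run again on a later version that adds 200 new addresses, it makes only 200.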

This leaves us with the following options:

  1. https://csv2geo.com/price Easy to use, and we already know the format of the returned data. Pay-as-you-go does not have an API; I've asked them if they could open it, and their team said they'll get back to me tomorrow. It doesn't have any free requests at all, though.

  2. https://geocode.xyz/pricing Easy to use, JSON responses, has pay-as-you-go, allows caching, no free requests. This will end up costing $25-30 each time we have a new dataset with new addresses, and then around $5-7.50 for smaller changes ($2.50 per 1000 requests).

  3. https://opencagedata.com/pricing Easy to use, JSON responses, monthly payments, allows caching, free requests up to 2500 per day. This will end up costing around $38 for each month in which we need to encode a large amount of data (up to 10,000 requests per day, which we might only end up doing at the start). After that we will be using the free daily requests for any updates.
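The arithmetic behind these estimates can be checked with a small sketch. It uses only the figures quoted in the list ($2.50 per 1000 requests, 2500 free requests per day); the function names are made up, and actual pricing and terms should be confirmed with each provider.

```python
import math


def pay_as_you_go_cost(requests: int, price_per_1000: float) -> float:
    """Cost in dollars of `requests` lookups at a flat per-1000 price."""
    return requests * price_per_1000 / 1000


def days_within_free_tier(requests: int, free_per_day: int) -> int:
    """Days needed to spread `requests` lookups over a daily free allowance."""
    return math.ceil(requests / free_per_day)
```

At $2.50 per 1000, a 10,000-address dataset comes to `pay_as_you_go_cost(10_000, 2.5)`, i.e. $25, and the $5 to $7.50 range corresponds to 2,000 to 3,000 requests. With a 2500-per-day allowance, `days_within_free_tier(10_000, 2500)` gives 4 days, assuming the provider's terms permit spreading a batch out that way.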

The 3rd one seems to be most scalable for now, and the best solution in terms of accuracy.

Some of the services sadly forbid caching, and others do not do the fuzzy searching we need (I have not included them in the list above). For now I've coded it in with Nominatim so that we can easily replace it when we decide which one of these to use.
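One way to keep the geocoder replaceable, sketched under the assumption that every provider can be wrapped as a function from an address string to an optional (lat, lon) pair. The registry, the provider names, and the stub bodies are all hypothetical; this is not the project's actual code.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Coords = Optional[Tuple[float, float]]
Geocoder = Callable[[str], Coords]

GEOCODERS: Dict[str, Geocoder] = {}


def register(name: str) -> Callable[[Geocoder], Geocoder]:
    """Decorator that adds a geocoder function to the registry under `name`."""
    def decorator(fn: Geocoder) -> Geocoder:
        GEOCODERS[name] = fn
        return fn
    return decorator


@register("nominatim")
def nominatim_geocode(address: str) -> Coords:
    # Placeholder: a real implementation would call the Nominatim HTTP API.
    return None


@register("opencage")
def opencage_geocode(address: str) -> Coords:
    # Placeholder: a real implementation would call the OpenCage HTTP API.
    return None


def geocode_all(addresses: Iterable[str], provider: str = "nominatim") -> Dict[str, Coords]:
    """Geocode every address with the provider selected by name."""
    geocode = GEOCODERS[provider]
    return {address: geocode(address) for address in addresses}
```

Switching services then means changing the `provider` argument (or a config value feeding it) rather than touching the calling code.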

ColmMassey commented 4 years ago

Shall we set up an account with opencagedata for the moment? We can always change in the future as we get more familiar with our requirements etc. It looks like £38 (plus VAT, no doubt).

sunnydean commented 4 years ago

Yes, I am available in 30 minutes if you need help. Let's communicate over WhatsApp.

ColmMassey commented 4 years ago

We now have this service, so closing.