DenisCarriere / geocoder

:earth_asia: Python Geocoder
http://geocoder.readthedocs.org
MIT License

Enforce usage policies #158

Open kvlahromei opened 9 years ago

kvlahromei commented 9 years ago

Hi, the lib wraps third-party geocoding services that may have their own policies on how the servers can be used for free (request volume, timeouts, caching, threading, license, ...). README.md already lists which providers need API keys, but nothing prevents users of this lib from violating the services' policies (e.g. by polling an API instance).

For example, the OpenStreetMap Nominatim servers are subject to the Nominatim usage policy:

Otherwise an application might be blocked or, even worse, somebody might crash the official osm.org services with an unintended DDoS ('well, I guess the lib does all the checks').

DenisCarriere commented 9 years ago

Thanks for the issue post, I'll make sure to add more details of usage policies for the providers. There's already a Rate Limiter that works well (https://github.com/themiurgo/ratelim), it's already inside Geocoder.

I'll keep this in mind! :)

kvlahromei commented 9 years ago

Thanks @DenisCarriere, it's just to make sure that nobody accidentally plays unfair and gets blocked.

DenisCarriere commented 8 years ago

Hey Matthias, took a little while to get to this issue, but I've added the Usage policy for OSM and I've started adding the usage policy in the README.

Commit: 321dbcbb51950ae5dff8200a60d69f1462861729

gavinhodge commented 6 years ago

I'm not sure it's possible to comply with the OSM usage policy at present because of this requirement:

I think there should be a parameter (similar to key in other providers) which allows setting the user agent. Or have I missed something?

ebreton commented 6 years ago

Hi @gavinhodge ,

Headers can be overridden in two ways:

  1. In code, when defining a provider
  2. At runtime, by arguments

You will see those two options in the constructor method of a Query (around line 390 of base.py):

        # headers can be overriden in _build_headers
        self.headers = self._build_headers(provider_key, **kwargs).copy()
        self.headers.update(kwargs.get('headers', {}))

You have an example for bing.py that will interest you:

    def _build_headers(self, provider_key, **kwargs):
        return {
            'Referer': "http://addxy.com/",
            'User-agent': 'Mozilla/5.0'
        }
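To make the precedence concrete, here is a minimal stand-alone sketch of the two constructor lines quoted above. It does not import the library: `build_default_headers` and `resolve_headers` are hypothetical helper names that only mimic the quoted logic, with the default values copied from the bing.py example.

```python
def build_default_headers():
    # Stand-in for a provider's _build_headers(); values taken from the
    # bing.py example quoted in this comment.
    return {
        'Referer': "http://addxy.com/",
        'User-agent': 'Mozilla/5.0',
    }


def resolve_headers(**kwargs):
    # Same two steps as the quoted constructor: copy the provider defaults,
    # then let a caller-supplied 'headers' keyword argument override them.
    headers = build_default_headers().copy()
    headers.update(kwargs.get('headers', {}))
    return headers


# No 'headers' argument: the provider defaults are used unchanged.
defaults_only = resolve_headers()

# With a 'headers' argument: matching keys are replaced, the rest are kept.
overridden = resolve_headers(headers={'User-agent': 'myuseragent/123'})
```

In this sketch the headers are a plain dict, so keys are case-sensitive: passing `'User-Agent'` instead of `'User-agent'` would add a second entry rather than replace the default one.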

Hope this helps 😬

w-flo commented 6 years ago

I'm using this to set the user agent from my code without changing the library, and it seems to work:

    import time
    import requests
    import geocoder

    with requests.Session() as session:
        session.headers = {'User-Agent': 'myuseragent/123'}
        for whatever in collection:
            time.sleep(2)  # pause between requests to limit the request rate
            result = geocoder.osm("[...]", session=session)

Note I've also added sleep(2) to comply with that other requirement, though I guess sleep(1) is just fine.
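A fixed sleep always waits the full interval, even when the previous request already took a while. As a rough alternative, here is a sketch of a small throttle that only sleeps for the remaining time. `Throttle` is a hypothetical helper, not part of geocoder, and the one-second interval in the usage comment is only the guess mentioned above, not a value taken from the policy text.

```python
import time


class Throttle:
    """Blocks so that consecutive wait() calls are at least min_interval seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous call; None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage inside the loop above (assumed interval of one second):
# throttle = Throttle(1.0)
# for whatever in collection:
#     throttle.wait()
#     result = geocoder.osm("[...]", session=session)
```

The first call never sleeps, and later calls sleep only as long as needed to keep the minimum spacing.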

Let me know if there's a better way, @ebreton. Maybe I didn't get the hints in your previous comment, especially the "at runtime" one. :-)

ebreton commented 6 years ago

Hi @w-flo ,

You actually found a third way to do it ✨

I agree that my two cases were not so explicit, let me try to do better:

  1. In code, when defining a provider

I meant that you could change the provider's class so that the new behavior applies to everyone. The function to modify would be _build_headers(self, provider_key, **kwargs)

  2. At runtime, by arguments

I meant that you could do something similar to what you have done. I was pointing to the headers keyword argument, but session does the job too.

To summarize: what you have done is correct, but works only for you. If you wish to change the lib, then you might consider the first option and you are welcome to make a PR 😇
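For anyone considering that PR, the first option could look roughly like the sketch below. It is stand-alone and makes several assumptions: `ProviderQuery` is a hypothetical stand-in that reproduces only the header handling quoted earlier in this thread, `PolicyAwareQuery` is an invented name, and the `user_agent` keyword is a hypothetical parameter along the lines of what @gavinhodge suggested, not an existing option of the library.

```python
class ProviderQuery:
    # Hypothetical stand-in for the library's query class; only the header
    # handling quoted earlier in this thread is reproduced here.
    def __init__(self, provider_key=None, **kwargs):
        self.headers = self._build_headers(provider_key, **kwargs).copy()
        self.headers.update(kwargs.get('headers', {}))

    def _build_headers(self, provider_key, **kwargs):
        # Assumed default for this sketch: no provider-specific headers.
        return {}


class PolicyAwareQuery(ProviderQuery):
    # Option 1: override _build_headers so every query of this provider
    # identifies the calling application.
    def _build_headers(self, provider_key, **kwargs):
        # 'user_agent' is a hypothetical keyword argument for illustration.
        return {'User-Agent': kwargs.get('user_agent', 'myuseragent/123')}


default_query = PolicyAwareQuery()
custom_query = PolicyAwareQuery(user_agent='myapp/1.0')
runtime_query = PolicyAwareQuery(headers={'User-Agent': 'other/2.0'})
```

Because the runtime `headers` argument is applied after `_build_headers`, callers could still override the provider-level value per request.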

Hope that helps! Manu