CartoDB / dataservices-api

The CARTO Data Services API
https://carto.com/docs/carto-engine/dataservices-api/
BSD 3-Clause "New" or "Revised" License
22 stars 14 forks source link

Add ability to rate-limit geocodings per user #346

Closed rafatower closed 7 years ago

rafatower commented 7 years ago

At present all users within an organization share the same pool of geocoding credits.

One particular user can simply exhaust all credits available in one session preventing other users of the same organization from performing any further geocodings until the billing period is finished.

We want to be able to limit geocoding calls per user and period of time.

In order to do so, a couple of configuration settings shall be added:

Those new settings shall be configurable at several levels:

User config shall take precedence over organization, and org config shall take precedence over global config. If it is absent, no limit should be applied (as it works at present).

E.g: the rate limit can be absent in global configuration but present at an organization level, with these values:

{
  'geocoder_user_rate_limit_limit': 1000,
  'geocoder_user_rate_limit_period': 86400
}

That would mean that, regardless of the monthly quota allowance of the organization, no individual user within the organization could exceed 1000 geocodings/day.

Edit 1: instead of limiting "per day" make it generic to any configurable period. Added example as well. Edit 2: this does not require UI. It is enough with a rake task + proper docs.

rafatower commented 7 years ago

@javisantana a couple of questions:

cc/ @rochoa and @jorgesancha for awareness.

javisantana commented 7 years ago

I'd change

"We want to be able to limit geocoding calls per user and day."

by

"We want to be able to limit geocoding calls per user and period of time"

So we configure the amount of time (in days or months)

jgoizueta commented 7 years ago

Just some commentary about how to implement this.

I assume that to simplify the implementation we'll not try use a "rolling time window", but just use "static non-overlapping periods", i.e. partition time, so that we accumulate geocodings in time buckets, using floor(time_in_seconds / geocoder_user_rate_limit_period) as the time key.

I'll try to sound less pedantic ๐Ÿ˜› : we won't count gecodings done during the last geocoder_user_rate_limit_period seconds, but just count geocodings in geocoder_user_rate_limit_period-second periods; when a new period starts the counter starts at 0.

So we can proceed as with the UserConcurrentSyncsAmount in amount where a counter is stored in a redis key limits:Importer:UserConcurrentSyncsAmount:uUSERNAME.

Here we could count geocodings in a key like this:

user:{USER}:geocoding-rate:{geocoder_user_rate_limit_period}:{T}

where T is the time period number (floor(time_in_seconds/geocoder_user_rate_limit_period) and we use also the period duration in the key to avoid ambiguity/problems if the configuration changes.

So, if the current Unix time is 1488820493, and the period is a day we would use this key to check/increment the geocoding count:

user:{USER}:geocoding-rate:86400:17231
rafatower commented 7 years ago

Some implementation patterns from Redis docs: https://redis.io/commands/incr#pattern-rate-limiter

jgoizueta commented 7 years ago

Hey, I've realized there's one scenario not supported by the implementation we're working on: to make the limit period equal to the billing period (which could be a simple way to make sure any single user doesn't consume more than x% of the org quota). We can make the period a month, but it won't necessarily coincide with the billing "month".

@javisantana did you have that scenario in mind when you requested arbitrary periods?

rafatower commented 7 years ago

I'd strictly keep quotas+usage+billing separated from rate limits. Why's that?

The design of quotas as they are to date can be improved. But again I prefer not to mix those two. If anyone still has doubts about separating them, I'm really willing to discuss it thoroughly.

jgoizueta commented 7 years ago

Since I'm taking my time to commit any code, I'll comment here some details of what I'm working, FYI, and to give you guys a chance to discuss my decisions.

I aim to have this configurable at server level (in cdb_conf), organisation and user (in redis). User configuration has precedence over user over server and default is not rate limits.

We won't save any data in the rails metadata db (we'll have to consider that only if/when we implement a GUI to configure rate limits).

Configuration Interface

Rake

There will be an interface consisting of rake tasks to query the configuration and change it:

rake carto:rate_limit:set_user(service, username, new_configuration)
rake carto:rate_limit:set_org(service, username, orgname, new_configuration)
rake carto:rate_limit:set_server(service, username, new_configuration)

I'm thinking about having a single rake task to obtain the configuration of a user, but we could also have different tasks for system/org levels.

rake corto:rate_limit:info(service, username)

Output:

{service} user configuration for {user}: limit: xx, period: yy
Overrides:
{service} org configuration for {organization}: limit: zz, period: yy

SQL

The rake tasks will use a SQL interface in the dataservices client:

One public (security definer) function to obtain information about the user configuration (again we could have separate functions for server/user/org levels and have a service parameter)

SELECT * FROM cdb_dataservices_client.cdb_service_rate_limit_info();
 service  | scope  | rate_limit                     | active
----------+--------+--------------------------------+----------
 geocoder | server |  {"limit":1000,"period":86400} |      f
 geocoder |    org |  {"limit":2000,"period":86400} |      f
 geocoder |   user |  {"limit":1500,"period":86400} |      t

For consistency with cdb_service_quota_info service name should be 'hires_geocoder' ๐Ÿค”

The we'd have three private client functions to change the configuration (writing to cdb_conf or redis):

cdb_dataservices_client.cdb_service_set_rate_limit(service text, new_configuration json)
cdb_dataservices_client.cdb_service_set_user_rate_limit(service text, username text, new_configuration json)
cdb_dataservices_client.cdb_service_set_org_rate_limit(service text, orgname text, new_configuration json)

Decisions, decisions

Some decisions I'm facing while implementing this:

How to store and how present the new parameters:

Current choice: JSON object

Where to store the new parameters in cdb_conf:

Current choice: new entry; it simplifies implementation and other keys are already crowded, although may seem inconsistent with other parameters

rafatower commented 7 years ago

Got it! I just want to make a couple of remarks to keep in mind:

To sum up and decide between the config interface options:

javisantana commented 7 years ago

one comment, can it be limited to just one provider? (google, here...)

jgoizueta commented 7 years ago

@javisantana: No, we were considering only rate limits per service. We were also considering as requirements having user/org level limits and server level limits only as nice to have (but the WIP implementation actually allows server level limits). The idea was to not have rate limits by default and be able to set them for specific users/orgs.

Provider-specific limits could be useful for server level, and would allow to introduce limits by default per provider (and then be able to override them for users/orgs).

This isn't too hard to introduce in the current implementation (but configurations will be a little messier).

Summary

We now have (in #355): Per service rate-limits configuration, currently only applied to geocoder. The configuration is disabled by default and can be applied, in decreasing order of precedence:

[EDITED: to clarify the precedence which was confusing]

jgoizueta commented 7 years ago

I'm working now on the configuration interface (or rather, trying to decide if implement the SQL part or skip it and have only a rake interface). But the code (#355) to use the configuration and apply rate limits is complete (save for Google geocoder) and independent of the configuration interface.

So I'd be nice to have it reviewed now, FYI @rafatower @ethervoid ๐Ÿ˜‰

jvillarf commented 7 years ago

Hi!! we don't have an official date yet regarding onpremises LDS release, but if it would be great if this could be finished by April, 7th. How does it sound? Thanks!

jgoizueta commented 7 years ago

I cannot give any estimate yet, for when we can have this in production, but that date sounds good. I'm aiming to have it before April 7th.

jvillarf commented 7 years ago

Ok @jgoizueta , perfect :)

jgoizueta commented 7 years ago

The dataservices-api changes are ready for CR (#355). Here's some documentation: https://github.com/CartoDB/dataservices-api/blob/346-user-rate-limits/doc/rate_limits.md

I'm still considering adding a rake interface to this (which is easy now using the SQL interface).

I'll prepare an acceptance procedure, taking into account it should be useful to be checked in on-premises installations (as suggested by @jorgesancha

javisantana commented 7 years ago

nice doc

javisantana commented 7 years ago

one question, the rate limiting works for google maps as well, right?

jgoizueta commented 7 years ago

@javisantanta No, I skipped the check for them, since we're not imposing any quota on them. [EDIT: I recalled it incorrectly, we do apply rate limits to them] It's just a flag parameter, so it would be trivial to change.

Now that I think of it it may be useful to be able to rate-limit them... what do you think?

As long as we keep the default to be "no limits" I see no problem in having rate limits affect Google too (limits would be established for individual orgs or users).

jgoizueta commented 7 years ago

Sorry about the confusion in my previous comment: rate limiting works for Google maps. (and it would be a trivial code change if we decide it shouldn't, because they have to quota limits on our part).

jorgesancha commented 7 years ago

Thanks @jgoizueta we actually need it to work for Google maps, so great that it already does

jgoizueta commented 7 years ago

Acceptance plan

Note: I'll add some guidelines to apply this checks in on-premises intances later.

Setup

After merging into the development branch:

Tests

Repeat for the non-organization and organization users.

  1. Read default configuration: the configuration should initially be NULL. Try to change de configuration using the user role: it should fail; check the configuration remains NULL. Check no rate limits are applied to geocoding
  2. Set configuration for the user, limit=3, period = 60; first to a different user, and check the user configuration does not change, then to the test user, check configuration which should have changed. Then repeat the following check with Here, Mapzen & Google providers:
    • Check first 3 limits request within a minute are successful
    • Check 4th request (within a minute) fails There's a catch here: if the minute changes as per the server's clock, the request will be accepted; if it is try again (request faster for less chances of this problem).
    • Repeat checks after at least 1 minute (3 request should pass again)
  3. (only for org member user) Set less restrictive per organization config (limit=5), check configuration; Repeat the checks, the 4th request should still fail.
  4. (only for org member) Remove the user configuration (set it to NULL). Repeat the checks, the 5th request should now fail.
  5. Apply more restrictive server config (limit=2), check config, Check it doesn't apply (5th request still failing; or 4th for non-org user)
  6. Remove org config (remove user config for non-org user), check config, check now that the 3rd request if failing as per server config.
  7. Remove server config, check no limits now

How to perform the necessary operations:

Note: to use the Google provider you'll need a client id and secret key. I used a free api key, but I had to patch the python code to use it in the tests:

diff --git a/server/lib/python/cartodb_services/cartodb_services/google/geocoder.py b/server/lib/python/cartodb_services/cartodb_services/google/geocoder.py
index 2a836f3..f1c5ad2 100644
--- a/server/lib/python/cartodb_services/cartodb_services/google/geocoder.py
+++ b/server/lib/python/cartodb_services/cartodb_services/google/geocoder.py
@@ -12,8 +12,7 @@ class GoogleMapsGeocoder:
     def __init__(self, client_id, client_secret, logger):
         self.client_id = self._clean_client_id(client_id)
         self.client_secret = client_secret
-        self.geocoder = googlemaps.Client(
-            client_id=self.client_id, client_secret=self.client_secret)
+        self.geocoder = googlemaps.Client(key=self.client_secret)
         self._logger = logger

     def geocode(self, searchtext, city=None, state=None,
rafatower commented 7 years ago

Check 4th request (within a minute) fails There's a catch here: if the minute changes as per the server's clock, the request will be accepted; if it is try again (request faster for less chances of this problem).

Not 100% sure but I think there's no such catch cause the underlying algorithm uses kind of a "sliding window" starting with the first request within the period :wink:

jgoizueta commented 7 years ago

Acceptance status: The configuration part works well, but the rate limits are not applied to any providers ๐Ÿ˜“

I'm debugging it in staging (locally it worked)

jgoizueta commented 7 years ago

Problem found: I didn't create the migration scripts correctly (some functions are missing) ๐Ÿ˜…

jgoizueta commented 7 years ago

It seems we have a bug: rate limits work well for Mapzen (refactored configuration), but there's a problem with Here (legacy configuration): after it has been rate-limited once, and after the limit period has expired, and error occurs when trying to geocode again:

ERROR:  cdb_dataservices_client._cdb_geocode_street_point(6): [cartodb_staging_user_fef64064-3754-40b4-9032-6f7d6cd95edf_db] REMOTE ERROR: spiexceptions.ExternalRoutineException: NameError: global name 'logger' is not defined

Which is due to a minor bug, now solved; the underlying problem is:

WARNING:  Error trying to geocode street point using here maps
CONTEXT:  PL/Python function "_cdb_here_geocode_street_point"
ERROR:  Exception: Error trying to geocode street point using here maps
CONTEXT:  Traceback (most recent call last):
  PL/Python function "_cdb_here_geocode_street_point", line 24, in <module>
    raise Exception('Error trying to geocode street point using here maps')
PL/Python function "_cdb_here_geocode_street_point"
jgoizueta commented 7 years ago

Problem identified, fixing it... Now, it is critical to pass acceptance for all providers, Mapzen, Here, Google.

jgoizueta commented 7 years ago

After fixing a problem with the migration script (related to public permissions), I've found another acceptance fail: it seems org configuration has precedence over user configuration ๐Ÿ˜ž

(this should have been catched by the unit tests, I'm looking into it)

jgoizueta commented 7 years ago

Ok, false alarm, I had mixed up the db roles of the org owner and org member I was testing ๐Ÿ˜ข

jgoizueta commented 7 years ago

Finally, after detecting and fixing a number of issues, we've accomplished acceptance. ๐ŸŽ‰

jorgesancha commented 7 years ago

๐Ÿ‘๐Ÿป๐Ÿ‘๐Ÿป๐Ÿ‘๐Ÿปwell done guys On Thu, 30 Mar 2017 at 18:44, Javier Goizueta notifications@github.com wrote:

Finally, after detecting and fixing a number of issues, we've accomplished acceptance. ๐ŸŽ‰

โ€” You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/CartoDB/dataservices-api/issues/346#issuecomment-290470041, or mute the thread https://github.com/notifications/unsubscribe-auth/AAFQyqB56DxszwPh6B0TXKyh09SvXYz4ks5rq9vbgaJpZM4MP8cb .

jgoizueta commented 7 years ago

This is now in production.

Documentation: https://github.com/CartoDB/dataservices-api/blob/master/doc/rate_limits.md

Note:

The behaviour from Builder when the geocoding rate limit is exceeded is just like when the quota is exceeded: some records will silently left un-geocoded.

This is because when used from Builder, the geocoding functions just ignore any exception., This is done to avoid rolling back a transaction which would dismiss partially geocoded rows after having spent user's credits on them.

So, in summary:

jvillarf commented 7 years ago

@javisantana those rate-limit functions are currently implemeted for geocoding service. Should they work with isolines, routing? Any other service in mind?

@jgoizueta told me that it's really easy to implement, but i wanted to ask for it.

Thanks.

javisantana commented 7 years ago

Yes, limit per user per day is just for geocodings, so no need to implement them for isolines and routing