FirstStreet / fsf_api_access_python

First Street API Access (Python)
https://firststreet.org/
MIT License

Ensure API requests adhere to the 5000/hr limit #29

Open abpoll opened 3 years ago

abpoll commented 3 years ago

Question: How can I adhere to the 5000/hr limit with an API request?

I've tried implementing the ratelimit (https://pypi.org/project/ratelimit/) and backoff (https://pypi.org/project/backoff/) packages but get a 429 error.

Here is what the function that calls the API looks like:

from backoff import on_exception, expo
from ratelimit import limits, RateLimitException
import firststreet

ONE_MINUTE = 60

# Cap the decorated function at 5000 calls per minute and back off on RateLimitException
@on_exception(expo, RateLimitException, max_tries=10)
@limits(calls=5000, period=ONE_MINUTE)
def call_fs(fsids, api):
    fs = firststreet.FirstStreet(api)
    depths = fs.probability.get_depth(fsids)

    return depths

I also tried setting calls=500, but the 429 error is still thrown.

I'm guessing the limit is not actually being applied to the call.

Do I have to break fsids out into chunks of <= 5000, fetch them with fs.probability.get_depth(fsids_5000), time the call, and ensure that a minute passes before the next call?

Lyetenth commented 3 years ago

Hey abpoll,

I added asyncio throttle functionality in the new release (2.1.0).

To get it to work, specify the number of calls and the period when initializing the FirstStreet object:

Ex 1 - (1 call per 1 second): fs = firststreet.FirstStreet(api_key, rate_limit=1, rate_period=1)

Ex 2 - (5000 calls per hour): fs = firststreet.FirstStreet(api_key, rate_limit=5000, rate_period=3600)

Let me know if there are any issues.

abpoll commented 3 years ago

Thanks!

abpoll commented 3 years ago

Will 2.1.0 be available through pip install soon?

Lyetenth commented 3 years ago

Hey abpoll,

Version 2.1.0 should already be up on PyPI: https://pypi.org/project/fsf-api-access-python/#description

I did notice in a previous issue (#28) that you are on Python 3.6. Due to a change in how we handle the asynchronous call returns, we needed to move to 3.7 and had to drop support for 3.6. You will need to update your Python version to 3.7/3.8 to get the version 2 updates.

abpoll commented 3 years ago

Thanks very much!

abpoll commented 3 years ago

I tested the new functionality for county_fips 25017 (~400k parcels). I set the limit to 4000 per minute, and the throttle stably avoided Error 429 for about 8 hours of runtime. The issue is that Timeout Errors occur constantly, so a job that could hypothetically take 2 hours never finishes. At around the 8-hour mark, a 429 Rate Limit Exceeded error was thrown.

The call to the API is fs = firststreet.FirstStreet(api_key, rate_limit=4000, rate_period=60). Is that correct?

Here is the log up to the point where the first Timeout Error was thrown:

0%| | 0/420263 [00:00<?, ?it/s]
[... progress bar output trimmed ...]
0%| | 1227/420263 [05:05<64:25:33, 1.81it/s]
2020-10-02 16:05:52,941 root INFO Timeout error for item: 250955009 at https://api.firststreet.org/v1/probability/depth/property/250955009?None. Retry 0

Here is the end of the log, just before the 429 error was thrown:

94%|█████████▎| 393061/420263 [8:14:02<02:12, 204.90it/s]
[... progress bar output trimmed ...]
94%|█████████▎| 393490/420263 [8:14:08<33:37, 13.27it/s]
Traceback (most recent call last):
  File "/restricted/project/places/code/user/abpoll/places/fs_process.py", line 25, in <module>
    depths = fs.probability.get_depth(fsids)
  File "/restricted/project/places/code/user/abpoll/.conda/envs/fs_download/lib/python3.8/site-packages/firststreet/api/probability.py", line 155, in get_depth
    api_datas = self.call_api(search_item, "probability", "depth", "property", extra_param=extra_param)
  File "/restricted/project/places/code/user/abpoll/.conda/envs/fs_download/lib/python3.8/site-packages/firststreet/api/api.py", line 114, in call_api
    response = loop.run_until_complete(self._http.endpoint_execute(endpoints))
  File "/restricted/project/places/code/user/abpoll/.conda/envs/fs_download/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/restricted/project/places/code/user/abpoll/.conda/envs/fs_download/lib/python3.8/site-packages/firststreet/http_util.py", line 70, in endpoint_execute
    await t
  File "/restricted/project/places/code/user/abpoll/.conda/envs/fs_download/lib/python3.8/asyncio/tasks.py", line 608, in _wait_for_one
    return f.result()  # May raise f.exception().
  File "/restricted/project/places/code/user/abpoll/.conda/envs/fs_download/lib/python3.8/site-packages/firststreet/http_util.py", line 123, in execute
    raise self._network_error(self.options, rate_limit, error=body.get('error'))
firststreet.errors.RateLimitError: Network Error 429: Rate limit exceeded. Limit: 5000. Remaining: 0. Reset: 5

Lyetenth commented 3 years ago

Hey @abpoll,

So for the two errors:

The timeout error you're receiving looks like it's from the probability products. The next update to the First Street API (v1.2) should include an update to the probability products, making them respond faster and fixing the timeout errors from the server.

I'm having a hard time reproducing the second error. rate_limit=4000, rate_period=60 is correct; it will run 4000 connections every 60 seconds. Just to check: you want 4000 per 60 seconds, and not 5000 per hour, right?

abpoll commented 3 years ago

Hi @Lyetenth - thanks for the help. Sorry for the confusing title of this issue. The specific call I made was meant to request 4000 per 60 seconds (trying to avoid hitting that 5000 limit).

Looking forward to the update. I'll wait for v1.2 before making another big request like at the county level.

bradleyswilson commented 3 years ago

@Lyetenth Joining this discussion as I'm facing the same errors when trying to make larger calls.

Is there a recommended solution for big requests with FSIDs? Is there a rate limit that will definitely avoid timeout issues?

Lyetenth commented 3 years ago

Hey @bradleyswilson,

Just to confirm: Are these issues only with the probability product? Are the errors returned only ratelimit errors (timeout errors sometimes occur, but it should automatically retry 3 times)? About how large is your FSID file (number of FSIDs)?

And what value are you providing to the rate_limit and rate_period arguments?

Also, what version of the library are you currently on? There have been a few updates since this issue was closed, so it might be a new issue I need to take a look at.

-- I believe the 1.2 release is slated for mid-December, so hopefully any performance issues with the probability calls will be fixed soon.

bradleyswilson commented 3 years ago

@Lyetenth Yes, I've only been working with the probability product. My situation is very similar to the previous one (i.e. timeout errors that were retried successfully, but then a 429 rate-limit error at some point during the run).

I've played around with the rate limit and rate period. rate_limit=4000, rate_period=60 were my initial settings; upping rate_period to 120 seems to have fixed the problem for some calls, but not others.

I have around 200 unique lists of FSIDs for portions of major US cities. They range from 2,400 to 760,000 FSIDs.

I believe it's version 2.1, downloaded on October 6th. I can upgrade to the most recent if that would make a difference?

Lyetenth commented 3 years ago

Hey @bradleyswilson, try updating to the most recent version (2.2.1).

Since version 2.1.1, the number of timeouts should be reduced significantly on large lists of FSIDs (100k+ FSIDs). When I ran a test on 500k lat/lngs after the 2.1.1 change, I think only 5 timeouts occurred, and they were successfully retried, compared to the many before.

Assuming you're getting a lot of timeouts (more than the 5 I got; likely tens or hundreds), my thought on why this might fix it is that during a timeout the library re-calls the API, and that retry counts as an additional call against your API key. When too many timeouts happen, that's when you get a rateLimit error even though a limit is explicitly set in the FirstStreet object.

Just in case those 5 timeouts are unlucky and trip the rateLimit, I do use a slightly lower limit of 4990 calls per 60-second period.

Try the update, and let me know if it is successful!

bradleyswilson commented 3 years ago

@Lyetenth No luck, still getting tons of timeouts on large lists, eventually triggering a 429. Splitting into a smaller batch of 200k doesn't seem to reduce the number of timeout errors either, although it hasn't thrown a rateLimit error yet (still running).

If the new updates are coming soon, I can probably wait to do more testing then, or work around by batching the files into smaller lists. Most of the cities I'm working with are much smaller in size.

Lyetenth commented 3 years ago

@bradleyswilson, that's weird hmmm.

I'm currently running a test on 506,880 FSIDs in a file

With my script:

import firststreet

fs = firststreet.FirstStreet(api_key, connection_limit=100, rate_limit=4000, rate_period=60, log=True)
file = "test_files/sample_property.txt"
fs.probability.get_depth(file, csv=True)

And I haven't had a timeout error yet. If you have your script, could you post it as well? Or try what I have above (replacing the API key and file of course). Are you also using get_depth, or another sub-product? Let me know if you're using another sub-product and I'll test a large batch on that one as well.

--

My concern is that these may not be server-side issues but client-side ones. If that's the case, the bug would persist even with the 1.2 update to the API.

bradleyswilson commented 3 years ago

I was running it from the command line, but the script is basically identical. I'll test it through the script too. I'm using get_depth.

I can send/post my list of FSIDs somewhere if that would be helpful for testing on your end?

Lyetenth commented 3 years ago

Okay, I just started a run with 506,880 FSIDs using:

python -m firststreet -p probability.get_depth -s .\test_files\sample_property.txt --connection_limit 100 -rate_limit 5000 -rate_period 60

Thus far it's looking okay. Do you have an estimate of when the timeouts begin to occur?

I'll take the list of FSIDs as well, to see whether specific properties might be the issue. You can post it here: https://drive.google.com/drive/folders/1Eaa1aux_5PW0jNXLxOFKWsbUx_yMPZXy?usp=sharing

bradleyswilson commented 3 years ago

Just dropped the file that I've been testing with in there. It starts throwing timeouts really early (like 1-2% for me).

Lyetenth commented 3 years ago

I've been able to process 44982/759682 thus far without any timeout errors using the command line from above (i.e. python -m firststreet -p probability.get_depth -s .\CALosAngeles1939.txt --connection_limit 100 -rate_limit 5000 -rate_period 60).

The next thing I can think of is checking the versions of the other libraries.

Can you post a screenshot of running pip list in your console? I'll see if any of the other libraries are different. Also, what version of Python are you on?

bradleyswilson commented 3 years ago

@Lyetenth Python 3.7.4

[Screenshot of pip list output: Screen Shot 2020-12-09 at 8 44 53 AM]

Lyetenth commented 3 years ago

The 759682 properties did get pulled successfully with no rateLimit errors. There were a bunch of timeOut errors, but only around the 600k mark. One thing to caution is that with 600k properties, the CSV generation may take a long time.

Re: libraries, the only differences I see are idna == 2.10 for me, and six == 1.15.0. But neither of those looks too relevant (not libraries I directly import).

I'm using Python 3.7.9, but it's not a major version difference, so I'm not sure how big an impact on performance that has. You can try updating your Python to the newest version to see if that helps.

I do think the many timeOut errors are causing the rateLimit crash. You can try reducing the number of concurrent connections by changing the connection_limit value and see if that helps (fewer concurrent connections means fewer connections waiting to be processed, and hopefully fewer timeouts).

i.e.: python -m firststreet -p probability.get_depth -s .\CALosAngeles1939.txt --connection_limit 50 -rate_limit 4990 -rate_period 60

-- I also just checked, and it looks like the targeted date to go live with version 1.2 is January 4th, 2021.

bradleyswilson commented 3 years ago

Weird, I get timeout errors before it even gets to 1% completion.

Thanks for trying to troubleshoot with me. I'll keep playing around with different settings and see if I can get something to work; otherwise I'll hope for v1.2, or for another team member to have some luck.