Open issue, opened by brnt 1 year ago.
Follow-up question: Could this high latency be some sort of greylist throttling on the part of api.census.gov? I ask because my IP address appears to have been blacklisted. I don't get any definitive message from the server, but connecting through a VPN works (at full speed), while the remote endpoint abruptly drops the connection when I connect directly.
For the record, I haven't been pounding the API at all; I've rarely touched it since reporting this latency two weeks ago. I'm not sure how this IP address might have gotten blacklisted, and my API key still works from other IP addresses.
Hi, sorry for the delay in replying.
I haven't been able to replicate this myself, but I have heard of other users having issues with really high latency when using the service. I have reached out to the USCB folks to ask whether there has been a policy change that @cenpy-devs missed, but I haven't seen any announcement or gotten a response.
It's entirely possible that the rate limiting is IP specific. Are you accessing it from a shared endpoint?
I've got a few ideas on how to make any greylisting less likely, and we're working to spec a Google Summer of Code project around this. Mainly, we hope to begin using requests.Session() objects with a cenpy-specific user-agent string, rather than making ad hoc requests directly.
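A minimal sketch of the session idea, assuming the third-party requests library. This is not cenpy's code: the helper name make_census_session and the "cenpy/<version>" user-agent format are hypothetical. A single Session reuses pooled connections and sends the same identifying header on every call, instead of each ad hoc requests.get() presenting the default python-requests user agent.

```python
import requests

# Hypothetical user-agent format; cenpy's real string (if any) may differ.
USER_AGENT = "cenpy/{version}"


def make_census_session(version: str = "dev") -> requests.Session:
    """Build one reusable session for all API calls (hypothetical helper).

    The Session pools connections between requests and attaches the same
    headers every time, so the server sees one clearly identified client.
    """
    session = requests.Session()
    # Override the default "python-requests/x.y" user agent.
    session.headers.update({"User-Agent": USER_AGENT.format(version=version)})
    return session


if __name__ == "__main__":
    session = make_census_session("0.1")
    print(session.headers["User-Agent"])  # cenpy/0.1
    # Later calls would go through session.get(...) instead of requests.get(...).
```

Creating the session once and passing it around (rather than one per query) is what gives the connection reuse; no network call is made in the example above.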
No problem at all. Thanks for taking a minute to respond.
Not a shared endpoint. If the issue is indeed rate limiting on the USCB side (also my current best guess), then either (1) the query threshold must be extremely low; or (2) cenpy could be making dozens of requests per conceptual query. I haven't dived into the cenpy code to verify, but it could be doing repeated queries for tract-level stats across a city or something. You'll know better than I would.
It might also be worth suggesting to the USCB folks that they send an email warning when a key's traffic exceeds some threshold. They have the email address of everyone who has signed up for and verified an API key.
Thanks for your help! I'll keep using the VPN for now, and I'll watch here for any new info.
Quoting: "it could be doing repeated queries for one conceptual query"
Indeed, it does. When we wrote the package, there was a 50-column limit on individual queries, so queries for many columns are split into column chunks that are reassembled at the end. This scales linearly with the number of columns and rarely caused issues before.
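To make the request count concrete, here is a stdlib-only sketch of column chunking and reassembly. It is not cenpy's implementation: chunked_query, the fetch callback, the geography key names, and the VAR_ variable names are all hypothetical. With a 50-column limit, n columns cost ceil(n / 50) requests per geography query, so 120 columns means 3 requests.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, str]
MAX_COLUMNS = 50  # per-query column limit mentioned in the thread


def chunk_columns(columns: Sequence[str], size: int = MAX_COLUMNS) -> List[List[str]]:
    """Split the requested variables into groups of at most `size`."""
    return [list(columns[i:i + size]) for i in range(0, len(columns), size)]


def chunked_query(
    columns: Sequence[str],
    fetch: Callable[[List[str]], List[Row]],
    geo_keys: Tuple[str, ...] = ("state", "county", "tract"),  # hypothetical join keys
    size: int = MAX_COLUMNS,
) -> List[Row]:
    """Issue one fetch per column chunk, then join rows on the geography keys."""
    merged: Dict[Tuple[str, ...], Row] = {}
    for chunk in chunk_columns(columns, size):
        for row in fetch(chunk):  # in practice, one HTTP request per chunk
            key = tuple(row[k] for k in geo_keys)
            merged.setdefault(key, {}).update(row)
    return list(merged.values())


if __name__ == "__main__":
    calls = []

    def fake_fetch(chunk: List[str]) -> List[Row]:
        # Stand-in for an API call: two tracts, every requested column set to "0".
        calls.append(chunk)
        return [
            {"state": "06", "county": "001", "tract": t, **{c: "0" for c in chunk}}
            for t in ("400100", "400200")
        ]

    variables = [f"VAR_{i:03d}" for i in range(120)]  # hypothetical variable names
    rows = chunked_query(variables, fake_fetch)
    print(len(calls), len(rows), len(rows[0]))  # 3 2 123
```

Each row ends up with the 3 geography keys plus all 120 variables. The request count grows with the number of columns, and any per-geography looping would multiply it further.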
I'll update here with any changes from USCB.
I pulled out a project I hadn't used in a month or two and found that cenpy now introduces a huge amount of latency. Importing the module does it. Instantiating a cenpy.products.ACS object does it again. To narrow in on just the import statement:
Note that the import alone takes 1 min 42 sec (!). I've tried it both with and without an API key (including deleting SITEKEY.txt). Also note that this was the second run of the same command, just in case there was some module compilation happening the first time. I've also double-checked that it's not a network problem. Pings look normal (sample of output during import test above):
I realize that this may be an issue with the census.gov servers rather than cenpy. It may also be an issue with macOS (see upgrade note below).
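One way to reproduce an import-time measurement like the one above is to time the import in a fresh interpreter, so nothing cached in sys.modules skews repeated runs. This is a sketch, not what the reporter ran; the helper name time_import is hypothetical.

```python
import subprocess
import sys


def time_import(module: str) -> float:
    """Seconds spent on `import <module>` in a brand-new interpreter (hypothetical helper).

    Running in a subprocess avoids the current process's sys.modules cache,
    so every measurement is a cold import.
    """
    code = (
        "import time\n"
        "start = time.perf_counter()\n"
        f"import {module}\n"
        "print(time.perf_counter() - start)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if the import fails
    )
    # Use the last line in case the module prints something while importing.
    return float(result.stdout.strip().splitlines()[-1])
```

For example, time_import("cenpy") returns the cold import time in seconds. For a per-module breakdown, `python -X importtime -c "import cenpy"` (Python 3.7+) writes cumulative import times to stderr. If the delay comes from network calls made at import time (an assumption, not confirmed in the thread), it should show up as one slow module rather than spread across the import tree.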
Potentially relevant info: