AWSbot / PHPscript


Excessive number of Overpass API requests for addr:postcode=1533. FIX REQUIRED 🔧⚡️ #1

Closed. mmd-osm closed this issue 6 years ago.

mmd-osm commented 7 years ago

For some reason PHPscript appears to trigger around 15,000-25,000 identical requests per day for a particular postcode:

[out:json];node["osak:identifier"]["addr:postcode"=1533];out;

I guess this is not really intended, as this query does not return any data and none of the other postcode queries exhibit the same behavior.

Also, I'd suggest adding some wait time in case of an error rather than continuously sending the same query again. An HTTP User-Agent would also be helpful to better identify the source of those queries.
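For illustration, something along these lines would already help (a minimal sketch only; the retry limit, sleep times and User-Agent string are made up for the example and not taken from the actual scripts):

    <?php
    // Sketch: identify the client via a User-Agent header and back off
    // between retries instead of re-sending the same query immediately.
    $query = '[out:json];node["osak:identifier"]["addr:postcode"=1533];out;';
    $url = 'https://overpass-api.de/api/interpreter?data=' . urlencode($query);

    $attempt = 0;
    do {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_USERAGENT, 'AWSbot-example/0.1 (contact: someone@example.org)');
        $body = curl_exec($ch);
        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);

        if ($status === 200 && $body !== false) {
            break;                    // success, $body holds the JSON result
        }
        sleep(30 * ++$attempt);       // wait longer after each failure
    } while ($attempt < 5);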

Thanks!

mikini commented 7 years ago

Hi mmd.

I did look a bit into this when I stumbled upon it in late September, but thought it might have been just intermittent as there has been silence since, and I ran out of time. Sorry if AWSbot interferes with Overpass; that is surely not intended!

Do you still see hits of that magnitude?

The AWSbot scripts don't seem to be very well maintained (or mature). I haven't been involved directly, but I have weighed in on different issues regarding the import's procedure, documentation and validation (1, 2), which were discussed on the talk-dk mailing list last year.

I don't know the details of the history, but according to a post in one of the threads mentioned, Stephen Møller is now in charge of running the import to OSM in the context of the OSM user AWSbot, which was originally done by Peter Brodersen. As he mentions himself, the user was also blocked by the DWG on two occasions that summer because of edits that were completely off the map.

As I understood the procedure then from some other posts with further details (and from what I saw when the scripts became available), it is a fully manual process that he triggers from a website when time permits ("Q: Hvor ofte sker det? A: Nå jeg har tid." = "Q: How often does it happen? A: When I have time.").

Studying the non-output scripts a bit, however, reveals that they use argv (1, 2) intermixed with HTML output, which suggests that they are run on the command line. But no details about the context in which they execute were disclosed, so it's hard to say for sure whether they are launched periodically by a cron job. Have you analysed where the traffic actually originates? Are we sure it is a single source IP?

I didn't pursue improving the setup any further back then, as it already felt a bit like fighting windmills and the most serious problems seemed to be fixed. I can take a look at the scripts and do some sanitizing, but whether we can reach the people causing the load and convince them to change, I don't know.

I have found contact info for Stephen and I'll ping him now, and probably give him a call one of the following days, so we can find out whether his setup does trigger this excessive load on Overpass. Hopefully we can coordinate a quick remedy then, assuming he is the source.

Regards, Mikkel

mikini commented 7 years ago

I have had an email conversation with Stephen, who, immediately on becoming aware of the problem this morning, brought down Apache completely on the machine where he is running AWSbot on behalf of the OSM-DK community. He did this to mitigate any risk of others triggering the problem without his knowledge (I can't see how that could be possible using the scripts here on GitHub, though).

I've also posted a plea on the Danish talk-dk mailing list asking people to stop using the scripts for the time being. Have you seen a decrease in the load after these attempted remedies?

Stephen notes that he hasn't performed any imports (it is not a very automated process) since summer, which coincides with the activity of the AWSbot user. I have asked him if he could confirm that no PHP processes are running the scripts from the command line on the machine (which seems to be the intended use). I've found a disturbing infinite loop that is triggered if the scripts are asked to fetch the 1533 postcode (or anything else that results in an empty dataset from DAWA). This must surely also affect the DAWA service, as the loop keeps issuing first an Overpass and then a DAWA request continually, without throttling. I'll be fixing this in my fork, where I'm tracking that particular issue in https://github.com/mikini/AWSbot/issues/5.
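For illustration, the kind of guard I have in mind looks roughly like this (a sketch only; the DAWA URL and variable names are illustrative and the real scripts are structured differently):

    // Stop instead of looping forever when DAWA returns an empty dataset.
    $dawaJson = file_get_contents('https://dawa.aws.dk/adgangsadresser?postnr=1533');
    $addresses = json_decode($dawaJson, true);

    if (!is_array($addresses) || count($addresses) === 0) {
        // Nothing to compare against, so bail out rather than re-issuing
        // the Overpass and DAWA requests in an unthrottled endless loop.
        fwrite(STDERR, "No DAWA addresses for this postcode, skipping.\n");
        exit(1);
    }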

I reckon it would be due diligence to actually determine whether the requests did originate from Stephen's machine or whether others are involved. How can we do that? Are you in a position to disclose the IP/IPs that misbehaved? I found the Overpass Munin installation, but it doesn't have any explicit API measurements for me to confirm whether the load has decreased.

Hjart commented 7 years ago

Apart from AWSbot I have seen two users working with Danish addresses: https://www.openstreetmap.org/user/J%C3%B8rn-osm, who has done considerable work "cleaning up" addresses (removing duplicates, moving misplaced addresses back, etc.), and a newbie who used the AWSbot scripts for a one-shot job importing addresses for a new residential area (and expressed surprise over finding a "goto" in the scripts). I don't know to what extent Jørn-osm is using the AWSbot scripts, but it might be worth asking him.

mmd-osm commented 7 years ago

Thanks for looking into this. Checking yesterday's log files on overpass-api.de, there are 13025 requests for the 1533 postcode, all originating from a US-CA IPv6 address ending in "::2" (sorry, I can't post more details for obvious privacy reasons). Requests from this static address seem to have started some time around March/April 2017. The requests provide neither a referrer nor a user agent.

I'll recheck figures in the next couple of days...

  13025 [out:json];node["osak:identifier"]["addr:postcode"=1533];out;
     11 [out:json];node["osak:identifier"]["addr:postcode"=1550];out;
      4 [out:json];node["osak:identifier"]["addr:postcode"=1532];out;
      3 [out:json];node["osak:identifier"]["addr:postcode"=1000];out;
      3 [out:json];node["osak:identifier"]["addr:postcode"=1050];out;
      3 [out:json];node["osak:identifier"]["addr:postcode"=];out;
      2 [out:json];node["osak:identifier"]["addr:postcode"=1165];out;
      2 [out:json];node["osak:identifier"]["addr:postcode"=1259];out;
      2 [out:json];node["osak:identifier"]["addr:postcode"=1263];out;
      2 [out:json];node["osak:identifier"]["addr:postcode"=1307];out;

I found the Overpass Munin installation but it hasn't got any explicit API measurements for me to confirm whether the load has decreased.

Right, it seems like the requests originate from someone else running those scripts. If there's no way to figure out more details, there's still the last resort of just blocking that IP address.

mikini commented 7 years ago

@mmd-osm Well, somebody needs to act when the internet breaks... Thank you for providing some elaboration; that amounts to an average of about 9 requests/min, i.e. roughly 7 seconds between each. A quick test right now from my laptop for some minutes, with crude measurements added to the script, got me "Average : 3.7971430049752", but I start to see occasional 429 Too Many Requests status codes from Overpass, so that's probably speeding things up a bit. I think a couple of those requests must be from my testing as well (the "addr:postcode"= query returned 2 GiB before being killed by the PHP memory limit).
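For context, the crude measurement was roughly of this shape (illustrative only, not the exact instrumentation I added to the script):

    // Time each request/response round trip and print the average,
    // noting any 429 Too Many Requests responses along the way.
    $query = '[out:json];node["osak:identifier"]["addr:postcode"=1533];out;';
    $url = 'https://overpass-api.de/api/interpreter?data=' . urlencode($query);
    $total = 0;
    $n = 20;
    for ($i = 1; $i <= $n; $i++) {
        $start = microtime(true);
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_exec($ch);
        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);
        $total += microtime(true) - $start;
        if ($status === 429) {
            echo "429 Too Many Requests on attempt $i\n";
        }
    }
    echo 'Average : ' . ($total / $n) . "\n";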

I've also had conversations about that IP, in private and in public, with the SDFE agency running DAWA, and I got a hunch from the PTR record, but am waiting for confirmation.

But thanks for confirming that Overpass is still seeing the requests; we'll need to dig a bit deeper then. It is good, however, that only a single IP is responsible. DAWA has CloudFront in front of it, so they didn't seem that worried. Doesn't Overpass employ any kind of caching?

@Hjart Ok, thanks. The AWSbot scripts are rather hardcoded for generating added/changed nodes compared with DAWA, and are not at all modular or adaptable (unless you know PHP and the API data models intimately), so I don't think it is very likely they are being used for any practical work in OSM. But if you think we should ping some of them outside of talk-dk, you are of course welcome to do that (I can't personally take on more work right now, I'm afraid). Wasn't the guy looking at AWSbot interested in improving it?

mmd-osm commented 6 years ago

Solved by a different server setup in the meantime, hence closing.

mikini commented 6 years ago

@mmd-osm I was under the impression, although mostly because of the silence in here, that the requests stopped at the time of our investigation. Isn't that true?

Does "different server setup" refer to changes at Overpass or the changes to the Danish importer AWSbot->AutoAWS?

mmd-osm commented 6 years ago

Another production Overpass server was added at the end of 2017, so the issue is no longer urgent in the meantime. I haven't really checked whether the bot was also fixed.

Hjart commented 6 years ago

As far as I know "the bot", aka AWSbot, was never fixed; it was replaced by a different and much more successful bot, "AutoAWS", earlier this year.

mikini commented 6 years ago

You must be thinking about the OSM-DK community support when saying "replaced". The AWSbot code in this repository could just as well still be running somewhere, and if that makes sense somehow, that would be fine. However, I am of course concerned if valuable community resources at Overpass (and, as a Danish taxpayer, also at DAWA/DAR) are being wasted for nothing because of fixable bugs. I would like to help analyze and mitigate this situation if I am able to.

I assumed that Stephen had killed his misbehaving AWSbot client after this conversation and the conversation I had with him. Thus I was surprised when mmd mentioned a production change as the reason to close this issue. If Overpass still receives excessive requests with the AWSbot postcode query from a single source IP, my suggestion would be to block that IP. This shouldn't affect the current community-endorsed import of Danish addresses to OSM, which is now done using AutoAWS from a different server (as far as I'm aware).

Regarding the AWSbot code and its missing "fixing": I haven't seen any indication of anybody other than me working on it either. I did some experimental changes in my fork (maybe I never got around to pushing them, I don't recall at the moment): adding a user agent header, throttling the infinite retry, handling empty datasets from upstream with a little sanity. However, it was obviously in need of a massive overhaul to ever get into a sensible state. Luckily JKHougaard (the AutoAWS author) stepped in and did a better job of getting things done than I did.