WadeBarnes closed this issue 8 months ago
A closer look at one of the recent spikes:
The queries during these periods appear to be paging to the maximum depth we allow. I've adjusted the expression filtering to limit things a bit further to see if that helps.
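As an illustration of the kind of limit involved (the actual filter expressions aren't shown here, and the parameter name and threshold below are assumptions), a paging cap can be expressed as a simple check on the `page` query parameter of a request:

```python
from urllib.parse import urlparse, parse_qs

MAX_PAGE = 200  # hypothetical paging depth limit


def is_excessive_paging(url: str, max_page: int = MAX_PAGE) -> bool:
    """Return True if the request pages deeper than the allowed limit."""
    qs = parse_qs(urlparse(url).query)
    try:
        page = int(qs.get("page", ["1"])[0])
    except ValueError:
        return False  # malformed page values are handled elsewhere
    return page > max_page


print(is_excessive_paging("/search/credential?name=acme&page=250"))  # → True
```

Requests that trip a check like this can then be rejected or rate-limited before they reach the search engine.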
The changes helped narrow the window:
The load is definitely related to paging queries.
Almost certainly, these issues are coming from an entity known to BC Gov (I'll leave off the name), who is trying to use OrgBook as a way to maintain a full list of all BC registered entities. They are meeting with BC Registries tomorrow to talk about how to get the information in other ways. Since they want to keep their database as accurate as possible, I'm sure they will continue to work on how to scrape the data.
The IP addresses associated with these particular queries are globally distributed, with repeated queries for the same pages, so I'm skeptical that's the source. I've been tracking another query pattern that I suspect is related to what you're talking about, but that query pattern does not use paging, and therefore does not put much load on the search engine. It looks like this:
OrgBook Traffic over the past week:
Filtering out the traffic from the sync above and one other query traffic pattern I'm following, our typical week looks like this:
Even when you factor in all of the queries, the load on the search engine is very low. The only spikes we are seeing are from the paging queries, which make up a tiny fraction of the overall traffic. When used "correctly", OrgBook could handle a much higher volume of synchronization traffic. There are also notification webhooks that can be used to subscribe to change notifications.
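For consumers that want to stay in sync without scraping, the webhooks flip the model: instead of polling and paging, OrgBook pushes change events to a registered URL. A minimal receiver sketch (the payload field `topic` and the handler name are assumptions for illustration, not the actual OrgBook webhook schema):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_event(payload: dict) -> str:
    """Process one change notification; `topic` is a hypothetical field."""
    topic = payload.get("topic", "unknown")
    return f"registration change for {topic}"


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        handle_event(payload)
        self.send_response(200)
        self.end_headers()


# HTTPServer(("", 8080), WebhookHandler).serve_forever()  # run to receive events
```

A subscriber maintaining a mirror would apply each event as it arrives, touching only the records that actually changed.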
The adjustments to the expression filtering greatly reduced the duration of the CPU spikes, and you can see this in the 1D view here. It shows up as a reduction in CPU use because the data is downsampled at this time scale.
Zooming in you can see this is just a reduction in the duration:
The traffic over this period consisted of 167 requests; all but one small set of queries (which looked to be legitimate) were paging queries from globally distributed IPs.
Interestingly, the filtering updates appear to have deterred some of the unwanted traffic. Though we've seen this traffic drop off and pick up again before, the reduction correlates quite well with the updates to the expression filtering.
Additional query patterns have been identified and added to the expression filtering.
The updates to the expression filters got things under control.
Starting January 3rd, 2024 at 4:20pm, we began to experience CPU spikes to 60% every four hours. These spikes do not appear to be related to query volume, but rather to a specific query pattern, different from the known and blocked one.
These spikes are continuing to occur on a periodic basis.
Note that the apparent level of the spikes is reduced due to the downsampling at the wider time scale in this graph:
Investigate and identify the query pattern responsible for these spikes and determine if it's a pattern we should be blocking. For the moment this activity is not adversely affecting performance or service levels.