Closed benwbrum closed 1 year ago
We have not had a single complaint about the lack of subject-seeded text search.
This just happened again, causing another brief outage.
Log entry shows paged search on subjects:
I, [2022-11-08T13:46:34.728854 #32577] INFO -- : Started HEAD "/paged_search?action=search&article_id=71951&authenticity_token=s5LlKZ6mGc4ripbmCo02A9avE8iyauXAAm%2FW5RHtWd1Rlvz%2FrB%2F287X2njAWd%2BMALcLnikl3Lzpxf6pZNbGWAQ%3D%3D&collection_id=cwrgm-public&controller=display&unlinked_only=true" for 217.182.175.162 at 2022-11-08 13:46:34 +0000
I, [2022-11-08T13:46:34.730909 #32577] INFO -- : Processing by DisplayController#paged_search as HTML
I, [2022-11-08T13:46:34.731138 #32577] INFO -- : Parameters: {"article_id"=>"71951", "authenticity_token"=>"s5LlKZ6mGc4ripbmCo02A9avE8iyauXAAm/W5RHtWd1Rlvz/rB/287X2njAWd+MALcLnikl3Lzpxf6pZNbGWAQ==", "collection_id"=>"cwrgm-public", "unlinked_only"=>"true"}
I, [2022-11-08T13:46:34.778929 #32577] INFO -- : Rendering display/paged_search.html.slim within layouts/application
I, [2022-11-08T13:46:34.787304 #32577] INFO -- : Rendered shared/_collection_tabs.html.slim (Duration: 5.1ms | Allocations: 2336)
I, [2022-11-08T13:49:00.084928 #32577] INFO -- : Rendered display/_pages_view.html.slim (Duration: 145296.6ms | Allocations: 1505750)
I, [2022-11-08T13:51:05.692721 #32577] INFO -- : Rendered shared/_pagination.html.slim (Duration: 125606.4ms | Allocations: 14793)
I, [2022-11-08T13:51:05.724237 #32577] INFO -- : Rendered article/_article_links.html.slim (Duration: 23.8ms | Allocations: 2580)
I, [2022-11-08T13:51:05.758603 #32577] INFO -- : Rendered display/_multi_page.html.slim (Duration: 270970.4ms | Allocations: 1542256)
I, [2022-11-08T13:51:05.760127 #32577] INFO -- : Rendered shared/_collection_footer.html.slim (Duration: 0.3ms | Allocations: 52)
I, [2022-11-08T13:51:05.760991 #32577] INFO -- : Rendered display/paged_search.html.slim within layouts/application (Duration: 270981.4ms | Allocations: 1547592)
I, [2022-11-08T13:51:05.763162 #32577] INFO -- : Rendered layouts/_mixpanel.erb (Duration: 0.2ms | Allocations: 50)
I, [2022-11-08T13:51:05.764140 #32577] INFO -- : Rendered layouts/_fb_pixel.erb (Duration: 0.1ms | Allocations: 50)
I, [2022-11-08T13:51:05.770331 #32577] INFO -- : Rendered shared/_breadcrumbs.html.slim (Duration: 0.5ms | Allocations: 99)
I, [2022-11-08T13:51:05.773808 #32577] INFO -- : Rendered layouts/_ga.erb (Duration: 0.2ms | Allocations: 51)
I, [2022-11-08T13:51:05.785532 #32577] INFO -- : Completed 200 OK in 271054ms (Views: 1188.2ms | ActiveRecord: 269834.1ms | Allocations: 1567187)
I, [2022-11-08T13:51:05.786229 #32577] INFO -- : Oink Action: display#paged_search
I, [2022-11-08T13:51:05.786554 #32577] INFO -- : Memory usage: 1348524 | PID: 32577
I, [2022-11-08T13:51:05.786881 #32577] INFO -- : Instantiation Breakdown: Total: 276 | PageArticleLink: 70 | PageVersion: 70 | User: 31 | Page: 31 | Work: 30 | ScCanvas: 30 | Category: 8 | Collection: 2 | Visit: 1 | Article: 1 | DocumentSet: 1 | Ahoy::Event: 1
I, [2022-11-08T13:51:05.787209 #32577] INFO -- : Oink Log Entry Complete
How did a user get to this functionality since we disabled it in the view?
The landing page of the visit appears to be a search on CWRGM, which doesn't make any sense for a human:
{"id"=>22392200,
"visit_token"=>"7f976686-f676-424c-bbf3-3c6fa43e8536",
"visitor_token"=>"e73afdbd-59d1-4b4a-8b79-27ce315ba934",
"ip"=>"217.182.175.162",
"user_agent"=>
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36",
"referrer"=>nil,
"landing_page"=>
"https://fromthepage.com/paged_search?action=search&article_id=71951&authenticity_token=s5LlKZ6mGc4ripbmCo02A9avE8iyauXAAm%2FW5RHtWd1Rlvz%2FrB%2F287X2njAWd%2BMALcLnikl3Lzpxf6pZNbGWAQ%3D%3D&collection_id=cwrgm-public&controller=display&unlinked_only=true",
"user_id"=>nil,
"referring_domain"=>nil,
"search_keyword"=>nil,
"browser"=>"Chrome",
"os"=>"Mac OS X",
"device_type"=>"Desktop",
"screen_height"=>nil,
"screen_width"=>nil,
"country"=>nil,
"region"=>nil,
"city"=>nil,
"postal_code"=>nil,
"latitude"=>nil,
"longitude"=>nil,
"utm_source"=>nil,
"utm_medium"=>nil,
"utm_term"=>nil,
"utm_content"=>nil,
"utm_campaign"=>nil,
"started_at"=>Tue, 08 Nov 2022 13:46:34 UTC +00:00}
The visit only has a single action, and no user_id
associated with it, so this definitely seems like a crawler
#<Ahoy::Event:0x00000000082d11f0
id: 57781829,
visit_id: 22392200,
user_id: nil,
name: "display#paged_search",
properties:
{"document_set_id"=>872,
"document_set_title"=>
"Civil War & Reconstruction Governors of Mississippi Project",
"collection_id"=>981,
"collection_title"=>
"Civil War & Reconstruction Governors of Mississippi, Main Collection",
"article_id"=>71951,
"article_title"=>"Chalmers, James R. (James Ronald), 1831-1898"},
time: Tue, 08 Nov 2022 13:51:05 UTC +00:00>
Searching for other instances of this, we see 34 hits to the disabled functionality in the last 8 hours. Astonishingly, the problem hits aren't even GET requests, they're HEAD:
$ grep unlinked_only production.log
I, [2022-11-08T06:55:24.234000 #6640] INFO -- : Started GET "/display/search?article_id=76&unlinked_only=true" for 44.228.64.221 at 2022-11-08 06:55:24 +0000
I, [2022-11-08T06:55:24.238109 #6640] INFO -- : Parameters: {"article_id"=>"76", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
I, [2022-11-08T06:57:32.041636 #6640] INFO -- : Started GET "/display/search?article_id=32&page=9&unlinked_only=true" for 44.228.64.221 at 2022-11-08 06:57:32 +0000
I, [2022-11-08T06:57:32.044774 #6640] INFO -- : Parameters: {"article_id"=>"32", "page"=>"9", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
I, [2022-11-08T07:33:53.176967 #11853] INFO -- : Started GET "/display/search?article_id=13837&unlinked_only=true" for 44.228.64.221 at 2022-11-08 07:33:53 +0000
I, [2022-11-08T07:33:53.179763 #11853] INFO -- : Parameters: {"article_id"=>"13837", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
I, [2022-11-08T08:24:56.919039 #18708] INFO -- : Started GET "/display/search?article_id=13258&page=7&unlinked_only=true" for 66.249.66.86 at 2022-11-08 08:24:56 +0000
I, [2022-11-08T08:24:56.921929 #18708] INFO -- : Parameters: {"article_id"=>"13258", "page"=>"7", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
E, [2022-11-08T08:24:56.926083 #18708] ERROR -- : Bad record ID exception for params=<ActionController::Parameters {"article_id"=>"13258", "page"=>"7", "unlinked_only"=>"true", "controller"=>"collection", "action"=>"show", "user_slug"=>"display", "id"=>"search"} permitted: false>
I, [2022-11-08T09:37:53.688331 #28406] INFO -- : Started GET "/display/search?article_id=9&unlinked_only=true" for 44.228.64.221 at 2022-11-08 09:37:53 +0000
I, [2022-11-08T09:37:53.690781 #28406] INFO -- : Parameters: {"article_id"=>"9", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
I, [2022-11-08T09:40:12.097723 #29841] INFO -- : Started GET "/display/search?article_id=18577&unlinked_only=true" for 207.46.13.129 at 2022-11-08 09:40:12 +0000
I, [2022-11-08T09:40:12.100363 #29841] INFO -- : Parameters: {"article_id"=>"18577", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
E, [2022-11-08T09:40:12.104124 #29841] ERROR -- : Bad record ID exception for params=<ActionController::Parameters {"article_id"=>"18577", "unlinked_only"=>"true", "controller"=>"collection", "action"=>"show", "user_slug"=>"display", "id"=>"search"} permitted: false>
I, [2022-11-08T10:09:16.981326 #32612] INFO -- : Started GET "/display/search?article_id=11791&unlinked_only=true" for 44.228.64.221 at 2022-11-08 10:09:16 +0000
I, [2022-11-08T10:09:16.984608 #32612] INFO -- : Parameters: {"article_id"=>"11791", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
I, [2022-11-08T10:09:59.909035 #32652] INFO -- : Started GET "/display/search?article_id=14396&unlinked_only=true" for 44.228.64.221 at 2022-11-08 10:09:59 +0000
I, [2022-11-08T10:09:59.911727 #32652] INFO -- : Parameters: {"article_id"=>"14396", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
I, [2022-11-08T11:11:15.916505 #10011] INFO -- : Started GET "/display/search?article_id=7563&unlinked_only=true" for 44.228.64.221 at 2022-11-08 11:11:15 +0000
I, [2022-11-08T11:11:15.919066 #10011] INFO -- : Parameters: {"article_id"=>"7563", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
I, [2022-11-08T11:27:00.697764 #11586] INFO -- : Started GET "/display/search?article_id=7242&unlinked_only=true" for 44.228.64.221 at 2022-11-08 11:27:00 +0000
I, [2022-11-08T11:27:00.700540 #11586] INFO -- : Parameters: {"article_id"=>"7242", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
I, [2022-11-08T11:27:46.766577 #11608] INFO -- : Started GET "/display/search?article_id=12980&unlinked_only=true" for 44.228.64.221 at 2022-11-08 11:27:46 +0000
I, [2022-11-08T11:27:46.769565 #11608] INFO -- : Parameters: {"article_id"=>"12980", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
I, [2022-11-08T12:55:20.518001 #24703] INFO -- : Started GET "/display/search?article_id=9379&unlinked_only=true" for 44.228.64.221 at 2022-11-08 12:55:20 +0000
I, [2022-11-08T12:55:20.520503 #24703] INFO -- : Parameters: {"article_id"=>"9379", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
I, [2022-11-08T13:13:27.340040 #28010] INFO -- : Started GET "/display/search?article_id=8032&unlinked_only=true" for 40.77.167.9 at 2022-11-08 13:13:27 +0000
I, [2022-11-08T13:13:27.342792 #28010] INFO -- : Parameters: {"article_id"=>"8032", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
I, [2022-11-08T13:43:47.909533 #32577] INFO -- : Started GET "/display/search?article_id=9401&unlinked_only=true" for 44.228.64.221 at 2022-11-08 13:43:47 +0000
I, [2022-11-08T13:43:47.912361 #32577] INFO -- : Parameters: {"article_id"=>"9401", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
I, [2022-11-08T13:46:34.721436 #32557] INFO -- : Started HEAD "/paged_search?action=search&article_id=71951&authenticity_token=s5LlKZ6mGc4ripbmCo02A9avE8iyauXAAm%2FW5RHtWd1Rlvz%2FrB%2F287X2njAWd%2BMALcLnikl3Lzpxf6pZNbGWAQ%3D%3D&collection_id=cwrgm-public&controller=display&unlinked_only=true" for 217.182.175.162 at 2022-11-08 13:46:34 +0000
I, [2022-11-08T13:46:34.724228 #32557] INFO -- : Parameters: {"article_id"=>"71951", "authenticity_token"=>"s5LlKZ6mGc4ripbmCo02A9avE8iyauXAAm/W5RHtWd1Rlvz/rB/287X2njAWd+MALcLnikl3Lzpxf6pZNbGWAQ==", "collection_id"=>"cwrgm-public", "unlinked_only"=>"true"}
I, [2022-11-08T13:46:34.728854 #32577] INFO -- : Started HEAD "/paged_search?action=search&article_id=71951&authenticity_token=s5LlKZ6mGc4ripbmCo02A9avE8iyauXAAm%2FW5RHtWd1Rlvz%2FrB%2F287X2njAWd%2BMALcLnikl3Lzpxf6pZNbGWAQ%3D%3D&collection_id=cwrgm-public&controller=display&unlinked_only=true" for 217.182.175.162 at 2022-11-08 13:46:34 +0000
I, [2022-11-08T13:46:34.731138 #32577] INFO -- : Parameters: {"article_id"=>"71951", "authenticity_toke
We had a brief outage this morning. The culprits seem to be two long-running transactions occuring simultaneously, one of them a paged search and the other a faceted search:
Interleaved with that are some long-running calls to faceted browsing, but these complete more quickly when the search is finished.