benwbrum / fromthepage

FromThePage is a wiki-like application for crowdsourcing transcription of handwritten documents.
http://fromthepage.com
GNU Affero General Public License v3.0
171 stars 51 forks source link

Outage during paged search and facet display #3364

Closed benwbrum closed 1 year ago

benwbrum commented 2 years ago

We had a brief outage this morning. The culprits seem to be two long-running transactions occuring simultaneously, one of them a paged search and the other a faceted search:

I, [2022-10-21T12:34:13.126470 #17630]  INFO -- : Started POST "/display_search?article_id=71951&collection_id=cwrgm-public&unlinked_only=true" for 73.101.108.151 at 2022-10-21 12:34:13 +0000
I, [2022-10-21T12:34:13.128755 #17630]  INFO -- : Processing by DisplayController#search as HTML
I, [2022-10-21T12:34:13.129104 #17630]  INFO -- :   Parameters: {"authenticity_token"=>"s5LlKZ6mGc4ripbmCo02A9avE8iyauXAAm/W5RHtWd1Rlvz/rB/287X2njAWd+MALcLnikl3Lzpxf6pZNbGWAQ==", "article_id"=>"71951", "collection_id"=>"cwrgm-public", "unlinked_only"=>"true"}
I, [2022-10-21T12:34:13.150048 #17630]  INFO -- : Redirected to https://fromthepage.com/paged_search?action=search&article_id=71951&authenticity_token=s5LlKZ6mGc4ripbmCo02A9avE8iyauXAAm%2FW5RHtWd1Rlvz%2FrB%2F287X2njAWd%2BMALcLnikl3Lzpxf6pZNbGWAQ%3D%3D&collection_id=cwrgm-public&controller=display&unlinked_only=true
I, [2022-10-21T12:34:13.160690 #17630]  INFO -- : Completed 302 Found in 31ms (ActiveRecord: 8.1ms | Allocations: 12181)
I, [2022-10-21T12:34:13.161144 #17630]  INFO -- : Oink Action: display#search
I, [2022-10-21T12:34:13.161418 #17630]  INFO -- : Memory usage: 1276692 | PID: 17630
I, [2022-10-21T12:34:13.161665 #17630]  INFO -- : Instantiation Breakdown: Total: 10 | PageBlock: 3 | DocumentSet: 2 | User: 1 | Article: 1 | Collection: 1 | Visit: 1 | Ahoy::Event: 1
I, [2022-10-21T12:34:13.161898 #17630]  INFO -- : Oink Log Entry Complete
I, [2022-10-21T12:34:13.258446 #17630]  INFO -- : Started GET "/paged_search?action=search&article_id=71951&authenticity_token=s5LlKZ6mGc4ripbmCo02A9avE8iyauXAAm%2FW5RHtWd1Rlvz%2FrB%2F287X2njAWd%2BMALcLnikl3Lzpxf6pZNbGWAQ%3D%3D&collection_id=cwrgm-public&controller=display&unlinked_only=true" for 73.101.108.151 at 2022-10-21 12:34:13 +0000
I, [2022-10-21T12:34:13.260696 #17630]  INFO -- : Processing by DisplayController#paged_search as HTML
I, [2022-10-21T12:34:13.260962 #17630]  INFO -- :   Parameters: {"article_id"=>"71951", "authenticity_token"=>"s5LlKZ6mGc4ripbmCo02A9avE8iyauXAAm/W5RHtWd1Rlvz/rB/287X2njAWd+MALcLnikl3Lzpxf6pZNbGWAQ==", "collection_id"=>"cwrgm-public", "unlinked_only"=>"true"}
I, [2022-10-21T12:34:13.285801 #17630]  INFO -- :   Rendering display/paged_search.html.slim within layouts/application
I, [2022-10-21T12:34:13.371067 #17630]  INFO -- :   Rendered shared/_collection_tabs.html.slim (Duration: 80.5ms | Allocations: 6453)
I, [2022-10-21T12:36:27.232411 #17630]  INFO -- :   Rendered display/_pages_view.html.slim (Duration: 133860.2ms | Allocations: 761102)
I, [2022-10-21T12:37:33.085291 #17630]  INFO -- :   Rendered shared/_pagination.html.slim (Duration: 65851.8ms | Allocations: 7896)
I, [2022-10-21T12:37:33.799330 #17630]  INFO -- :   Rendered article/_article_links.html.slim (Duration: 705.8ms | Allocations: 6509)
I, [2022-10-21T12:37:33.831985 #17630]  INFO -- :   Rendered display/_multi_page.html.slim (Duration: 200459.9ms | Allocations: 794642)
I, [2022-10-21T12:37:33.833244 #17630]  INFO -- :   Rendered shared/_collection_footer.html.slim (Duration: 0.2ms | Allocations: 53)
I, [2022-10-21T12:37:33.834055 #17630]  INFO -- :   Rendered display/paged_search.html.slim within layouts/application (Duration: 200547.4ms | Allocations: 804096)
I, [2022-10-21T12:37:33.836307 #17630]  INFO -- :   Rendered layouts/_mixpanel.erb (Duration: 0.2ms | Allocations: 50)
I, [2022-10-21T12:37:33.837356 #17630]  INFO -- :   Rendered layouts/_fb_pixel.erb (Duration: 0.1ms | Allocations: 50)
I, [2022-10-21T12:37:33.843340 #17630]  INFO -- :   Rendered shared/_breadcrumbs.html.slim (Duration: 0.6ms | Allocations: 99)
I, [2022-10-21T12:37:33.847759 #17630]  INFO -- :   Rendered layouts/_ga.erb (Duration: 0.3ms | Allocations: 50)
I, [2022-10-21T12:37:33.863525 #17630]  INFO -- : Completed 200 OK in 200602ms (Views: 865.1ms | ActiveRecord: 199706.2ms | Allocations: 823777)
I, [2022-10-21T12:37:33.864116 #17630]  INFO -- : Oink Action: display#paged_search
I, [2022-10-21T12:37:33.864577 #17630]  INFO -- : Memory usage: 1276692 | PID: 17630
I, [2022-10-21T12:37:33.864971 #17630]  INFO -- : Instantiation Breakdown: Total: 266 | PageVersion: 84 | PageArticleLink: 70 | User: 26 | Page: 25 | Work: 24 | ScCanvas: 23 | Category: 8 | Collection: 2 | Article: 1 | DocumentSet: 1 | Visit: 1 | Ahoy::Event: 1
I, [2022-10-21T12:37:33.865297 #17630]  INFO -- : Oink Log Entry Complete

Interleaved with that are some long-running calls to faceted browsing, but these complete more quickly when the search is finished.

benwbrum commented 2 years ago

We have not had a single complaint about the lack of subject-seeded text search.

benwbrum commented 2 years ago

This just happened again, causing another brief outage.

Log entry shows paged search on subjects:

I, [2022-11-08T13:46:34.728854 #32577]  INFO -- : Started HEAD "/paged_search?action=search&article_id=71951&authenticity_token=s5LlKZ6mGc4ripbmCo02A9avE8iyauXAAm%2FW5RHtWd1Rlvz%2FrB%2F287X2njAWd%2BMALcLnikl3Lzpxf6pZNbGWAQ%3D%3D&collection_id=cwrgm-public&controller=display&unlinked_only=true" for 217.182.175.162 at 2022-11-08 13:46:34 +0000
I, [2022-11-08T13:46:34.730909 #32577]  INFO -- : Processing by DisplayController#paged_search as HTML
I, [2022-11-08T13:46:34.731138 #32577]  INFO -- :   Parameters: {"article_id"=>"71951", "authenticity_token"=>"s5LlKZ6mGc4ripbmCo02A9avE8iyauXAAm/W5RHtWd1Rlvz/rB/287X2njAWd+MALcLnikl3Lzpxf6pZNbGWAQ==", "collection_id"=>"cwrgm-public", "unlinked_only"=>"true"}
I, [2022-11-08T13:46:34.778929 #32577]  INFO -- :   Rendering display/paged_search.html.slim within layouts/application
I, [2022-11-08T13:46:34.787304 #32577]  INFO -- :   Rendered shared/_collection_tabs.html.slim (Duration: 5.1ms | Allocations: 2336)
I, [2022-11-08T13:49:00.084928 #32577]  INFO -- :   Rendered display/_pages_view.html.slim (Duration: 145296.6ms | Allocations: 1505750)
I, [2022-11-08T13:51:05.692721 #32577]  INFO -- :   Rendered shared/_pagination.html.slim (Duration: 125606.4ms | Allocations: 14793)
I, [2022-11-08T13:51:05.724237 #32577]  INFO -- :   Rendered article/_article_links.html.slim (Duration: 23.8ms | Allocations: 2580)
I, [2022-11-08T13:51:05.758603 #32577]  INFO -- :   Rendered display/_multi_page.html.slim (Duration: 270970.4ms | Allocations: 1542256)
I, [2022-11-08T13:51:05.760127 #32577]  INFO -- :   Rendered shared/_collection_footer.html.slim (Duration: 0.3ms | Allocations: 52)
I, [2022-11-08T13:51:05.760991 #32577]  INFO -- :   Rendered display/paged_search.html.slim within layouts/application (Duration: 270981.4ms | Allocations: 1547592)
I, [2022-11-08T13:51:05.763162 #32577]  INFO -- :   Rendered layouts/_mixpanel.erb (Duration: 0.2ms | Allocations: 50)
I, [2022-11-08T13:51:05.764140 #32577]  INFO -- :   Rendered layouts/_fb_pixel.erb (Duration: 0.1ms | Allocations: 50)
I, [2022-11-08T13:51:05.770331 #32577]  INFO -- :   Rendered shared/_breadcrumbs.html.slim (Duration: 0.5ms | Allocations: 99)
I, [2022-11-08T13:51:05.773808 #32577]  INFO -- :   Rendered layouts/_ga.erb (Duration: 0.2ms | Allocations: 51)
I, [2022-11-08T13:51:05.785532 #32577]  INFO -- : Completed 200 OK in 271054ms (Views: 1188.2ms | ActiveRecord: 269834.1ms | Allocations: 1567187)
I, [2022-11-08T13:51:05.786229 #32577]  INFO -- : Oink Action: display#paged_search
I, [2022-11-08T13:51:05.786554 #32577]  INFO -- : Memory usage: 1348524 | PID: 32577
I, [2022-11-08T13:51:05.786881 #32577]  INFO -- : Instantiation Breakdown: Total: 276 | PageArticleLink: 70 | PageVersion: 70 | User: 31 | Page: 31 | Work: 30 | ScCanvas: 30 | Category: 8 | Collection: 2 | Visit: 1 | Article: 1 | DocumentSet: 1 | Ahoy::Event: 1
I, [2022-11-08T13:51:05.787209 #32577]  INFO -- : Oink Log Entry Complete

How did a user get to this functionality since we disabled it in the view?

benwbrum commented 2 years ago

The landing page of the visit appears to be a search on CWRGM, which doesn't make any sense for a human:

{"id"=>22392200,
 "visit_token"=>"7f976686-f676-424c-bbf3-3c6fa43e8536",
 "visitor_token"=>"e73afdbd-59d1-4b4a-8b79-27ce315ba934",
 "ip"=>"217.182.175.162",
 "user_agent"=>
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36",
 "referrer"=>nil,
 "landing_page"=>
  "https://fromthepage.com/paged_search?action=search&article_id=71951&authenticity_token=s5LlKZ6mGc4ripbmCo02A9avE8iyauXAAm%2FW5RHtWd1Rlvz%2FrB%2F287X2njAWd%2BMALcLnikl3Lzpxf6pZNbGWAQ%3D%3D&collection_id=cwrgm-public&controller=display&unlinked_only=true",
 "user_id"=>nil,
 "referring_domain"=>nil,
 "search_keyword"=>nil,
 "browser"=>"Chrome",
 "os"=>"Mac OS X",
 "device_type"=>"Desktop",
 "screen_height"=>nil,
 "screen_width"=>nil,
 "country"=>nil,
 "region"=>nil,
 "city"=>nil,
 "postal_code"=>nil,
 "latitude"=>nil,
 "longitude"=>nil,
 "utm_source"=>nil,
 "utm_medium"=>nil,
 "utm_term"=>nil,
 "utm_content"=>nil,
 "utm_campaign"=>nil,
 "started_at"=>Tue, 08 Nov 2022 13:46:34 UTC +00:00}

The visit only has a single action, and no user_id associated with it, so this definitely seems like a crawler

#<Ahoy::Event:0x00000000082d11f0
 id: 57781829,
 visit_id: 22392200,
 user_id: nil,
 name: "display#paged_search",
 properties:
  {"document_set_id"=>872,
   "document_set_title"=>
    "Civil War & Reconstruction Governors of Mississippi Project",
   "collection_id"=>981,
   "collection_title"=>
    "Civil War & Reconstruction Governors of Mississippi, Main Collection",
   "article_id"=>71951,
   "article_title"=>"Chalmers, James R. (James Ronald), 1831-1898"},
 time: Tue, 08 Nov 2022 13:51:05 UTC +00:00>

Searching for other instances of this, we see 34 hits to the disabled functionality in the last 8 hours. Astonishingly, the problem hits aren't even GET requests, they're HEAD:

$ grep unlinked_only production.log
I, [2022-11-08T06:55:24.234000 #6640]  INFO -- : Started GET "/display/search?article_id=76&unlinked_only=true" for 44.228.64.221 at 2022-11-08 06:55:24 +0000
I, [2022-11-08T06:55:24.238109 #6640]  INFO -- :   Parameters: {"article_id"=>"76", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
I, [2022-11-08T06:57:32.041636 #6640]  INFO -- : Started GET "/display/search?article_id=32&page=9&unlinked_only=true" for 44.228.64.221 at 2022-11-08 06:57:32 +0000
I, [2022-11-08T06:57:32.044774 #6640]  INFO -- :   Parameters: {"article_id"=>"32", "page"=>"9", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
I, [2022-11-08T07:33:53.176967 #11853]  INFO -- : Started GET "/display/search?article_id=13837&unlinked_only=true" for 44.228.64.221 at 2022-11-08 07:33:53 +0000
I, [2022-11-08T07:33:53.179763 #11853]  INFO -- :   Parameters: {"article_id"=>"13837", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
I, [2022-11-08T08:24:56.919039 #18708]  INFO -- : Started GET "/display/search?article_id=13258&page=7&unlinked_only=true" for 66.249.66.86 at 2022-11-08 08:24:56 +0000
I, [2022-11-08T08:24:56.921929 #18708]  INFO -- :   Parameters: {"article_id"=>"13258", "page"=>"7", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
E, [2022-11-08T08:24:56.926083 #18708] ERROR -- : Bad record ID exception for params=<ActionController::Parameters {"article_id"=>"13258", "page"=>"7", "unlinked_only"=>"true", "controller"=>"collection", "action"=>"show", "user_slug"=>"display", "id"=>"search"} permitted: false>
I, [2022-11-08T09:37:53.688331 #28406]  INFO -- : Started GET "/display/search?article_id=9&unlinked_only=true" for 44.228.64.221 at 2022-11-08 09:37:53 +0000
I, [2022-11-08T09:37:53.690781 #28406]  INFO -- :   Parameters: {"article_id"=>"9", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
I, [2022-11-08T09:40:12.097723 #29841]  INFO -- : Started GET "/display/search?article_id=18577&unlinked_only=true" for 207.46.13.129 at 2022-11-08 09:40:12 +0000
I, [2022-11-08T09:40:12.100363 #29841]  INFO -- :   Parameters: {"article_id"=>"18577", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
E, [2022-11-08T09:40:12.104124 #29841] ERROR -- : Bad record ID exception for params=<ActionController::Parameters {"article_id"=>"18577", "unlinked_only"=>"true", "controller"=>"collection", "action"=>"show", "user_slug"=>"display", "id"=>"search"} permitted: false>
I, [2022-11-08T10:09:16.981326 #32612]  INFO -- : Started GET "/display/search?article_id=11791&unlinked_only=true" for 44.228.64.221 at 2022-11-08 10:09:16 +0000
I, [2022-11-08T10:09:16.984608 #32612]  INFO -- :   Parameters: {"article_id"=>"11791", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
I, [2022-11-08T10:09:59.909035 #32652]  INFO -- : Started GET "/display/search?article_id=14396&unlinked_only=true" for 44.228.64.221 at 2022-11-08 10:09:59 +0000
I, [2022-11-08T10:09:59.911727 #32652]  INFO -- :   Parameters: {"article_id"=>"14396", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
I, [2022-11-08T11:11:15.916505 #10011]  INFO -- : Started GET "/display/search?article_id=7563&unlinked_only=true" for 44.228.64.221 at 2022-11-08 11:11:15 +0000
I, [2022-11-08T11:11:15.919066 #10011]  INFO -- :   Parameters: {"article_id"=>"7563", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
I, [2022-11-08T11:27:00.697764 #11586]  INFO -- : Started GET "/display/search?article_id=7242&unlinked_only=true" for 44.228.64.221 at 2022-11-08 11:27:00 +0000
I, [2022-11-08T11:27:00.700540 #11586]  INFO -- :   Parameters: {"article_id"=>"7242", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
I, [2022-11-08T11:27:46.766577 #11608]  INFO -- : Started GET "/display/search?article_id=12980&unlinked_only=true" for 44.228.64.221 at 2022-11-08 11:27:46 +0000
I, [2022-11-08T11:27:46.769565 #11608]  INFO -- :   Parameters: {"article_id"=>"12980", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
I, [2022-11-08T12:55:20.518001 #24703]  INFO -- : Started GET "/display/search?article_id=9379&unlinked_only=true" for 44.228.64.221 at 2022-11-08 12:55:20 +0000
I, [2022-11-08T12:55:20.520503 #24703]  INFO -- :   Parameters: {"article_id"=>"9379", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
I, [2022-11-08T13:13:27.340040 #28010]  INFO -- : Started GET "/display/search?article_id=8032&unlinked_only=true" for 40.77.167.9 at 2022-11-08 13:13:27 +0000
I, [2022-11-08T13:13:27.342792 #28010]  INFO -- :   Parameters: {"article_id"=>"8032", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
I, [2022-11-08T13:43:47.909533 #32577]  INFO -- : Started GET "/display/search?article_id=9401&unlinked_only=true" for 44.228.64.221 at 2022-11-08 13:43:47 +0000
I, [2022-11-08T13:43:47.912361 #32577]  INFO -- :   Parameters: {"article_id"=>"9401", "unlinked_only"=>"true", "user_slug"=>"display", "id"=>"search"}
I, [2022-11-08T13:46:34.721436 #32557]  INFO -- : Started HEAD "/paged_search?action=search&article_id=71951&authenticity_token=s5LlKZ6mGc4ripbmCo02A9avE8iyauXAAm%2FW5RHtWd1Rlvz%2FrB%2F287X2njAWd%2BMALcLnikl3Lzpxf6pZNbGWAQ%3D%3D&collection_id=cwrgm-public&controller=display&unlinked_only=true" for 217.182.175.162 at 2022-11-08 13:46:34 +0000
I, [2022-11-08T13:46:34.724228 #32557]  INFO -- :   Parameters: {"article_id"=>"71951", "authenticity_token"=>"s5LlKZ6mGc4ripbmCo02A9avE8iyauXAAm/W5RHtWd1Rlvz/rB/287X2njAWd+MALcLnikl3Lzpxf6pZNbGWAQ==", "collection_id"=>"cwrgm-public", "unlinked_only"=>"true"}
I, [2022-11-08T13:46:34.728854 #32577]  INFO -- : Started HEAD "/paged_search?action=search&article_id=71951&authenticity_token=s5LlKZ6mGc4ripbmCo02A9avE8iyauXAAm%2FW5RHtWd1Rlvz%2FrB%2F287X2njAWd%2BMALcLnikl3Lzpxf6pZNbGWAQ%3D%3D&collection_id=cwrgm-public&controller=display&unlinked_only=true" for 217.182.175.162 at 2022-11-08 13:46:34 +0000
I, [2022-11-08T13:46:34.731138 #32577]  INFO -- :   Parameters: {"article_id"=>"71951", "authenticity_toke