benwbrum / fromthepage

FromThePage is a wiki-like application for crowdsourcing transcription of handwritten documents.
http://fromthepage.com
GNU Affero General Public License v3.0

missing subject records #4269

Open saracarl opened 2 months ago

saracarl commented 2 months ago

CWRGM is one of our biggest users for subject linking. They recently reported:

I recently found an issue with our Secretaries tag at FTP, and I am puzzled by it, so I am hoping one of y’all can help. We’ve been tagging Secretaries for a while and currently it is at the Omeka site under the Professional Labor category with an annotation and is tagged in 2,113 documents. However, I recently discovered in FTP that the Secretaries tag is in a new category (Occupations), is missing its annotation, and is only tagged to 60 documents. I cannot even find the original tag anymore in FTP. Do you have any sense of what could have happened to it? I would say that someone on the team accidentally deleted it, but I know you have to remove the links before you can delete a tag, and I highly doubt a team member removed over 2,000 instances of a subject tag.

If we need to just not tag secretaries because the data is lost, that’s fine, but I would like to know what happened if possible so this doesn’t happen again, especially on a tag where we cannot lose that data. Do any of you have any back channel ways of finding our missing tag? Or what happened to it?

and

I found another tag with this issue. The Mississippi-Executive Department at Omeka has over 2,000 tags and an annotation but only 76 docs and no annotation at FTP.

It is possible that they were combined together in FTP, but neither Camp nor I have combined them (maybe someone who should not have been, did?). I would also think, however, that if the tags were merged together they would have 2,173 docs tagged to Secretaries and over 2,000 tagged to the MS Executive Dept. tag.

saracarl commented 2 months ago

This seems like it was introduced recently, so I went looking for PRs that might account for it.
This one seemed most likely.

benwbrum commented 2 months ago

Looking at the Secretaries example, it looks like the Omeka site has a Secretaries article with S95466 in the URL, which should correspond to a subject/article ID in FromThePage.

In my old development copy of the db, the article exists:

2.7.3 :001 > Article.find 95466
Creating scope :target_article_links. Overwriting existing method Article.target_article_links.
Creating scope :page_article_links. Overwriting existing method Article.page_article_links.
  Article Load (0.4ms)  SELECT `articles`.* FROM `articles` WHERE `articles`.`id` = 95466 LIMIT 1
 => #<Article id: 95466, title: "Secretaries", source_text: "“Someone who works in an office, writing letters, ...", created_on: "2021-02-24 19:21:27.000000000 +0000", lock_version: 1, xml_text: "<?xml version='1.0' encoding='UTF-8'?>    \n      <...", graph_image: "/home/fromthepage/deployment/releases/202307261555...", collection_id: 981, latitude: nil, longitude: nil, uri: "https://dictionary.cambridge.org/us/dictionary/eng...", provenance: nil, created_by_id: 215593, pages_count: 1549> 

However in production, this record does not exist:

2.7.3 :001 > Article.find 95466
Traceback (most recent call last):
        1: from (irb):1
ActiveRecord::RecordNotFound (Couldn't find Article with 'id'=95466)

Note that my local database has had the orphan article clean-up script run on it.

benwbrum commented 2 months ago

The new Secretaries article was created on August 8, which gives us a terminus ante quem date for this problem. Looking for discontinuities in dates might give us more specific time-frames.

2.7.3 :002 > Article.find 32157202
 => #<Article id: 32157202, title: "Secretaries", source_text: nil, created_on: "2024-08-08 17:15:26.000000000 +0000", lock_version: 0, xml_text: nil, graph_image: "/home/fromthepage/deployment/releases/202408072043...", collection_id: 981, latitude: nil, longitude: nil, uri: nil, provenance: nil, created_by_id: 32018409, pages_count: 205> 

benwbrum commented 2 months ago

The page_article_link records seem to reflect the new subject -- there are no orphan records pointing to the old subject ID:

2.7.3 :003 > PageArticleLink.where(article_id: 95466).count
 => 0 
2.7.3 :004 > PageArticleLink.where(article_id: 32157202).count
 => 205 

However, we should be able to look for the old ID in the XML text of some of these pages.

benwbrum commented 2 months ago

There are still pages in production whose xml_text refers to the old article id:

2.7.3 :011 > c.pages.where("xml_text like '%target_id=\\'95466\\'%'").count
 => 2359 
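
The same check can be made per page in plain Ruby. This is a minimal sketch, not FromThePage code: `dangling_target_ids` is a hypothetical helper, the `link_id` values in the sample markup are made up, and the only things taken from this thread are the single-quoted `target_id='...'` attribute format and the two Secretaries article IDs.

```ruby
# Hypothetical helper (not part of FromThePage): collect every target_id
# referenced in a page's xml_text and return those with no surviving article.
def dangling_target_ids(xml_text, existing_article_ids)
  # Link attributes are single-quoted in the xml_text shown in this thread.
  referenced = xml_text.scan(/target_id='(\d+)'/).flatten.map(&:to_i).uniq
  referenced - existing_article_ids
end

# Sample markup with invented link_id values; 95466 is the missing article
# and 32157202 is the recreated one.
sample_xml = "<p><link link_id='1' target_id='95466' target_title='Secretaries'>clerk</link> and " \
             "<link link_id='2' target_id='32157202' target_title='Secretaries'>secretary</link></p>"

puts dangling_target_ids(sample_xml, [32157202]).inspect
```

In a Rails console the list of existing IDs could presumably come from something like `c.articles.pluck(:id)`, but that wiring is an assumption and is not shown in this thread.
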
benwbrum commented 2 months ago

Example page pointing to the old Secretaries record: https://fromthepage.com/cwrgm/cwrgm-rev2/voucher-to-southwestern-telegraph-company-january-2-1864/display/32180310

(This page has had its page_article_links cleaned out, so no link record for the old article exists on it.)

benwbrum commented 2 months ago

Looking at the WWP (Wilford Woodruff Papers Project) collection, we see this:

2.7.3 :001 > page = Page.find 33951039
 => #<Page id: 33951039, title: "page_0003", source_text: "I presume that an\r\nordinary bishop's recommend\r\nwi...... 
I presume that an
ordinary bishop's recommend
will fill the bill for cre-
dentials. Is this so; and is it
necessary for the recommend
to be endorsed by yourself, or
will it be sufficient for him
to present it at the [[Brigham Young Academy, Provo, Utah County, Utah Territory|academy]].

Hoping you can reply
soon

I remain
Your brother
In the Gospel

[[Levi Mathers Savage|L. M. Savage]]
Bishop. 

I think it will be all right for
him to come. Send his credentials
direct to the Rep. and see if
bro. [[William Charles Spence|Spence]] can do anything
for his <hi rend="underline">fare</hi>, and let them know.

[[Joseph Fielding Smith|J. F. S.]] => nil 
<?xml version='1.0' encoding='UTF-8'?>    
        <p>I presume that an<lb/>ordinary bishop's recommend<lb/>will fill the bill for cre<lb break='no'/>dentials. Is thto present it at the <link link_id='39303239' target_id='32011034' target_title='Brigham Young Academy, Provo, Utah County, Utah Territory'>academy</link>.</p><p>Hoping you can reply<lb/>soon</p><p>I remain<lb/>Your brother<lb/>In the Gospel</p><p><link link_id='39303240' target_id='32013315' target_title='Levi Mathers Savage'>L. M. Savage</link><lb/>Bishop.</p><pink_id='39303241' target_id='32030321' target_title='William Charles Spence'>Spence</link> can do anything<lb/>for his <hi rend='underline'>fare</hi>, and let them know.</p><p><link link_id='39303242' target_id='32157344' target_title='Joseph Fielding Smith'>J. F. S.</link></p>
      </page>
2.7.3 :004 > a = Article.find 32157344
Traceback (most recent call last):
        1: from (irb):4
ActiveRecord::RecordNotFound (Couldn't find Article with 'id'=32157344)
2.7.3 :005 > page.page_article_links.last
 => #<PageArticleLink id: 39303241, page_id: 33951039, article_id: 32030321, display_text: "Spence", created_on: "2024-08-21 15:04:51.000000000 +0000", text_type: "transcription"> 
2.7.3 :006 > page.page_article_links
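 => #<ActiveRecord::Associations::CollectionProxy [#<PageArticleLink id: 39303239, page_id: 33951039, article_id: 32011034, display_text: "academy", created_on: "2024-08-21 15:04:51.000000000 +0000", text_type: "transcription">, #<PageArticleLink id: 39303240, page_id: 33951039, article_id: 32013315, display_text: "L. M. Savage", created_on: "2024-08-21 15:04:51.000000000 +0000", text_type: "transcription">, #<PageArticleLink id: 39303241, page_id: 33951039, article_id: 32030321, display_text: "Spence", created_on: "2024-08-21 15:04:51.000000000 +0000", text_type: "transcription">]> 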
2.7.3 :007 > pp page.page_article_links
[#<PageArticleLink:0x000063bf04edcb88
  id: 39303239,
  page_id: 33951039,
  article_id: 32011034,
  display_text: "academy",
  created_on: Wed, 21 Aug 2024 15:04:51.000000000 UTC +00:00,
  text_type: "transcription">,
 #<PageArticleLink:0x000063bf04edc8e0
  id: 39303240,
  page_id: 33951039,
  article_id: 32013315,
  display_text: "L. M. Savage",
  created_on: Wed, 21 Aug 2024 15:04:51.000000000 UTC +00:00,
  text_type: "transcription">,
 #<PageArticleLink:0x000063bf04edc570
  id: 39303241,
  page_id: 33951039,
  article_id: 32030321,
  display_text: "Spence",
  created_on: Wed, 21 Aug 2024 15:04:51.000000000 UTC +00:00,
  text_type: "transcription">]
 => #<ActiveRecord::Associations::CollectionProxy [#<PageArticleLink id: 39303239, page_id: 33951039, article_id: 32011034, display_text: "academy", created_on: "2024-08-21 15:04:51.000000000 +0000", text_type: "transcription">, #<PageArticleLink id: 39303240, page_id: 33951039, article_id: 32013315, display_text: "L. M. Savage", created_on: "2024-08-21 15:04:51.000000000 +0000", text_type: "transcription">, #<PageArticleLink id: 39303241, page_id: 33951039, article_id: 32030321, display_text: "Spence", created_on: "2024-08-21 15:04:51.000000000 +0000", text_type: "transcription">]> 
2.7.3 :008 > c = page.collection
 => #<Collection id: 970, title: "Wilford Woodruff Papers Project", owner_user_id: 221669, created_on: "2020-07-27 2... 
2.7.3 :009 > c.articles.where(title: 'Joseph Fielding Smith').count
 => 1 
2.7.3 :010 > c.articles.where(title: 'Joseph Fielding Smith').first
 => #<Article id: 32159119, title: "Joseph Fielding Smith", source_text: nil, created_on: "2024-08-21 15:10:52.000000000 +0000", lock_version: 0, xml_text: nil, graph_image: "/home/fromthepage/deployment/releases/202408072043...", collection_id: 970, latitude: nil, longitude: nil, uri: nil, provenance: nil, created_by_id: 32023688, pages_count: 21> 
2.7.3 :011 > c.articles.where(title: 'Joseph Fielding Smith').first.user
Traceback (most recent call last):
        1: from (irb):11
NoMethodError (undefined method `user' for #<Article:0x000063bf066c7888>)
Did you mean?  super
2.7.3 :012 > c.articles.where(title: 'Joseph Fielding Smith').first.created_by
Traceback (most recent call last):
        2: from (irb):11
        1: from (irb):12:in `rescue in irb_binding'
NoMethodError (undefined method `created_by' for #<Article:0x000063bf05ac9ce8>)
Did you mean?  created_by_id
               created_on
               created_on?
               created_on=
2.7.3 :013 > c.articles.where(title: 'Joseph Fielding Smith').first.created_by_id
 => 32023688 
2.7.3 :014 > User.find 32023688
 => #<User id: 32023688, login: "andecarson", display_name: "andecarson", real_name: "Carson Andersen", email: "carson.andersen@wilfordwoodruffpapers.org", owner: false, admin: false, created_at: "2024-04-26 15:06:42.000000000 +0000", updated_at: "2024-08-22 14:08:24.000000000 +0000", remember_token_expires_at: nil, location: nil, website: nil, about: nil, account_type: nil, paid_date: nil, guest: nil, slug: "andecarson", deleted: false, provider: nil, uid: nil, start_date: nil, orcid: nil, dictation_language: "en-US", activity_email: true, external_id: nil, sso_issuer: nil, preferred_locale: nil, api_key: nil, picture: nil, help: nil, footer_block: "For questions about this project, contact at."> 

saracarl commented 2 months ago

We can see when each replacement subject was created, and by whom. All three of CWRGM's replacement subjects were created when Alessandra Diaz saved the following page at 11:15 Central time on August 8th:
https://fromthepage.com/cwrgm/cwrgm-rev2/letter-from-j-w-piles-to-the-mississippi-state-board-of-registration-august-28-1876/transcribe/34048658

image

and here's the back end -- the show/display of that page:

image

What we see when we look at the versions is a save of the transcription, then a save with subjects linking to the "old" instances of the subjects, then a save with the subjects linking to the "new" instances, all within an 11-minute time frame, which is all very weird.

Here's the versions tab: https://fromthepage.com/cwrgm/cwrgm-rev2/letter-from-j-w-piles-to-the-mississippi-state-board-of-registration-august-28-1876/versions/34048658 The one we're most interested in is the save where she changes the State of Mississippi link from "Mississippi--Executive Office" to "Mississippi--Executive Department".

Based on this, we think she linked "Mississippi--Executive Office", was asked to categorize it, realized the link was wrong, and hit "cancel" on the subject categorization page, which kicked us into this new code. That cancellation is supposed to delete only the new, abandoned subject, but we think all three of the subjects on the page are being deleted instead. When she re-saves the page after correcting the mis-link, the save recreates all three subjects. This matches what we are seeing in their list of orphaned subjects.
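
The suspected sequence can be replayed with a toy in-memory model. This is only a sketch of the hypothesis: `Subject`, `ToyCollection`, and both `cancel_*` methods are invented for illustration and do not correspond to anything in the FromThePage codebase.

```ruby
# Toy in-memory model of the suspected sequence; every name here is hypothetical.
Subject = Struct.new(:id, :title, :annotation)

class ToyCollection
  attr_reader :subjects

  def initialize
    @subjects = {}
    @next_id = 1
  end

  def create(title, annotation = nil)
    subject = Subject.new(@next_id, title, annotation)
    @subjects[subject.id] = subject
    @next_id += 1
    subject
  end

  # Saving a page links each title, creating a subject when none exists yet.
  def save_page(titles)
    titles.map do |title|
      @subjects.values.find { |s| s.title == title } || create(title)
    end
  end

  # Suspected behavior: cancel removes every subject linked on the page.
  def cancel_deleting_all(linked)
    linked.each { |s| @subjects.delete(s.id) }
  end

  # Intended behavior: cancel removes only subjects created by this save.
  def cancel_deleting_new(linked, ids_before_save)
    linked.reject { |s| ids_before_save.include?(s.id) }
          .each { |s| @subjects.delete(s.id) }
  end
end

collection = ToyCollection.new
old = collection.create("Secretaries", "long-standing annotation")
linked = collection.save_page(["Secretaries", "Mississippi--Executive Office"])
collection.cancel_deleting_all(linked)
recreated = collection.save_page(["Secretaries", "Mississippi--Executive Department"]).first

puts "old id #{old.id}, new id #{recreated.id}, annotation #{recreated.annotation.inspect}"
```

In this model the re-save yields a "Secretaries" subject with a new ID and no annotation, which is the pattern CWRGM reported. Swapping in `cancel_deleting_new` with the IDs captured before the save keeps the original record.
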

saracarl commented 2 months ago

Based on the creation dates on subjects in this spreadsheet (e.g. Fruita), the problem is introduced -- or subjects are recreated -- on an edit where a link is removed. Here, it's Latter Day Saints:

https://gist.github.com/benwbrum/be149bbc53031fa55745c706d5d13961#file-versions_history_33947106-diff-L119
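
To spot edits where a link was removed, the subject titles in two versions of a transcription can be compared. A minimal sketch follows, assuming the `[[Title|display text]]` link syntax seen elsewhere in this thread. `linked_titles` is a hypothetical helper and the two sample strings are invented, not taken from the page versions.

```ruby
# Hypothetical helper: list the subject titles linked in a transcription,
# using the [[Title|display text]] syntax (the display text is optional).
def linked_titles(source_text)
  source_text.scan(/\[\[([^\]|]+)(?:\|[^\]]*)?\]\]/).flatten.map(&:strip).uniq
end

# Invented sample versions of one page, before and after a link correction.
before = "State of [[Mississippi--Executive Office|Mississippi]], [[Secretaries|Secretary]]"
after  = "State of [[Mississippi--Executive Department|Mississippi]], [[Secretaries|Secretary]]"

removed = linked_titles(before) - linked_titles(after)
added   = linked_titles(after) - linked_titles(before)

puts "removed: #{removed.inspect}"
puts "added:   #{added.inspect}"
```
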

saracarl commented 1 week ago

Log messages from investigating with tripwire code:

I, [2024-10-22T23:18:46.524817 #2021610]  INFO -- : Started GET "/woodruff/wilford-woodruff-papers-project/letter-from-thomas-edwin-ricks-james-henry-hart-and-joseph-coulson-rich-2-june-1890-le-34727/transcribe/34242591" for 63.225.197.57 at 2024-10-22 23:18:46 +0000

I, [2024-10-22T23:18:46.527225 #2021610]  INFO -- :   Parameters: {"user_slug"=>"woodruff", "collection_id"=>"wilford-woodruff-papers-project", "work_id"=>"letter-from-thomas-edwin-ricks-james-henry-hart-and-joseph-coulson-rich-2-june-1890-le-34727", "page_id"=>"34242591"}

I, [2024-10-22T23:18:47.322653 #2021561]  INFO -- : Started GET "/marindasmith/970/32133297/still_editing/34242591" for 63.225.197.57 at 2024-10-22 23:18:47 +0000
I, [2024-10-22T23:18:47.325288 #2021561]  INFO -- : Processing by TranscribeController#still_editing as */*
I, [2024-10-22T23:18:47.325514 #2021561]  INFO -- :   Parameters: {"user_slug"=>"marindasmith", "collection_id"=>"970", "work_id"=>"32133297", "page_id"=>"34242591"}

I, [2024-10-22T23:19:03.019410 #2021537]  INFO -- : Started GET "/woodruff/970/32133297/34242591/active_editing" for 63.225.197.57 at 2024-10-22 23:19:03 +0000
I, [2024-10-22T23:19:03.019381 #2021660]  INFO -- : Started GET "/page_version/show?page_version_id=34957454" for 44.214.187.82 at 2024-10-22 23:19:03 +0000
I, [2024-10-22T23:19:03.021540 #2021537]  INFO -- : Processing by TranscribeController#active_editing as */*
I, [2024-10-22T23:19:03.021674 #2021635]  INFO -- :   Rendered collection/show.html.slim within layouts/application (Duration: 153.0ms | Allocations: 62667)
I, [2024-10-22T23:19:03.021733 #2021537]  INFO -- :   Parameters: {"user_slug"=>"woodruff", "collection_id"=>"970", "work_id"=>"32133297", "page_id"=>"34242591"}

I, [2024-10-22T23:19:47.457047 #2021585]  INFO -- : Started GET "/marindasmith/970/32133297/still_editing/34242591" for 63.225.197.57 at 2024-10-22 23:19:47 +0000
I, [2024-10-22T23:19:47.459027 #2021585]  INFO -- : Processing by TranscribeController#still_editing as */*
I, [2024-10-22T23:19:47.459216 #2021585]  INFO -- :   Parameters: {"user_slug"=>"marindasmith", "collection_id"=>"970", "work_id"=>"32133297", "page_id"=>"34242591"}

I, [2024-10-22T23:20:41.624364 #2021685]  INFO -- : Started PATCH "/woodruff/wilford-woodruff-papers-project/review/one_off/34242591" for 63.225.197.57 at 2024-10-22 23:20:41 +0000
I, [2024-10-22T23:20:41.626665 #2021685]  INFO -- : Processing by TranscribeController#save_transcription as HTML
I, [2024-10-22T23:20:41.626898 #2021685]  INFO -- :   Parameters: {"authenticity_token"=>"rzQtCLBbPYpzVF4JRbvdPlsSeeEip40+APdYAzF+F6uJsU/X0j2gB4cDB36QpvGsPGYuOKgyyNSWEC/Aw2NJwA==", "page_id"=>"34242591", "flow"=>"", "quality_sampling_id"=>"", "page"=>{"mark_blank"=>"0", "source_text"=>"Charles H. Hart.\r\n\r\nLAND BUSINESS.\r\nREAL ESTATE.\r\nCOLLECTIONS.\r\n\r\nHart & Son,\r\nATTORNEYS AT LAW.\r\nOffices in Court House and on Main Street.\r\n\r\nParis, Bear Lake Co., Idaho, 1890.\r\n\r\nuntil the Briggs case had been heard and disposed \r\nof. The matter will not be presented until the signs \r\nare more favorable.\r\n\r\nNothing worthy of special notice more that \r\nthat already mentioned has transpired since we wrote \r\nto you. The Grand Jury is still in Session—Col.\r\nJones of the [[Blackfoot, Bingham County, Idaho Territory|Blackfoot]] News told us sub rosa that \r\n56 indictments had been matured to-day, all of \r\nthem growing out of the election business—he thought \r\nthe object was political capital manufactured to keep \r\nthe old Anti-Mormon party alive a little longer— \r\nbut he thought the issue in southern [[Idah Territory|Idaho]] was \r\ndead and could not be made to serve another campaign.\r\n\r\nPraying the Lord to continue to bless and strengthen  \r\nyour in your arduous labors.\r\n\r\nWe remain as ever \r\nYour brethren in the Gospel\r\n\r\n[[Thomas Edwin Ricks|T. E. Ricks]]\r\n[[James Henry Hart|James H. Hart]].\r\n[[Joseph Coulson Rich|J. C. Rich]]."}, "save_to_needs_review"=>"", "filter-brightness"=>"0", "filter-contrast"=>"0", "filter-threshold"=>"0", "user_slug"=>"woodruff", "collection_id"=>"wilford-woodruff-papers-project"}

I, [2024-10-22T23:20:41.648529 #2021685]  INFO -- : TRANSCRIPTION       2024-10-22 23:20:41 +0000
TRANSCRIPTION   User    ID: 25091055    Email: marinda@amqc.net Display Name: marindasmith
TRANSCRIPTION   Collection      ID: 970 Title:Wilford Woodruff Papers Project   Owner Email: contact@wilfordwoodruffpapers.org
TRANSCRIPTION   Work    ID: 32133297    Title: Letter from Thomas Edwin Ricks, James Henry Hart, and Joseph Coulson Rich, 2 June 1890 [LE-34727]
TRANSCRIPTION   Page    ID: 34242591    Position: 3     Title:page_0003
TRANSCRIPTION   Source Text:
BEGIN_SOURCE_TEXT
Charles H. Hart.

LAND BUSINESS.
REAL ESTATE.
COLLECTIONS.

Hart & Son,
ATTORNEYS AT LAW.
Offices in Court House and on Main Street.

Paris, Bear Lake Co., Idaho, 1890.

until the Briggs case had been heard and disposed 
of. The matter will not be presented until the signs 
are more favorable.

Nothing worthy of special notice more that 
that already mentioned has transpired since we wrote 
to you. The Grand Jury is still in Session—Col.
Jones of the [[Blackfoot, Bingham County, Idaho Territory|Blackfoot]] News told us sub rosa that 
56 indictments had been matured to-day, all of 
them growing out of the election business—he thought 
the object was political capital manufactured to keep 
the old Anti-Mormon party alive a little longer— 
but he thought the issue in southern [[Idah Territory|Idaho]] was 
dead and could not be made to serve another campaign.

Praying the Lord to continue to bless and strengthen  
your in your arduous labors.

We remain as ever 
Your brethren in the Gospel

[[Thomas Edwin Ricks|T. E. Ricks]]
[[James Henry Hart|James H. Hart]].
[[Joseph Coulson Rich|J. C. Rich]].
END_SOURCE_TEXT

I, [2024-10-22T23:20:41.655650 #2021685]  INFO -- : ISSUE4269 old_article_count = 27703

I, [2024-10-22T23:20:46.184064 #2021685]  INFO -- : Redirected to https://fromthepage.com/transcribe/assign_categories?collection_id=wilford-woodruff-papers-project&next_page_id=34242591&page_id=34242591

I, [2024-10-22T23:20:46.324265 #2021635]  INFO -- : Started GET "/transcribe/assign_categories?collection_id=wilford-woodruff-papers-project&next_page_id=34242591&page_id=34242591" for 63.225.197.57 at 2024-10-22 23:20:46 +0000
I, [2024-10-22T23:20:46.327060 #2021635]  INFO -- : Processing by TranscribeController#assign_categories as HTML
I, [2024-10-22T23:20:46.327270 #2021635]  INFO -- :   Parameters: {"collection_id"=>"wilford-woodruff-papers-project", "next_page_id"=>"34242591", "page_id"=>"34242591"}

I, [2024-10-22T23:20:52.859002 #2021561]  INFO -- : Started GET "/woodruff/wilford-woodruff-papers-project/letter-from-thomas-edwin-ricks-james-henry-hart-and-joseph-coulson-rich-2-june-1890-le-34727/transcribe/34242591?rollback_delete_ids%5B%5D=32028809&rollback_delete_ids%5B%5D=32167091&rollback_delete_ids%5B%5D=32001535&rollback_delete_ids%5B%5D=32139631&rollback_delete_ids%5B%5D=32049552&rollback_unset_ids%5B%5D=32167091" for 63.225.197.57 at 2024-10-22 23:20:52 +0000

I, [2024-10-22T23:20:52.861409 #2021561]  INFO -- : Processing by TranscribeController#display_page as HTML
I, [2024-10-22T23:20:52.861644 #2021561]  INFO -- :   Parameters: {"rollback_delete_ids"=>["32028809", "32167091", "32001535", "32139631", "32049552"], "rollback_unset_ids"=>["32167091"], "user_slug"=>"woodruff", "collection_id"=>"wilford-woodruff-papers-project", "work_id"=>"letter-from-thomas-edwin-ricks-james-henry-hart-and-joseph-coulson-rich-2-june-1890-le-34727", "page_id"=>"34242591"}
I, [2024-10-22T23:20:52.873796 #2021561]  INFO -- :   Rendered inline template (Duration: 0.4ms | Allocations: 76)

W, [2024-10-22T23:20:53.211549 #2021561]  WARN -- : ISSUE4269 Warning: Article 32001535 Thomas Edwin Ricks in collection Wilford Woodruff Papers Project is being destroyed.
W, [2024-10-22T23:20:53.245121 #2021561]  WARN -- : ISSUE4269 Warning: Article 32028809 Blackfoot, Bingham County, Idaho Territory in collection Wilford Woodruff Papers Project is being destroyed.
W, [2024-10-22T23:20:53.265932 #2021561]  WARN -- : ISSUE4269 Warning: Article 32049552 Joseph Coulson Rich in collection Wilford Woodruff Papers Project is being destroyed.

W, [2024-10-22T23:20:53.316457 #2021561]  WARN -- : ISSUE4269 Warning: Article 32139631 James Henry Hart in collection Wilford Woodruff Papers Project is being destroyed.

W, [2024-10-22T23:20:53.325134 #2021561]  WARN -- : ISSUE4269 Warning: Article 32167091 Idah Territory in collection Wilford Woodruff Papers Project is being destroyed.

I, [2024-10-22T23:20:53.563111 #2021561]  INFO -- :   Rendered transcribe/display_page.html.slim within layouts/application (Duration: 233.4ms | Allocations: 323903)
I, [2024-10-22T23:20:53.573706 #2021561]  INFO -- :   Rendered layout layouts/application.html.slim (Duration: 244.4ms | Allocations: 333234)
I, [2024-10-22T23:20:53.580009 #2021561]  INFO -- : ISSUE4269 WARNING 27704 > 27699 at transcribe#display_page
I, [2024-10-22T23:20:53.585531 #2021561]  INFO -- : Completed 200 OK in 724ms (Views: 180.8ms | ActiveRecord: 199.8ms | Allocations: 632011)
I, [2024-10-22T23:20:53.585786 #2021561]  INFO -- : Oink Action: transcribe#display_page
I, [2024-10-22T23:20:53.585967 #2021561]  INFO -- : Memory usage: 2468880 | PID: 2021561
I, [2024-10-22T23:20:53.586179 #2021561]  INFO -- : Instantiation Breakdown: Total: 583 | PageArticleLink: 543 | Collection: 8 | PageBlock: 6 | ArticleVersion: 6 | Article: 5 | EditorButton: 5 | User: 3 | Page: 3 | Work: 2 | Visit: 1 | Ahoy::Event: 1
I, [2024-10-22T23:20:53.586310 #2021561]  INFO -- : Oink Log Entry Complete
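
The IDs in the log above can be cross-checked mechanically. This sketch uses only the Ruby standard library. The query string and the destroyed article IDs are copied from the log, while the reading of `rollback_unset_ids` as "the subject created by this save" is an assumption the thread does not spell out.

```ruby
require "uri"

# Query string copied from the display_page request logged at 23:20:52.
query = "rollback_delete_ids%5B%5D=32028809&rollback_delete_ids%5B%5D=32167091&" \
        "rollback_delete_ids%5B%5D=32001535&rollback_delete_ids%5B%5D=32139631&" \
        "rollback_delete_ids%5B%5D=32049552&rollback_unset_ids%5B%5D=32167091"

# Article IDs from the ISSUE4269 "is being destroyed" warnings that follow it.
destroyed = [32001535, 32028809, 32049552, 32139631, 32167091]

# Decode the query into [key, value] pairs and group them by parameter name.
params = URI.decode_www_form(query).group_by(&:first)
delete_ids = params.fetch("rollback_delete_ids[]").map { |_, value| value.to_i }
unset_ids  = params.fetch("rollback_unset_ids[]").map { |_, value| value.to_i }

puts "requested for deletion: #{delete_ids.sort.inspect}"
puts "actually destroyed:     #{destroyed.sort.inspect}"
puts "destroyed but not in rollback_unset_ids: #{(destroyed - unset_ids).inspect}"
```

Every ID passed in `rollback_delete_ids` shows up in a destroy warning, and four of the five are not in `rollback_unset_ids`. The tripwire counts fit the same picture: 27703 articles before the save, presumably 27704 once "Idah Territory" was created, and 27699 after five articles were destroyed.
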
saracarl commented 1 week ago

From this, and from looking at the page versions, we determined the following:

The solution is to handle uncategorized links differently (don't delete!!) on the categorization cancellation.

saracarl commented 1 week ago

According to CWRGM and WWP, we're deleting more than just uncategorized subjects. We're going to roll this out anyway, but there may continue to be problems we need to investigate.