DistributedProofreaders / dproofreaders

Distributed Proofreaders is a web application intended to ease the process of converting public domain books into e-texts.
https://www.pgdp.net
GNU General Public License v2.0

Stop logging WordCheck corrections #1242

Closed: cpeel closed this 1 day ago

cpeel commented 1 week ago

Every time someone submits a correction in WordCheck, we write the corrected text to the corrections column of that event's record in the wordcheck_events table. The values look like this:

[-Laman-] {+Lamán+}
[-Komarov-] {+Komaróv+}
[-pubHcist-] {+publicist+}
[-Vyazemsky-] {+Vyázemsky+}

But we never use it. Ever. Nor have we ever used it. At some point (circa 2007) we thought that data might come in handy for some later analysis, but here we are ~17 years later and nada.

This PR:

  1. stops collecting the data
  2. includes an upgrade script to drop the column

Before we drop the column from PROD, I intend to dump the results of this SQL query to a file in some structured format:

select projectid, image, round_id, corrections from wordcheck_events where corrections <> '';
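Roughly, that dump could be scripted like this. This is a minimal Python sketch, not code from this PR: it uses the standard-library sqlite3 module as a stand-in for the real database connection (any DB-API 2.0 connection with a cursor() method should behave the same way here), and both the dump_corrections helper and the choice of JSON Lines as the "structured format" are assumptions made for illustration.

```python
import json
import sqlite3
from typing import TextIO

# The query above, written with a standard single-quoted empty-string literal.
QUERY = (
    "select projectid, image, round_id, corrections "
    "from wordcheck_events where corrections <> ''"
)


def dump_corrections(conn: sqlite3.Connection, out: TextIO) -> int:
    """Write each matching row as one JSON object per line; return the row count.

    dump_corrections is a hypothetical helper, not part of dproofreaders.
    """
    cursor = conn.cursor()
    cursor.execute(QUERY)
    count = 0
    for projectid, image, round_id, corrections in cursor:
        record = {
            "projectid": projectid,
            "image": image,
            "round_id": round_id,
            "corrections": corrections,
        }
        # ensure_ascii=False keeps accented corrections such as "Lamán" readable.
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    cursor.close()
    return count
```

Because non-ASCII characters are written as-is, the output file should be opened with encoding="utf-8". One event per line means the file can be streamed or grepped later without loading all of it.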

That data might be useful to some researcher somewhere who is interested in common scannos: the projectid could tell them the project's language, and if they really cared they could trace a correction back to the page scan and text. I'll do the same for the archived wordcheck_events too (dump the data, then drop the column).
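A researcher picking up such a dump would first need to split each corrections value into word pairs. The sample values earlier in this PR look like GNU wdiff's default output, with the original word inside [-...-] and the correction inside {+...+}. The Python sketch below assumes that format holds for every row (inferred only from those four examples) and that a single value may contain several pairs. parse_corrections and common_scannos are hypothetical names, not anything in the dproofreaders codebase.

```python
import re
from collections import Counter
from typing import Iterable

# One "[-original-] {+corrected+}" pair. The groups are non-greedy so that
# several pairs in one value are split apart instead of merged.
PAIR_RE = re.compile(r"\[-(.*?)-\]\s*\{\+(.*?)\+\}")


def parse_corrections(value: str) -> list[tuple[str, str]]:
    """Return (original, corrected) word pairs found in one corrections value."""
    return PAIR_RE.findall(value)


def common_scannos(values: Iterable[str], n: int = 10) -> list[tuple[tuple[str, str], int]]:
    """Tally (original, corrected) pairs across many values; return the n most common."""
    counts = Counter()
    for value in values:
        counts.update(parse_corrections(value))
    return counts.most_common(n)
```

If the counts are grouped by projectid before common_scannos is called, the results can be broken down by language as well.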

Sandbox to confirm that excising the corrections logging didn't break WordCheck itself: https://www.pgdp.org/~cpeel/c.branch/stop-wc-correction-logging/

cpeel commented 1 week ago

@lhamilton1 - please confirm this approach sounds good to you.

lhamilton1 commented 1 week ago

I approve