TheCanadianConfederationDebates / TCCD

Repository for the data and codebase for The Canadian Confederation Debates project.

H of C column numbers #107

Closed DanielHeidt closed 6 years ago

DanielHeidt commented 6 years ago

On pages like http://hcmc.uvic.ca/confederation/en/lgHC_AB_SK_1905-05-01.html, the column numbers (which act as page numbers in House of Commons records) are encoded as pairs rather than with their respective columns. Is it possible to position these column numbers at the top of each column in the XML so that they better approximate the original records? Could this be done with a script that looks for two numbers within an element and, when it finds them, adds the second number immediately after the <hr>?

lyallg commented 6 years ago

I will have a look on Tuesday.

lyallg commented 6 years ago

@DanielHeidt I recall an earlier conversation about the display of <fw>. Do we want these numbers visible at all? Since they interrupt the text, maybe there is a better way of handling this information.

martindholmes commented 6 years ago

I have a vague recollection of talk of putting them in the margin.

DanielHeidt commented 6 years ago

The red in-line page numbers are placed correctly as per my discussions with @FrankFlitton . The problem is that they aren't encoded in the right place in the XML. We deliberately placed the page numbers in-line, so that readers will know exactly where in the text the page changes occurred.

This XML markup problem only occurs in some H of C records. Take, for example, column 5142 in the linked example above: its <fw> element should instead be placed right after the <cb/> (my apologies for the earlier reference to <hr/>, which we had used in the early transcriber workflow). This has been done correctly in other documents, such as http://hcmc.uvic.ca/confederation/en/lgPCLC_1865-02-09.html.

A possible script check to resolve this error could be something like:

  1. Check for <cb/> and, when found, check the preceding lines for two <fw type> elements appearing within a few lines of each other (I've also seen cases in the code where the two <fw type> elements appear on the same line). If this test is true, move the second of these elements down to immediately follow the <cb/>.

There might be other ways of doing this, but since there is a clear pattern, there is likely a way to script a fix without too much tinkering.
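The check described above could be sketched roughly as follows, using Python's standard-library xml.etree.ElementTree. This is a hypothetical illustration, not the XSLT the project actually committed (code/xslt/move_column_numbers.xsl), and the function names are made up. It assumes that the only <fw> elements are page/column numbers, that each page begins with a <pb/>, and that the files use the standard TEI namespace. Instead of "within a few lines", it treats any two <fw> elements seen since the last <pb/> or <cb/> as a misplaced pair.

```python
import xml.etree.ElementTree as ET

# Assumption: the files use the standard TEI namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def local_name(tag):
    """Return a tag name without its "{namespace}" prefix."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def detach(elem, parent):
    """Remove elem from parent while keeping its tail text in the document."""
    idx = list(parent).index(elem)
    tail = elem.tail or ""
    if idx > 0:
        prev = parent[idx - 1]
        prev.tail = (prev.tail or "") + tail
    else:
        parent.text = (parent.text or "") + tail
    parent.remove(elem)
    elem.tail = None


def insert_after(anchor, parent, elem):
    """Insert elem right after anchor; elem takes over anchor's tail text."""
    idx = list(parent).index(anchor)
    elem.tail = anchor.tail
    anchor.tail = None
    parent.insert(idx + 1, elem)


def move_column_numbers(root):
    """Move the second of two <fw> elements seen since the last <pb/> or <cb/>
    to immediately after the next <cb/>. Returns the number of moves made."""
    parents = {child: parent for parent in root.iter() for child in parent}
    # Snapshot the markers in document order before mutating the tree.
    markers = [e for e in root.iter() if local_name(e.tag) in ("fw", "pb", "cb")]
    pending = []  # <fw> elements seen since the last <pb/> or <cb/>
    moved = 0
    for elem in markers:
        name = local_name(elem.tag)
        if name == "fw":
            pending.append(elem)
            continue
        if name == "cb" and len(pending) >= 2:
            fw = pending[-1]
            detach(fw, parents[fw])
            insert_after(elem, parents[elem], fw)
            parents[fw] = parents[elem]
            moved += 1
        pending = []  # a page or column break starts a new count
    return moved


def fix_file(path):
    """Rewrite one TEI file in place; returns the number of moves made."""
    ET.register_namespace("", TEI_NS)  # avoid "ns0:" prefixes on output
    tree = ET.parse(path)
    moved = move_column_numbers(tree.getroot())
    if moved:
        tree.write(path, encoding="utf-8", xml_declaration=True)
    return moved
```

Resetting the count at <pb/> as well as <cb/> keeps a correctly encoded column number from one page from being paired with the first column number of the next page. Note that ElementTree's default parser discards comments and processing instructions, so an XSLT identity transform like the one the project used is probably the safer choice for the real files.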

@lyallg please have the encoders begin to mark column numbers like page numbers in the texts from this point forward to facilitate better citations of this project's records.

martindholmes commented 6 years ago

XSLT to fix this, along with fixed versions of all these files, done in commit b85f8b60b. If all builds well and looks OK, I'll close this.

martindholmes commented 6 years ago

OK, working now. Future AB_SK docs will have to be run through the transformation in code/xslt/move_column_numbers.xsl but that's trivial. Final changes committed in #48953f1e5.