Closed by Jo-CCS.
Reviewing the original change request filed by the IMPACT project, it seems that two changes are requested:
Example:
<processingStep ID="ID005">
  <processingDateTime>2010-12-15T15:02:48</processingDateTime>
  <processingAgency>ACME Agency</processingAgency>
  <processingStepDescription>manual correction</processingStepDescription>
  <processingStepSettings>misc. settings</processingStepSettings>
  <processingSoftware>
    <softwareCreator>USAL</softwareCreator>
    <softwareName>Aletheia</softwareName>
    <softwareVersion>1.2.3</softwareVersion>
  </processingSoftware>
</processingStep>
<TextLine ID="ID069" STYLEREFS="ID007" BASELINE="1261" CORRECTEDBY="ID005" VPOS="1230" HPOS="260" HEIGHT="40" WIDTH="902">
  <!-- ... -->
</TextLine>
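To illustrate what a consumer could actually do with such a reference, here is a minimal sketch that resolves the `CORRECTEDBY` attribute of each `TextLine` to the processing step it points at. It assumes un-namespaced element names exactly as in the example above, plus a hypothetical `<alto>` wrapper element so that both fragments parse as one document. Real ALTO files declare a namespace, which would then have to be included in the tag names passed to `iter()` and `findtext()`.

```python
import xml.etree.ElementTree as ET

# The example above, wrapped in a hypothetical <alto> root element.
# Namespaces are omitted for brevity; real ALTO files declare one.
SAMPLE = """<alto>
  <processingStep ID="ID005">
    <processingDateTime>2010-12-15T15:02:48</processingDateTime>
    <processingAgency>ACME Agency</processingAgency>
    <processingStepDescription>manual correction</processingStepDescription>
    <processingStepSettings>misc. settings</processingStepSettings>
    <processingSoftware>
      <softwareCreator>USAL</softwareCreator>
      <softwareName>Aletheia</softwareName>
      <softwareVersion>1.2.3</softwareVersion>
    </processingSoftware>
  </processingStep>
  <TextLine ID="ID069" STYLEREFS="ID007" BASELINE="1261" CORRECTEDBY="ID005"
            VPOS="1230" HPOS="260" HEIGHT="40" WIDTH="902"/>
</alto>"""


def index_steps(root):
    """Map each processingStep ID to a summary of who did what, and when."""
    steps = {}
    for step in root.iter("processingStep"):
        steps[step.get("ID")] = {
            "agency": step.findtext("processingAgency"),
            "description": step.findtext("processingStepDescription"),
            "date": step.findtext("processingDateTime"),
            "software": step.findtext("processingSoftware/softwareName"),
        }
    return steps


def corrections(root):
    """Resolve every CORRECTEDBY reference on a TextLine to its processing step."""
    steps = index_steps(root)
    result = {}
    for line in root.iter("TextLine"):
        ref = line.get("CORRECTEDBY")
        if ref is not None:
            # A dangling reference resolves to None instead of raising.
            result[line.get("ID")] = steps.get(ref)
    return result


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for line_id, step in corrections(root).items():
        print(line_id, step)
```

Note that the reference only tells us *that* line ID069 was touched by ACME Agency using Aletheia; as the request intends, the previous text of the line is not recoverable from it.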
Justification:
"A lot of software tools and also human interactions are involved in different steps of the digitisation process. Each of them may affect an ALTO file by making refinements or corrections. From our point of view it would be desirable to keep track of the changes and verifications done by the different agents involved in the digitisation process. This would allow a simple kind of document history and would also give important information about the trustworthiness of the whole document. If, for example, everything was verified by a service provider, then we can assume that the quality of the document is very high. Storing the old values as well as the new ones would increase the file size tremendously. Therefore we suggest storing only the information about what has been changed and by whom, without keeping track of the changed values."
A post-processing action such as a new layout analysis (as outlined in #36) will cause changes too big to be tracked with such a method. So, in my view, the use case for such referencing might be quite limited. Moreover, since the original text information is lost, if I were responsible for a long-term preservation store I would not allow these values to be overwritten, and would keep a copy of the files anyway. From the projects I have worked on at national libraries, I have even heard that files in the repository may not be modified at all; a new version is always deposited instead. So for me the question remains: what additional information do I gain from these references, and how can I use it?
On the other hand, it is a simple extension, it would only be used optionally, and it does not cause a structural issue. I would just shorten the attribute names (CORR= / VERIFIED=) to keep the added data small.
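One possible answer to "which additional information do I get" is a page-level summary: how much of a page was corrected or verified at all. The sketch below computes that share per outcome. `CORRECTEDBY` comes from the example above and `CORR` / `VERIFIED` are the shortened forms suggested here; `VERIFIEDBY` is an assumed long-form counterpart. None of these attributes is part of a published ALTO schema, so the lookup accepts either spelling.

```python
import xml.etree.ElementTree as ET

# Attribute spellings per outcome. CORRECTEDBY is from the example;
# VERIFIEDBY is an ASSUMED counterpart; CORR / VERIFIED are the shortened
# forms proposed in the discussion. All are hypothetical, not published ALTO.
ATTRS = {
    "corrected": ("CORRECTEDBY", "CORR"),
    "verified": ("VERIFIEDBY", "VERIFIED"),
}


def outcome_refs(elem):
    """Return {outcome: processing-step ID} for whichever spelling is present."""
    refs = {}
    for outcome, names in ATTRS.items():
        for name in names:
            value = elem.get(name)
            if value is not None:
                refs[outcome] = value
                break  # first matching spelling wins
    return refs


def coverage(root, tag="TextLine"):
    """Fraction of elements with the given tag that carry each outcome."""
    elems = list(root.iter(tag))
    counts = {outcome: 0 for outcome in ATTRS}
    for elem in elems:
        for outcome in outcome_refs(elem):
            counts[outcome] += 1
    total = len(elems) or 1  # avoid division by zero on an empty page
    return {outcome: n / total for outcome, n in counts.items()}


if __name__ == "__main__":
    page = ET.fromstring(
        '<Page>'
        '<TextLine ID="L1" CORRECTEDBY="ID005"/>'
        '<TextLine ID="L2" CORR="ID005" VERIFIED="ID006"/>'
        '<TextLine ID="L3"/>'
        '<TextLine ID="L4" VERIFIEDBY="ID006"/>'
        '</Page>'
    )
    print(coverage(page))  # {'corrected': 0.5, 'verified': 0.5}
```

Accepting both spellings is only a convenience for the sketch; a schema would fix one name per outcome.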
Continued in #39.
Champion: Clemens Neudecker
Submitter: Impact
Submitted: 2013-02
Status: discussion
submitted - initial status when proposal is submitted
discussion - proposal is being discussed within the board
review - xsd code is being reviewed
accepted - proposal is accepted
rejected - proposal is rejected
draft - accepted proposal is in public commenting period
published - proposal is published in a schema version
Backwards compatible: ??
To ALTO version: ?
Purpose: A lot of software tools and also human interactions are involved in different steps of the digitisation process. Each of them may affect an ALTO file by making refinements or corrections. From our point of view it would be desirable to keep track of the changes and verifications done by the different agents involved in the digitisation process. This would allow a simple kind of document history and would also give important information about the trustworthiness of the whole document. If, for example, everything was verified by a service provider, then we can assume that the quality of the document is very high. Storing the old values as well as the new ones would increase the file size tremendously.
Correction and Validation are possible outcomes of the same process.
Implementation: The ALTO schema already defines a element. Its purpose is to record details about any process steps carried out after the creation of the full text. The element is optional and is not part of the actual page definition in ALTO.
In order to store information about the correction and verification process for individual text lines, words, etc., the following elements are added to the section:
• stores the type of process step. It is a free text field, though IMPACT internal constraints require the element’s value to be set to “correction”.
• groups all elements regarding the result of the process. The element’s value attribute contains information about the outcome of the process. The element is repeatable. Each element represents a specific outcome of the process, which is recorded in the element’s value attribute. This attribute may only take one of two values: “corrected” or “verified”.
• is an element that wraps around all elements that were processed with the actual result as stated in the element’s value attribute.
• element contains the ID value of an individual text line or word element. Unprocessed are not listed here.
If an element has not been processed, it is not listed within .
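The structure described in the list above can be sketched as follows. The proposal's actual element names are missing from this copy of the text, so `stepType`, `outcome`, `processedElements` and `elementRef` below are hypothetical placeholders, and using `processingStep` as the parent is likewise an assumption. The sketch shows the two properties the list describes: the outcome element is repeatable with a value restricted to "corrected" or "verified", and an element that is not listed counts as unprocessed.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL placeholder names: the real element names are missing from the text.
STEP_TYPE = "stepType"            # type of process step (free text)
OUTCOME = "outcome"               # repeatable; carries the value attribute
WRAPPER = "processedElements"     # wraps all elements with that outcome
REF = "elementRef"                # holds the ID of one TextLine / String
ALLOWED = {"corrected", "verified"}  # the only two values the proposal permits


def build_step(step_id, outcomes):
    """Serialise {outcome value: [element IDs]} into one processing step."""
    step = ET.Element("processingStep", ID=step_id)
    # Free text in general; IMPACT's internal constraint fixes it to "correction".
    ET.SubElement(step, STEP_TYPE).text = "correction"
    for value, ids in outcomes.items():
        if value not in ALLOWED:
            raise ValueError(f"unsupported outcome: {value!r}")
        outcome = ET.SubElement(step, OUTCOME, value=value)
        wrapper = ET.SubElement(outcome, WRAPPER)
        for element_id in ids:
            ET.SubElement(wrapper, REF).text = element_id
    return step


def outcomes_by_element(step):
    """Invert a step to {element ID: set of outcomes}; unlisted IDs are unprocessed."""
    result = {}
    for outcome in step.findall(OUTCOME):
        value = outcome.get("value")
        for ref in outcome.iterfind(f"{WRAPPER}/{REF}"):
            result.setdefault(ref.text, set()).add(value)
    return result


if __name__ == "__main__":
    step = build_step("ID005", {"corrected": ["ID069", "ID070"], "verified": ["ID069"]})
    print(ET.tostring(step, encoding="unicode"))
    print(outcomes_by_element(step))
```

Because only processed IDs are listed, the size of the section grows with the number of touched elements rather than with the size of the page, which matches the request's concern about file size.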
Example:
Schema changes draft
Current schema Changed schema