altoxml / schema

ALTO XML schema - latest and all former versions

ALTO for Handwriting #56

Closed: artunit closed this issue 4 years ago.

artunit commented 5 years ago

ALTO could have great value for representing handwriting. This is an initial example of what it might look like. I have taken the coordinates and confidence levels from the Cloud Vision API and its beta support for handwriting recognition, though I have rounded the Glyph confidence numbers.
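
A rough sketch of that conversion step (not the code actually used to build the sample below), assuming each recognized symbol arrives with its bounding box as a list of (x, y) vertices and a confidence between 0 and 1. The box is reduced to ALTO's HPOS/VPOS/WIDTH/HEIGHT, and the confidence is clamped and rounded into ALTO's 0 to 1 range. The function names are hypothetical.

```python
def vertices_to_alto_box(vertices):
    """Axis-aligned box enclosing (x, y) vertices, as ALTO attribute strings.

    Assumption: vertices is a list of (x, y) tuples, e.g. the four corners
    an OCR service reports for one recognized symbol.
    """
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    left, top = min(xs), min(ys)
    return {
        "HPOS": str(left),
        "VPOS": str(top),
        "WIDTH": str(max(xs) - left),
        "HEIGHT": str(max(ys) - top),
    }


def alto_confidence(score, digits=2):
    """Clamp a confidence score to ALTO's 0..1 range and round it for output."""
    clamped = min(max(score, 0.0), 1.0)
    return f"{clamped:.{digits}f}"
```

For example, hypothetical vertices `[(17, 29), (40, 29), (40, 109), (17, 109)]` give the same box as the "T" glyph in the sample, and a raw score of `0.9812` is written as `"0.98"`.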

Handwriting Sample

<?xml version="1.0" encoding="UTF-8"?>
<alto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns="http://www.loc.gov/standards/alto/ns-v4#" 
  xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-0.xsd" 
  xmlns:xlink="http://www.w3.org/1999/xlink">
  <Tags>
    <RoleTag ID="HW01" TYPE="Handwritten"/>
  </Tags>
  <Layout>
    <Page WIDTH="1266" HEIGHT="107" PHYSICAL_IMG_NR="0" ID="page_0">
      <PrintSpace HPOS="0" VPOS="0" WIDTH="1266" HEIGHT="107">
        <TextBlock ID="block_0" HPOS="15" VPOS="16" WIDTH="1236" HEIGHT="81">
          <TextLine ID="line_0" TAGREFS="HW01" HPOS="15" VPOS="16" WIDTH="1236" HEIGHT="81">
            <String ID="string_0" HPOS="17" VPOS="28" WIDTH="97" HEIGHT="88" WC="0.98" CONTENT="This">
              <Glyph CONTENT="T" HPOS="17" VPOS="29" WIDTH="23" HEIGHT="80" GC="0.98"/>
              <Glyph CONTENT="h" HPOS="42" VPOS="30" WIDTH="23" HEIGHT="80" GC="0.63"/>
              <Glyph CONTENT="i" HPOS="68" VPOS="29" WIDTH="22" HEIGHT="80" GC="0.99"/>
              <Glyph CONTENT="s" HPOS="92" VPOS="28" WIDTH="22" HEIGHT="80" GC="0.99"/>
            </String>
            <String ID="string_1" HPOS="137" VPOS="27" WIDTH="84" HEIGHT="81" WC="0.99" CONTENT="was">
              <Glyph CONTENT="w" HPOS="137" VPOS="27" WIDTH="29" HEIGHT="81" GC="0.99"/>
              <Glyph CONTENT="a" HPOS="172" VPOS="26" WIDTH="26" HEIGHT="81" GC="1.00"/>
              <Glyph CONTENT="s" HPOS="198" VPOS="26" WIDTH="23" HEIGHT="80" GC="1.00"/>
            </String>
            <String ID="string_2" HPOS="254" VPOS="25" WIDTH="84" HEIGHT="81" WC="0.99" CONTENT="a"/>
            <String ID="string_3" HPOS="341" VPOS="20" WIDTH="167" HEIGHT="80" WC="0.99" CONTENT="pleasant">
              <Glyph CONTENT="p" HPOS="341" VPOS="23" WIDTH="16" HEIGHT="80" GC="0.99"/>
              <Glyph CONTENT="l" HPOS="358" VPOS="23" WIDTH="16" HEIGHT="80" GC="0.98"/>
              <Glyph CONTENT="e" HPOS="372" VPOS="23" WIDTH="19" HEIGHT="80" GC="0.99"/>
              <Glyph CONTENT="a" HPOS="397" VPOS="22" WIDTH="19" HEIGHT="80" GC="0.99"/>
              <Glyph CONTENT="s" HPOS="412" VPOS="22" WIDTH="22" HEIGHT="80" GC="1.00"/>
              <Glyph CONTENT="a" HPOS="445" VPOS="21" WIDTH="22" HEIGHT="80" GC="0.99"/>
              <Glyph CONTENT="n" HPOS="462" VPOS="21" WIDTH="19" HEIGHT="80" GC="1.00"/>
              <Glyph CONTENT="t" HPOS="485" VPOS="20" WIDTH="23" HEIGHT="80" GC="1.00"/>
            </String>
            <String ID="string_4" HPOS="531" VPOS="18" WIDTH="83" HEIGHT="81" WC="0.99" CONTENT="and">
              <Glyph CONTENT="a" HPOS="531" VPOS="18" WIDTH="29" HEIGHT="81" GC="0.99"/>
              <Glyph CONTENT="n" HPOS="566" VPOS="19" WIDTH="25" HEIGHT="80" GC="0.99"/>
              <Glyph CONTENT="d" HPOS="592" VPOS="18" WIDTH="22" HEIGHT="80" GC="0.99"/>
            </String>
            <String ID="string_5" HPOS="673" VPOS="13" WIDTH="212" HEIGHT="81" WC="0.99" CONTENT="reflective">
              <Glyph CONTENT="r" HPOS="673" VPOS="17" WIDTH="23" HEIGHT="81" GC="0.99"/>
              <Glyph CONTENT="e" HPOS="698" VPOS="16" WIDTH="23" HEIGHT="80" GC="1.00"/>
              <Glyph CONTENT="f" HPOS="725" VPOS="15" WIDTH="19" HEIGHT="80" GC="0.98"/>
              <Glyph CONTENT="l" HPOS="741" VPOS="15" WIDTH="19" HEIGHT="80" GC="0.99"/>
              <Glyph CONTENT="e" HPOS="766" VPOS="15" WIDTH="19" HEIGHT="80" GC="1.00"/>
              <Glyph CONTENT="c" HPOS="782" VPOS="15" WIDTH="19" HEIGHT="80" GC="0.98"/>
              <Glyph CONTENT="t" HPOS="807" VPOS="14" WIDTH="19" HEIGHT="80" GC="1.00"/>
              <Glyph CONTENT="i" HPOS="825" VPOS="13" WIDTH="15" HEIGHT="80" GC="0.99"/>
              <Glyph CONTENT="v" HPOS="839" VPOS="13" WIDTH="19" HEIGHT="80" GC="1.00"/>
              <Glyph CONTENT="e" HPOS="862" VPOS="13" WIDTH="23" HEIGHT="80" GC="1.00"/>
            </String>
            <String ID="string_6" HPOS="911" VPOS="9" WIDTH="146" HEIGHT="81" WC="0.99" CONTENT="journey">
              <Glyph CONTENT="j" HPOS="911" VPOS="12" WIDTH="23" HEIGHT="80" GC="0.95"/>
              <Glyph CONTENT="o" HPOS="938" VPOS="11" WIDTH="19" HEIGHT="80" GC="0.99"/>
              <Glyph CONTENT="u" HPOS="954" VPOS="10" WIDTH="19" HEIGHT="80" GC="0.96"/>
              <Glyph CONTENT="r" HPOS="981" VPOS="11" WIDTH="16" HEIGHT="80" GC="0.70"/>
              <Glyph CONTENT="n" HPOS="989" VPOS="10" WIDTH="16" HEIGHT="80" GC="0.97"/>
              <Glyph CONTENT="e" HPOS="1011" VPOS="10" WIDTH="22" HEIGHT="80" GC="0.84"/>
              <Glyph CONTENT="y" HPOS="1035" VPOS="9" WIDTH="22" HEIGHT="80" GC="0.99"/>
            </String>
            <String ID="string_7" HPOS="1101" VPOS="7" WIDTH="60" HEIGHT="80" WC="0.99" CONTENT="for">
              <Glyph CONTENT="f" HPOS="1101" VPOS="8" WIDTH="22" HEIGHT="80" GC="0.99"/>
              <Glyph CONTENT="o" HPOS="1126" VPOS="7" WIDTH="19" HEIGHT="80" GC="0.99"/>
              <Glyph CONTENT="r" HPOS="1145" VPOS="7" WIDTH="16" HEIGHT="80" GC="0.99"/>
            </String>
            <String ID="string_8" HPOS="1183" VPOS="6" WIDTH="46" HEIGHT="81" WC="0.99" CONTENT="me">
              <Glyph CONTENT="m" HPOS="1183" VPOS="7" WIDTH="22" HEIGHT="80" GC="0.98"/>
              <Glyph CONTENT="e" HPOS="1207" VPOS="6" WIDTH="22" HEIGHT="80" GC="0.99"/>
            </String>
            <String ID="string_9" HPOS="1222" VPOS="5" WIDTH="25" HEIGHT="80" WC="0.97" CONTENT=","/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
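
A quick consistency check on a file like this can catch two easy mistakes: confidence values outside the 0 to 1 range that, as far as I can tell, ALTO v4 expects for WC and GC, and a String whose CONTENT does not match its Glyph children. This is a sketch using Python's xml.etree.ElementTree; `check_alto` is a hypothetical helper, not part of any ALTO tooling, and it does not replace validating against the XSD.

```python
import xml.etree.ElementTree as ET

# ALTO v4 namespace, as declared on the sample's root element.
ALTO_NS = {"a": "http://www.loc.gov/standards/alto/ns-v4#"}


def _out_of_range(value):
    """True if an optional confidence attribute is present but not in 0..1."""
    return value is not None and not 0.0 <= float(value) <= 1.0


def check_alto(root):
    """Return human-readable problems found under an ALTO v4 root element."""
    problems = []
    for string_el in root.iterfind(".//a:String", ALTO_NS):
        sid = string_el.get("ID")
        if _out_of_range(string_el.get("WC")):
            problems.append(f"{sid}: WC={string_el.get('WC')} outside 0..1")
        glyphs = string_el.findall("a:Glyph", ALTO_NS)
        for glyph in glyphs:
            if _out_of_range(glyph.get("GC")):
                problems.append(
                    f"{sid}/{glyph.get('CONTENT')}: GC={glyph.get('GC')} outside 0..1")
        # Strings without Glyph children (e.g. "a" and "," above) are skipped.
        if glyphs:
            spelled = "".join(g.get("CONTENT", "") for g in glyphs)
            if spelled != string_el.get("CONTENT"):
                problems.append(
                    f"{sid}: glyphs spell {spelled!r} but CONTENT is "
                    f"{string_el.get('CONTENT')!r}")
    return problems
```

Usage, with a hypothetical file name: `check_alto(ET.parse("sample.xml").getroot())` returns an empty list when nothing is flagged.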

mittagessen commented 5 years ago

That is actually extremely pertinent to my work right now. For basic manuscripts with completely straight, vertical or horizontal writing, ALTO works quite well, but anything more complex would benefit from a free-form baseline capability. hOCR limits the baseline definition to a polynomial, but a sequence of line segments is more appropriate for highly curled or circular lines.
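
To make the polynomial versus line-segment point concrete: a baseline written as y = p(x) can only describe a line whose x coordinate keeps increasing, whereas a polyline can follow a line that curls back on itself. The sketch below treats the baseline as a plain function of x, which simplifies how hOCR actually defines it; the helper names and sample coordinates are hypothetical.

```python
import math


def is_function_of_x(points):
    """True if x strictly increases point to point, so y = f(x) is possible.

    Simplification: assumes left-to-right writing.
    """
    return all(x1 < x2 for (x1, _), (x2, _) in zip(points, points[1:]))


def polyline_length(points):
    """Length of an open polyline: the sum of its segment lengths."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


# Hypothetical baselines: one gently sloping, one curling through three
# quarters of a circle of radius 50 (sampled every 22.5 degrees).
straight = [(0, 100), (400, 96), (800, 90)]
curl = [(100 + 50 * math.cos(k * math.pi / 8), 100 + 50 * math.sin(k * math.pi / 8))
        for k in range(13)]
```

`is_function_of_x(straight)` holds and `is_function_of_x(curl)` does not, yet `polyline_length(curl)` still approximates the arc length (about 234 against a true 3/4 circumference of about 236), so the segment list carries the shape a single polynomial in x cannot.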

artunit commented 5 years ago

The shape-element usage discussion might be useful to you. I used the bounding box coordinates from the Cloud Vision API, but ALTO has allowed polygon, circle and ellipse shape types since version 3.1, and these are available down to the glyph level.
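
A sketch of attaching a Polygon shape to a glyph with Python's ElementTree, starting from the existing HPOS/VPOS/WIDTH/HEIGHT box. It assumes, from my reading of the v4 schema, that Shape is the first child of the element and that Polygon stores its outline in a POINTS attribute as space-separated x y values; the helper names are hypothetical. A rectangle is only the degenerate case: with the original vertices from the OCR service, a rotated or skewed outline could be kept instead.

```python
import xml.etree.ElementTree as ET

# ALTO v4 namespace, as declared in the sample.
ALTO = "http://www.loc.gov/standards/alto/ns-v4#"


def _fmt(value):
    """Write whole numbers without a trailing .0."""
    v = float(value)
    return str(int(v)) if v.is_integer() else str(v)


def box_to_points(hpos, vpos, width, height):
    """Corners of a box, clockwise from top left, as an 'x y x y ...' string."""
    corners = [(hpos, vpos), (hpos + width, vpos),
               (hpos + width, vpos + height), (hpos, vpos + height)]
    return " ".join(f"{_fmt(x)} {_fmt(y)}" for x, y in corners)


def add_polygon_shape(element):
    """Insert <Shape><Polygon POINTS="..."/></Shape> as the element's first child."""
    box = [float(element.get(k)) for k in ("HPOS", "VPOS", "WIDTH", "HEIGHT")]
    shape = ET.Element(f"{{{ALTO}}}Shape")
    ET.SubElement(shape, f"{{{ALTO}}}Polygon", POINTS=box_to_points(*box))
    element.insert(0, shape)
    return shape
```

For the "T" glyph in the sample this produces `POINTS="17 29 40 29 40 109 17 109"`.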

mittagessen commented 5 years ago

Stupid question: Does the POLYGON shape define an open or a closed polygon? For baselines open would be more appropriate but the documentation doesn't elaborate on that point.

urieli commented 5 years ago

@mittagessen an "open polygon" is an oxymoron: a polygon is by definition "a closed plane figure bounded by three or more line segments." If what is meant is a series of points connected by line segments, maybe the name should be changed (not that I have an elegant suggestion).

mittagessen commented 5 years ago

@urieli Open polygonal chains are sometimes known as open polygons. The shortest unambiguous name would be polyline.
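
The whole difference being discussed is whether the last point joins back to the first. A minimal sketch with a hypothetical `segments` helper: the same point list read as an open polyline (a baseline) and as a closed polygon (an outline).

```python
def segments(points, closed):
    """Segments joining consecutive points; closed=True also joins last to first."""
    segs = list(zip(points, points[1:]))
    if closed and len(points) > 2:
        segs.append((points[-1], points[0]))
    return segs


# Hypothetical points read both ways.
pts = [(0, 10), (50, 12), (100, 10)]
baseline = segments(pts, closed=False)  # open polyline: 2 segments
outline = segments(pts, closed=True)    # closed polygon: 3 segments, back to (0, 10)
```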

The easiest way to deal with this rather special case would be to extend the BASELINE attribute to allow polylines instead of a single line segment. It would also keep the existing semantics of the shape elements.

artunit commented 5 years ago

@urieli, @mittagessen - I like the BASELINE suggestion. Technically, the schema doesn't distinguish between open and closed polygons, though the documentation does identify its use for bounding shapes. Issue 22 targets changing BASELINE to PointsType, which I think would address this.

mittagessen commented 5 years ago

@artunit Changing BASELINE to points type is exactly what I had in mind, although I am unsure if the change breaks backward compatibility unnecessarily. The old model just used a single y-coordinate, so the encoding differs even for perfectly straight baselines.
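
One way a reader could cope with both encodings, sketched under two assumptions: the old single value is a y coordinate, stretched here across the TextLine's horizontal extent (HPOS to HPOS + WIDTH), and the new form is a list of x y pairs separated by whitespace or commas. `read_baseline` is hypothetical.

```python
def read_baseline(value, hpos, width):
    """Return a BASELINE attribute value as a list of (x, y) points.

    A single number is treated as the legacy form (a y coordinate) and
    becomes a two-point line from HPOS to HPOS + WIDTH; anything else is read
    as x y pairs separated by whitespace or commas.
    """
    nums = [float(n) for n in value.replace(",", " ").split()]
    if len(nums) == 1:
        y = nums[0]
        return [(float(hpos), y), (float(hpos) + float(width), y)]
    if len(nums) < 4 or len(nums) % 2:
        raise ValueError(f"cannot read BASELINE {value!r}")
    return list(zip(nums[0::2], nums[1::2]))
```

This also shows the compatibility cost mentioned above: a perfectly straight, horizontal baseline reads the same either way, but it is written as one number in the old form and as at least four in the new one.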

artunit commented 5 years ago

@mittagessen The schema does not currently annotate BASELINE, and I guess it would come down to whether existing implementations would be broken. A point is normally two coordinates, though the annotation could state that one coordinate is implicit when only a single value is given. The schema also has the notion of a typesetting point, or 1/72 of an inch, so it would probably be good to define the different uses of "point". In the same vein, PointsType is defined as a list of points, and I think it would be useful to allow these to be written as a list of pairs, e.g. instead of 200 400 203 405 210 420, something like (200,400),(203,405),(210,420).
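
A tolerant parser could accept both notations from the comment by pulling out the numbers and pairing them up. This is a sketch: `parse_points` and `format_points` are hypothetical, and the parenthesized form is the proposal in the comment, not a notation the schema is known to accept.

```python
import re

# An optionally signed integer or decimal number.
_NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def parse_points(text):
    """Read points from '200 400 203 405' or '(200,400),(203,405)' alike."""
    nums = [float(n) for n in _NUMBER.findall(text)]
    if len(nums) % 2:
        raise ValueError(f"odd number of coordinates in {text!r}")
    return list(zip(nums[0::2], nums[1::2]))


def format_points(points, paired=False):
    """Write points space-separated, or as (x,y) pairs when paired is True."""
    def fmt(v):
        v = float(v)
        return str(int(v)) if v.is_integer() else str(v)
    if paired:
        return ",".join(f"({fmt(x)},{fmt(y)})" for x, y in points)
    return " ".join(f"{fmt(x)} {fmt(y)}" for x, y in points)
```

Both `parse_points("200 400 203 405 210 420")` and `parse_points("(200,400),(203,405),(210,420)")` yield the same three points, so a writer could choose either style without changing what a reader recovers.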

artunit commented 4 years ago

This issue seems to be addressed: ALTO is now used for encoding handwriting in two major projects (Transkribus and eScripta), and the change to BASELINE has been published in version 4.2 of the ALTO schema.