altoxml / schema

ALTO XML schema - latest and all former versions

Option to enrich ALTO content description #83

Open cipriandinu opened 1 year ago

cipriandinu commented 1 year ago

Some printed and even handwritten materials cannot be described properly by ALTO, for example complex mathematical or chemical formulas, or musical notation. Specialized XML formats exist for these cases (MathML, MusicXML, etc.). Maybe we could add a new block type, "CustomBlock", that contains a custom content definition instead of text lines. The idea is to embed these custom definitions the way metadata records are embedded in METS, for example. Some possible samples:

<CustomBlock ID=... XPOS=... ...>
   <cbWrap MIMETYPE="text/xml" cbType="MathML">
      <xmlData>
         <math xmlns="http://www.w3.org/1998/Math/MathML">
            <mi>&#x03C0;<!-- π --></mi>
            <mo>&#x2062;<!-- &InvisibleTimes; --></mo>
            <msup>
               <mi>r</mi>
               <mn>2</mn>
            </msup>
         </math>
      </xmlData>
   </cbWrap>
</CustomBlock>

<CustomBlock ID=... XPOS=... ...>
   <cbWrap MIMETYPE="text/xml" cbType="MusicXML">
      <xmlData>
         <score-partwise version="4.0">
            <part-list>
               <score-part id="P1">
                  <part-name>Music</part-name>
               </score-part>
            </part-list>
            <part id="P1">
               <measure number="1">
                  <attributes>
                     <divisions>1</divisions>
                     <key>
                        <fifths>0</fifths>
                     </key>
                     <time>
                        <beats>4</beats>
                        <beat-type>4</beat-type>
                     </time>
                     <clef>
                        <sign>G</sign>
                        <line>2</line>
                     </clef>
                  </attributes>
                  <note>
                     <pitch>
                        <step>C</step>
                        <octave>4</octave>
                     </pitch>
                     <duration>4</duration>
                     <type>whole</type>
                  </note>
               </measure>
            </part>
         </score-partwise>
      </xmlData>
   </cbWrap>
</CustomBlock>

Alternatively, we may move this content out of ALTO and use a mechanism based on external files (again inspired by METS, which refers to several external files such as ALTO, images, etc. via fileGrp/file).
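
A minimal sketch of how the external-file variant might look, by analogy with the way METS pairs mdWrap (embedded) with mdRef (referenced). The element name cbRef, the LOCTYPE attribute, the ID value, and the file name formula_0001.mml are all hypothetical placeholders for illustration; none of them exist in ALTO today.

```xml
<!-- Hypothetical sketch: cbRef, LOCTYPE, the ID and the file name are illustrative only, not part of ALTO -->
<CustomBlock ID="CB_0001" xmlns:xlink="http://www.w3.org/1999/xlink">
   <!-- Position attributes omitted for brevity -->
   <!-- Instead of embedding the payload, point to an external MathML file -->
   <cbRef MIMETYPE="application/mathml+xml" cbType="MathML"
          LOCTYPE="URL" xlink:href="formula_0001.mml"/>
</CustomBlock>
```

With a reference like this, the ALTO file would stay small and the specialized content could be validated against its own schema separately, at the cost of managing one more file per block.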

cipriandinu commented 1 year ago

... it looks like my samples were also processed. I will try again to keep the real XML code.

cipriandinu commented 1 year ago

samples.txt