cneud / alto-tools

Python tools for performing various operations on ALTO XML files
Apache License 2.0

Alto namespaces not sufficient #21

Closed LegoUnicorn closed 1 year ago

LegoUnicorn commented 1 year ago

Available Alto namespaces not sufficient for Alto documents from the National Library of Scotland Digital Foundry.

Needs addition of 'alto-v3-alt' : 'http://www.loc.gov/standards/alto/v3/alto.xsd' to namespaces.
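For illustration, a minimal sketch of what the requested addition could look like. This is an assumption-laden example, not the actual alto-tools code: the `NAMESPACES` dictionary, its first two keys, and the `match_namespace` helper are hypothetical names; only the `'alto-v3-alt'` entry is taken from this issue.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace table; the real one in alto-tools may be laid out differently.
NAMESPACES = {
    "alto-v2": "http://www.loc.gov/standards/alto/ns-v2#",
    "alto-v3": "http://www.loc.gov/standards/alto/ns-v3#",
    # The variant requested in this issue: the namespace is the schema URL itself.
    "alto-v3-alt": "http://www.loc.gov/standards/alto/v3/alto.xsd",
}


def match_namespace(xml_text):
    """Return (key, uri) for the root element's namespace; key is None if unknown."""
    root = ET.fromstring(xml_text)
    # ElementTree writes qualified tags as "{uri}localname".
    uri = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    for key, known_uri in NAMESPACES.items():
        if known_uri == uri:
            return key, uri
    return None, uri


# Made-up minimal document using the namespace reported in this issue.
sample = '<alto xmlns="http://www.loc.gov/standards/alto/v3/alto.xsd"><Layout/></alto>'
print(match_namespace(sample))
```

Without the extra entry, a lookup like this would return `None` for such files, which matches the behaviour described above.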

cneud commented 1 year ago

Hi @LegoUnicorn, and sorry for the late reply! I can add that variation, but would first like to get some more feedback.

It is a bit uncommon for an ALTO file, as in the National Library of Scotland Digital Foundry data, to use the URL of the .xsd directly as the namespace declaration. This may be related to https://github.com/altoxml/schema/issues/67, though. Can you share any information on how the ALTO files were produced? I downloaded the trial data for "The Spiritualist" newspaper from the National Library of Scotland Digital Foundry, and the metadata provided only says pdfalto,0.1. Could the files possibly have been produced with pdfalto?
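One way to check the producing software, as a rough sketch: if the ALTO files themselves carry a `processingSoftware` element (the ALTO schema allows `softwareName` and `softwareVersion` there), it can be read without knowing the namespace in advance. The helper names and the sample document below are made up for illustration and are not taken from the NLS data.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the "{uri}" prefix that ElementTree puts on qualified tag names."""
    return tag.rsplit("}", 1)[-1]


def processing_software(xml_text):
    """Collect (softwareName, softwareVersion) pairs, whatever namespace is used."""
    root = ET.fromstring(xml_text)
    found = []
    for elem in root.iter():
        if local_name(elem.tag) == "processingSoftware":
            fields = {local_name(child.tag): (child.text or "").strip() for child in elem}
            found.append((fields.get("softwareName"), fields.get("softwareVersion")))
    return found


# Hypothetical sample, constructed for this example only.
sample_doc = """
<alto xmlns="http://www.loc.gov/standards/alto/v3/alto.xsd">
  <Description>
    <OCRProcessing ID="OCR_0">
      <ocrProcessingStep>
        <processingSoftware>
          <softwareName>pdfalto</softwareName>
          <softwareVersion>0.1</softwareVersion>
        </processingSoftware>
      </ocrProcessingStep>
    </OCRProcessing>
  </Description>
</alto>
"""
print(processing_software(sample_doc))
```

Matching on local names keeps the check independent of which namespace variant a given file declares.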

And would 'alto-v3-xsd' : 'http://www.loc.gov/standards/alto/v3/alto.xsd' also work for you instead of alto-v3-alt?