Open cneud opened 8 years ago
Also, should the profile refer to the XSD, namespace or other?
Since version information can be important for data consumers, a reference that indicates the version would make sense for the profile. If there are no breaking changes between minor versions with regard to how OCR text is expressed in ALTO, the namespace would suffice.
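To illustrate what a consumer can learn from the namespace alone, here is a minimal Python sketch. It assumes the ALTO namespaces follow the pattern http://www.loc.gov/standards/alto/ns-vN#; the function names and the sample document are made up for this example. The namespace only carries the major version, so a minor version would have to come from somewhere else, such as the XSD reference.

```python
import re
import xml.etree.ElementTree as ET


def alto_namespace(xml_text):
    """Return the namespace URI of the root element, or None if it has none."""
    root = ET.fromstring(xml_text)
    # ElementTree encodes namespaced tags as "{namespace}localname".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


def alto_major_version(namespace):
    """Extract N from a namespace ending in /alto/ns-vN# (assumed pattern)."""
    match = re.search(r"/alto/ns-v(\d+)#?$", namespace or "")
    return int(match.group(1)) if match else None


# Hypothetical, heavily shortened ALTO document used only for illustration.
sample = '<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#"><Layout/></alto>'

ns = alto_namespace(sample)
print(ns, alto_major_version(ns))
```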
First of all, I appreciate the IIIF initiative and am glad that ALTO is considered as a standard format in this API. Because ALTO contains text content rather than application-specific information, the format should be "text/xml". This matches what has been used for the MIMETYPE attribute in existing METS profiles and in the Europeana newspaper project. Regarding "profile", I agree with kba's statement. Regarding "label", I suppose it is only used for display purposes, so spacing is not an issue.
So I would recommend the following for an ALTO file of version 3:
"seeAlso": {
  "@id": "http://wellcomelibrary.org/service/alto/b19956435/0?image=0",
  "format": "text/xml",
  "profile": "http://www.loc.gov/standards/alto/v3",
  "label": "ALTO XML"
}
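For what it is worth, a client could use "profile" rather than "format" or "label" to pick out the ALTO links. The following Python sketch assumes that profile values start with http://www.loc.gov/standards/alto/ as in the recommendation above, and that seeAlso may be a single object or a list. The function name and the sample canvas are hypothetical.

```python
def alto_links(resource, profile_prefix="http://www.loc.gov/standards/alto/"):
    """Collect the @id of every seeAlso entry whose profile looks like ALTO."""
    see_also = resource.get("seeAlso", [])
    # Normalise a single entry (plain URI string or object) to a list.
    if isinstance(see_also, (str, dict)):
        see_also = [see_also]
    links = []
    for entry in see_also:
        if not isinstance(entry, dict):
            continue  # a bare URI carries no profile to check
        profile = str(entry.get("profile", ""))
        if profile.startswith(profile_prefix) and "@id" in entry:
            links.append(entry["@id"])
    return links


# Hypothetical canvas fragment reusing the values recommended above.
canvas = {
    "seeAlso": {
        "@id": "http://wellcomelibrary.org/service/alto/b19956435/0?image=0",
        "format": "text/xml",
        "profile": "http://www.loc.gov/standards/alto/v3",
        "label": "ALTO XML",
    }
}

print(alto_links(canvas))
```

Matching on the profile keeps the check independent of whether the format ends up as "text/xml" or a dedicated MIME type.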
I wonder whether it might be worth considering the registration of a MIME type "application/alto+xml", similar to what RFC6207 specifies for METS/MODS/MADS/MARC21/SRU.
@Jo-CCS Yes, "label" is a free text field and only used for orientation.
Yes, registering the MIME type "application/alto+xml" also makes sense to me.
"application/alto+xml" sounds great to me. The IIIF documentation already has some samples with "application/tei+xml".
To register alto+xml, we need to write an RFC and submit it to iana.org. For comparison, tei+xml was registered via RFC 6129: https://tools.ietf.org/html/rfc6129
My BnF colleagues argue that registration is not mandatory. For example, application/warc is not registered with IANA, yet WARC is an ISO standard: http://www.iso.org/iso/catalogue_detail.htm?csnumber=44717
One can certainly live without the RFC, but note that as a result WARC is not currently considered a registered MIME type either; cf. https://kris-sigur.blogspot.de/2016/05/warc-mime-type.html: "if we wish to have this standardized then going through this process is the only option".
IIIF defines a Presentation API that allows OCR results in ALTO, where available, to be represented as annotations linked by a manifest.
Example:
It would be good to have a recommendation from the ALTO board on the values for two fields, format and label. The format should resemble a MIME type, e.g. application/xml or text/xml, while the latter can be simple text like "ALTO XML", "ALTO OCR" or similar.