altoxml / schema

ALTO XML schema - latest and all former versions

Recommendation for link to ALTO in iiif manifest #40

Open cneud opened 8 years ago

cneud commented 8 years ago

IIIF defines a Presentation API that allows OCR results in ALTO, where available, to be represented as annotations linked from a manifest.

Example:

seeAlso: {
@id: "http://wellcomelibrary.org/service/alto/b19956435/0?image=0",
format: "application/alto+xml",
profile: "http://www.loc.gov/standards/alto/",
label: "ALTO"
}

It would be good to have a recommendation from the ALTO board on the values for two fields, format and label. The format should resemble a MIME type, e.g. application/xml or text/xml, while the latter can be simple text such as "ALTO XML", "ALTO OCR" or similar.
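
To show why an agreed format value matters to consumers, here is a minimal Python sketch of how a client might pick the ALTO link out of a resource's seeAlso. It assumes the key names from the example above (@id, format, profile, label); the function name find_see_also, the example.org canvas URI and the set of accepted format values are hypothetical, not something the ALTO board or IIIF prescribes.

```python
import json


def find_see_also(resource, formats):
    """Return the seeAlso entries of `resource` whose format is in `formats`."""
    entries = resource.get("seeAlso", [])
    # seeAlso may hold a single value instead of a list.
    if not isinstance(entries, list):
        entries = [entries]
    # Plain string URIs carry no format, so only dict entries can match.
    return [e for e in entries if isinstance(e, dict) and e.get("format") in formats]


# Hypothetical canvas carrying the seeAlso block from the example above.
canvas = json.loads("""
{
  "@id": "http://example.org/canvas/0",
  "seeAlso": {
    "@id": "http://wellcomelibrary.org/service/alto/b19956435/0?image=0",
    "format": "application/alto+xml",
    "profile": "http://www.loc.gov/standards/alto/",
    "label": "ALTO"
  }
}
""")

# Until one value is recommended, a client has to accept several candidates.
ACCEPTED = {"application/alto+xml", "application/xml", "text/xml"}
matches = find_see_also(canvas, ACCEPTED)
print(matches[0]["@id"])
```

With a single recommended format value, the ACCEPTED set would shrink to one entry and generic XML links would no longer be mistaken for ALTO.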

cneud commented 8 years ago

Also, should the profile refer to the XSD, namespace or other?

kba commented 8 years ago

Since version information can be important for data consumers, a reference that indicates the version would make sense for the profile. If there are no breaking changes between minor versions with regard to how OCR text is expressed in ALTO, the namespace would suffice.
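
As a rough illustration of what a consumer could do with a versioned profile, the following Python sketch reads a major version from the last path segment of a profile URI. The function name alto_major_version is hypothetical, and the URI shapes it is tried on (a namespace-style "ns-v3#" and a short "v3") are assumptions for the sake of the example, not a recommendation.

```python
import re


def alto_major_version(profile):
    """Return the ALTO major version named in a profile URI, or None."""
    # Only inspect the last path segment, e.g. "ns-v3#" or "v3".
    last_segment = profile.rsplit("/", 1)[-1]
    match = re.search(r"v(\d+)", last_segment)
    return int(match.group(1)) if match else None


# Assumed profile values; an unversioned URI yields None.
for uri in (
    "http://www.loc.gov/standards/alto/ns-v3#",
    "http://www.loc.gov/standards/alto/v3",
    "http://www.loc.gov/standards/alto/",
):
    print(uri, alto_major_version(uri))
```

An unversioned profile leaves the consumer to open the file and inspect its namespace, which is the extra step a versioned reference would avoid.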

Jo-CCS commented 8 years ago

First of all, I appreciate the IIIF initiative and am glad that ALTO is considered as a standard format in this API. Because ALTO contains no application-specific information, only text content, the format should be "text/xml". This matches what has been used for the MIMETYPE attribute in existing METS profiles and in the Europeana Newspapers project. Regarding "profile", I agree with kba's statement. Regarding "label", I assume it is used only for display purposes, so spaces are not an issue.

So I would recommend the following for an ALTO file of version 3:

seeAlso: {
@id: "http://wellcomelibrary.org/service/alto/b19956435/0?image=0",
format: "text/xml",
profile: "http://www.loc.gov/standards/alto/v3",
label: "ALTO XML"
}

cneud commented 8 years ago

I wonder whether it might be worth considering the registration of a MIME type "application/alto+xml", similar to what RFC6207 specifies for METS/MODS/MADS/MARC21/SRU.

cneud commented 8 years ago

@Jo-CCS Yes, "label" is a free text field and only used for orientation.

Jo-CCS commented 8 years ago

Yes, also a registration of MIME type "application/alto+xml" makes sense to me.

altomator commented 7 years ago

"application/alto+xml" sounds great to me. IIIF documentation has already some samples with "application/tei+xml"

cneud commented 7 years ago

Note that "application/tei+xml" also has RFC6129 supporting it. We should therefore check whether "application/alto+xml" can be included in an update to RFC6207 and how, or whether a new RFC must be prepared (by whom?)

altomator commented 7 years ago

To register alto+xml, we need to write an RFC and submit it to iana.org. For comparison, the RFC for tei+xml: https://tools.ietf.org/html/rfc6129

My BnF colleagues argue that it's not mandatory. E.g., application/warc isn't registered with IANA, but WARC is an ISO standard: http://www.iso.org/iso/catalogue_detail.htm?csnumber=44717

cneud commented 7 years ago

Certainly one can also live without the RFC, but note that due to this, WARC is also not currently considered a registered MIME-type, cf. https://kris-sigur.blogspot.de/2016/05/warc-mime-type.html "if we wish to have this standardized then going through this process is the only option"

altomator commented 7 years ago

RFC draft: https://docs.google.com/document/d/1Bu9BWDlgdj_ALk1Z7uNY5bX93y5LqEbwvm0RC0w0kvc/edit?usp=sharing