data-lessons / librarycarpentry

Materials for Library Carpentry development

XSLT Lesson #14

Closed: drjwbaker closed this issue 6 years ago

drjwbaker commented 8 years ago

Alan Danskin has contacted me with a list of use cases for XSLT that could form the basis for a lesson around transforming XML documents into other documents. The purpose of this issue is to share these use cases (with Alan's permission) and to open a discussion about whether this is something that we want to pursue, perhaps in partnership with Data Carpentry.

| Use case | Description |
| --- | --- |
| Run a transformation | Run a local transformation |
| Reuse a transformation | Call a published transformation from an external source, e.g. MARCXML to MODS 3.6 |
| Edit a transformation | Amend a local transformation |
| Write a new transformation to map input format to output format | For example, MARC to MODS or TEI to EAD |
| Write a new transformation to edit input content | For example, normalisation of proprietary or non-standard metadata |
| Use XSLT efficiently | Use of regular expressions; XPath |
| Process non-Roman scripts | Unicode conformance |
| Transformation is re-usable by others | Conventions and commenting |
| Use XSLT to bulk transform legacy data | For example, TEI records to IAMS-compliant TEI (IAMS is the BL's implementation of EAD) |

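
As a rough illustration of the "map input format to output format" case, here is a minimal sketch written in Python rather than XSLT (Python's standard library has no XSLT processor). It assumes a simplified Dublin Core input record and maps just three elements into a cut-down MODS record; the mapping in `dc_to_mods` and the sample record are hypothetical, and a real crosswalk would cover far more elements and edge cases. A Chinese title is included to show that non-Roman scripts pass through as Unicode.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
MODS_NS = "http://www.loc.gov/mods/v3"

# Register the MODS namespace so serialised output uses a readable "mods:" prefix.
ET.register_namespace("mods", MODS_NS)

# Illustrative sample record (hypothetical, simplified Dublin Core).
SAMPLE_DC = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>紅樓夢</dc:title>
  <dc:creator>Cao Xueqin</dc:creator>
  <dc:date>1791</dc:date>
</record>"""


def mods(tag):
    """Return a Clark-notation tag name in the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def dc_to_mods(dc_xml):
    """Map a simplified DC record to a minimal MODS record (hypothetical mapping)."""
    source = ET.fromstring(dc_xml)
    target = ET.Element(mods("mods"))

    # dc:title -> mods:titleInfo/mods:title
    title = source.findtext(f"{{{DC_NS}}}title")
    if title:
        title_info = ET.SubElement(target, mods("titleInfo"))
        ET.SubElement(title_info, mods("title")).text = title.strip()

    # each dc:creator -> mods:name/mods:namePart
    for creator in source.findall(f"{{{DC_NS}}}creator"):
        name = ET.SubElement(target, mods("name"))
        ET.SubElement(name, mods("namePart")).text = (creator.text or "").strip()

    # dc:date -> mods:originInfo/mods:dateIssued
    date = source.findtext(f"{{{DC_NS}}}date")
    if date:
        origin = ET.SubElement(target, mods("originInfo"))
        ET.SubElement(origin, mods("dateIssued")).text = date.strip()

    return target


if __name__ == "__main__":
    result = dc_to_mods(SAMPLE_DC)
    # encoding="unicode" returns a str, so the Chinese title is kept as characters.
    print(ET.tostring(result, encoding="unicode"))
```

In XSLT the same mapping would typically be one template per source element; the Python version just makes each step explicit.
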
jezcope commented 8 years ago

Other relevant XML formats that came up in an unrelated conversation today (about research data): Dublin Core, RDF, METS.

drjwbaker commented 8 years ago

Okay @jezcope. Potentially stupid question, but can XSLT handle transformations from/into these formats as well?

ostephens commented 8 years ago

XSLT can handle any XML-to-XML transform (at least in theory). It can also handle other types of transform, e.g. XML to CSV, although this isn't its primary use.
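
To make the XML-to-CSV case concrete, here is a minimal sketch of that kind of flattening, using Python's standard library in place of XSLT. It assumes a hypothetical, namespace-free `<records>` document; the element names, `FIELDS`, and the `records_to_csv` function are invented for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical input: one <record> element per output row.
SAMPLE_XML = """<records>
  <record id="1"><title>First title</title><year>2001</year></record>
  <record id="2"><title>Second, with a comma</title><year>2002</year></record>
</records>"""

FIELDS = ["id", "title", "year"]


def records_to_csv(xml_text):
    """Flatten each <record> into one CSV row (hypothetical schema)."""
    root = ET.fromstring(xml_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(FIELDS)
    for record in root.iter("record"):
        writer.writerow([
            record.get("id", ""),                          # attribute value
            record.findtext("title", default=""),          # child element text
            record.findtext("year", default=""),
        ])
    return buffer.getvalue()


if __name__ == "__main__":
    print(records_to_csv(SAMPLE_XML), end="")
```

Letting the `csv` module handle quoting means the title containing a comma is wrapped in quotes automatically, rather than having to escape it by hand.
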

I'd probably recommend against using XSLT to create RDF, but I can't see any reason why it would be impossible.

jezcope commented 8 years ago

Not a stupid question at all! The short answer is yes they can.

A longer version of the answer that may muddy the waters more than it clears them: none of the metadata schemas mentioned above is explicitly tied to XML (they're abstract specifications independent of any specific file format), but they are commonly expressed in XML[^1]. XSLT can transform from any XML-based format to any other XML-based format. So it's a qualified yes, in that you need to be using the XML version of the schema in question.

[^1]: except for MARC, which is its own special flower and is generally stored in a binary form and edited in a rather byzantine text-based form O:)

ostephens commented 8 years ago

There is a crossover with web scraping, in that XPath is often used to extract data from HTML pages when scraping.
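
A minimal sketch of XPath-style extraction, assuming the page is well-formed XHTML (real-world HTML usually needs an HTML-tolerant parser first). Python's `xml.etree.ElementTree` supports only a limited subset of XPath, which is enough for simple paths with attribute predicates; the sample page and the `extract_links` function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment.
SAMPLE_PAGE = """<html>
  <body>
    <ul class="results">
      <li><a href="/record/1">First record</a></li>
      <li><a href="/record/2">Second record</a></li>
      <li><a>No link here</a></li>
    </ul>
  </body>
</html>"""


def extract_links(xhtml):
    """Return (text, href) pairs for linked items in the results list."""
    root = ET.fromstring(xhtml)
    # Any descendant <ul class="results">, then its <li> children,
    # then <a> children that carry an href attribute.
    path = ".//ul[@class='results']/li/a[@href]"
    return [(a.text, a.get("href")) for a in root.findall(path)]


if __name__ == "__main__":
    for text, href in extract_links(SAMPLE_PAGE):
        print(href, text)
```

The same path expression would work as an XPath `select` in an XSLT stylesheet, which is part of why the two skills overlap.
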

drjwbaker commented 8 years ago

Ah yes, the 'XML-based format' thing is somewhere in the depths of my brain. Thanks for prodding it out! I know the BL use XSLT to transform RDF/XML to CSV for their 'researcher' formats (see http://www.bl.uk/bibliographic/download.html). I'm sensing that there is a range of use cases here, but that their broad applicability to our audiences (as we know them thus far) might be harder to nail down. Perhaps we need some mechanism for using workshops to understand how well the 'tools' we would build lessons around fit library-world use cases?

Repositorian commented 8 years ago

Hey Library Carpentry colleagues,

Transforming marked-up text documents from one format to another is on the lesson roadmap for Author Carpentry as well. Our use cases would be transforming authors' CVs, publication lists, and even manuscripts from one markup scheme to another. This sounds like an overlapping area we might collaborate on.

drjwbaker commented 7 years ago

@Repositorian Splendid. Sounds like a plan. Have you seen http://programminghistorian.org/lessons/transforming-xml-with-xsl? It might be worth building on.