MyCoRe-Org / libmeta

Java APIs and models for common library standards
GNU Lesser General Public License v3.0

Support for ALTO 3 and ALTO 4 #61

Open: datazuul opened this issue 6 months ago

datazuul commented 6 months ago

Currently, libmeta-alto supports ALTO 2.1.

Please add support for ALTO 3 (schema: https://www.loc.gov/standards/alto/v3/alto.xsd) and ALTO 4 (schema: https://www.loc.gov/standards/alto/v4/alto.xsd).
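Each ALTO major version declares its own namespace on the root element (`http://www.loc.gov/standards/alto/ns-v2#`, `...ns-v3#`, `...ns-v4#`), so a library supporting several versions has to tell them apart somehow. The following is a minimal sketch of one way to do that using only the JDK's StAX API; the class `AltoVersionSniffer` and its `detectVersion` method are hypothetical and not part of libmeta-alto.

```java
import java.io.Reader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper (not part of libmeta-alto): guesses the ALTO major version of a document. */
public class AltoVersionSniffer {

    /** Returns 2, 3 or 4 for the known ALTO namespaces, or -1 if the root element is not ALTO. */
    public static int detectVersion(Reader xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // no DTD/entity processing needed
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        try {
            // Skip the prolog (XML declaration, comments, PIs) until the root element starts
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    String ns = reader.getNamespaceURI();
                    if (ns == null) {
                        return -1; // root element is not in any namespace
                    }
                    switch (ns) {
                        case "http://www.loc.gov/standards/alto/ns-v2#": return 2;
                        case "http://www.loc.gov/standards/alto/ns-v3#": return 3;
                        case "http://www.loc.gov/standards/alto/ns-v4#": return 4;
                        default: return -1;
                    }
                }
            }
            return -1; // no element found at all
        } finally {
            reader.close(); // releases the StAX reader; the underlying Reader stays open
        }
    }
}
```

An application could use such a check to pick the matching ALTO 2.1 or ALTO 4 model before unmarshalling; only the root element is read, so the cost is independent of file size.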

rsteph-de commented 1 month ago

ALTO v4.4 is now part of the development branch, and we've provided a first JUnit test. @datazuul, can you test this "in production"?

I tend to skip v3.0, because it is more or less compatible with v4.4, and v4.0 was already released in January 2018 (6 years ago).

I plan a new release this week, because I am going to present the libraries as a poster at OR 2024.

datazuul commented 2 weeks ago

At least I was able to test it successfully.

I was able to read the TextLine contents to create a plain-text representation of an alto.xml file.
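For comparison, here is a minimal sketch of the same idea with plain JDK DOM instead of libmeta's model classes. It assumes the usual ALTO layout, where a `TextLine` contains `String` elements (word text in the `CONTENT` attribute), `SP` elements for spaces and an optional `HYP` element for a hyphenation mark; the class name `AltoPlainText` is hypothetical. Matching on the local name with the `"*"` namespace wildcard lets it read ALTO 2, 3 and 4 files alike.

```java
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: plain text from ALTO TextLine elements, one output line per TextLine. */
public class AltoPlainText {

    public static String extract(InputStream in) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for getElementsByTagNameNS and getLocalName
        Document doc = dbf.newDocumentBuilder().parse(in);

        StringBuilder text = new StringBuilder();
        // "*" matches any namespace, so the ALTO version does not matter here
        NodeList lines = doc.getElementsByTagNameNS("*", "TextLine");
        for (int i = 0; i < lines.getLength(); i++) {
            NodeList children = lines.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace text nodes and comments
                }
                Element e = (Element) child;
                switch (e.getLocalName()) {
                    case "String": text.append(e.getAttribute("CONTENT")); break; // a word
                    case "SP":     text.append(' '); break;                        // a space
                    case "HYP":    text.append(e.getAttribute("CONTENT")); break; // hyphen mark
                    default:       break;                                          // e.g. GlyphGroup
                }
            }
            text.append('\n');
        }
        return text.toString();
    }
}
```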

A method that unmarshals an XML file without checking the namespace would be great, something like:

boolean namespaceAware = false;
Alto4XMLProcessor.getInstance().unmarshal(altoXmlPath, namespaceAware);

This could use the namespace-awareness feature of DocumentBuilderFactory, as in the JAXB Unmarshaller Javadoc example:

JAXBContext jc = JAXBContext.newInstance( "com.acme.foo" );
Unmarshaller u = jc.createUnmarshaller();

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// DocumentBuilderFactory is not namespace-aware by default, so this is enabled explicitly
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new File( "nosferatu.xml"));

Object o = u.unmarshal( doc );

I'm not sure whether this is possible.
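A commonly used workaround for namespace-insensitive JAXB unmarshalling is to place a SAX `XMLFilter` between the parser and the unmarshaller that reports every element in the namespace the generated classes expect; `Unmarshaller.unmarshal(Source)` accepts a `SAXSource` that carries such a filter as its `XMLReader`. The sketch below shows only the filter part, using JDK classes; `NamespaceRewritingFilter` and `wrap` are hypothetical names, not libmeta API, and whether a `namespaceAware` flag in libmeta-alto would be implemented this way is an assumption. Only element names are rewritten; attributes and prefix-mapping events are passed through unchanged.

```java
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.sax.SAXSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical sketch (not libmeta API): a SAX filter that reports every element
 * in one target namespace, whatever namespace (or none) the input document uses.
 */
public class NamespaceRewritingFilter extends XMLFilterImpl {

    private final String targetNamespace;

    public NamespaceRewritingFilter(XMLReader parent, String targetNamespace) {
        super(parent);
        this.targetNamespace = targetNamespace;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Replace the namespace URI and drop any prefix; attributes are forwarded unchanged
        super.startElement(targetNamespace, localName, localName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(targetNamespace, localName, localName);
    }

    /** Wraps an input so that a consumer (e.g. a JAXB Unmarshaller) sees targetNamespace. */
    public static SAXSource wrap(InputSource input, String targetNamespace) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // so the filter receives proper local names
        XMLReader parser = spf.newSAXParser().getXMLReader();
        return new SAXSource(new NamespaceRewritingFilter(parser, targetNamespace), input);
    }
}
```

With JAXB on the classpath, a call could then look like `unmarshaller.unmarshal(NamespaceRewritingFilter.wrap(new InputSource(altoXmlPath.toUri().toString()), "http://www.loc.gov/standards/alto/ns-v4#"))`, where `altoXmlPath` is assumed to be a `java.nio.file.Path`. Rewriting at the SAX level avoids building an intermediate DOM, which matters for large ALTO files.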