DCLP / dclpxsltbox

Sandbox for development, testing, and review of XSLT for DCLP
http://dclp.github.io/dclpxsltbox/

Schema Update #312

Closed Edelweiss closed 6 years ago

Edelweiss commented 7 years ago

Update all DCLP EpiDoc XML files to schema 8.22 (the current version in the SoSOL editor) or 8.23 (the latest schema on http://www.stoa.org/epidoc/schema/).

This comes from dealing with #311.

We currently have

<?oxygen RNGSchema="http://www.stoa.org/epidoc/schema/latest/tei-epidoc.rng" type="xml"?>
=> 14554 instances
<?xml-model href="http://www.stoa.org/epidoc/schema/latest/tei-epidoc.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
=> only one
<?xml-model href="http://www.stoa.org/epidoc/schema/8.13/tei-epidoc.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
=> 64 cases
<?xml-model href="http://www.stoa.org/epidoc/schema/8.22/tei-epidoc.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
=> again only one
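
A minimal sketch of how counts like these could be gathered, in Python. It assumes the schema directives are `oxygen` and `xml-model` processing instructions of the shape quoted above; the function names and the regular expression are illustrative and not taken from any DCLP tooling.

```python
import re
from collections import Counter
from pathlib import Path

# Matches <?oxygen RNGSchema="URL" ...?> and <?xml-model href="URL" ...?>.
# Group 1 is the processing-instruction target, group 2 the schema URL.
PI_PATTERN = re.compile(
    r'<\?(oxygen|xml-model)\s+(?:RNGSchema|href)="([^"]+)"[^?]*\?>'
)


def schema_directives(xml_text):
    """Return (target, schema URL) pairs for each schema directive in xml_text."""
    return PI_PATTERN.findall(xml_text)


def count_directives(root):
    """Tally schema directives across every .xml file below the directory root."""
    counts = Counter()
    for path in Path(root).rglob("*.xml"):
        counts.update(schema_directives(path.read_text(encoding="utf-8")))
    return counts
```

Calling `count_directives` on a local checkout of the data (the checkout location is up to the caller) returns a `Counter` keyed by directive target and schema URL, which can be compared with the figures above.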

paregorios commented 7 years ago

I would prefer to see us take all files to http://www.stoa.org/epidoc/schema/latest/tei-epidoc.rng as discussed in #262, which I had closed because I thought this had been done comprehensively. Can we tell where the files came from that have the "xml-model" directive with the down-version URL in them? We can fix them in GitHub, but if they're coming in from an outside process/project (e.g., Würzburg) this problem could crop up again.

ryanfb commented 7 years ago

NB: The corresponding change should also be made in the SoSOL validator: https://github.com/DCLP/sosol/blob/master/lib/jruby_xml.rb#L144

Why latest instead of 8.23 (from #262)? I would caution against using latest: after any breaking change in EpiDoc, existing files will no longer validate in the editor, which will disallow saving.
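
To make the distinction concrete, here is a small Python sketch that tells a pinned schema URL from one that floats with `latest`. The helper names are hypothetical; only the URL layout (`.../epidoc/schema/<version>/tei-epidoc.rng`) comes from the directives quoted in this thread.

```python
import re

# Captures the path segment between ".../epidoc/schema/" and "/tei-epidoc.rng".
SCHEMA_URL = re.compile(r"/epidoc/schema/([^/]+)/tei-epidoc\.rng$")


def schema_version(url):
    """Return the version segment of an EpiDoc schema URL, or None if absent."""
    match = SCHEMA_URL.search(url)
    return match.group(1) if match else None


def is_pinned(url):
    """True when the URL names a numbered release rather than "latest"."""
    version = schema_version(url)
    return version is not None and re.fullmatch(r"\d+(\.\d+)*", version) is not None
```

A check like this could flag any file whose directive would silently pick up a new EpiDoc release.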

paregorios commented 7 years ago

I’m ok with 8.23. Whatever we do, it should be made universal across all the data.

-- Tom Elliott, Ph.D. Associate Director for Digital Programs and Senior Research Scholar Institute for the Study of the Ancient World (NYU) http://isaw.nyu.edu/people/staff/tom-elliott

Humanities Commons: @paregorios https://hcommons.org/members/paregorios/ OrcID: 0000-0002-4114-6677 http://orcid.org/0000-0002-4114-6677

jcowey commented 7 years ago

We will now validate to 8.23.

Edelweiss commented 7 years ago

HGV

<?oxygen RNGSchema="http://www.stoa.org/epidoc/schema/latest/tei-epidoc.rng" type="xml" ?>
<?xml-model href="http://www.stoa.org/epidoc/schema/8.13/tei-epidoc.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>

Translations

<?xml-model href="http://www.stoa.org/epidoc/schema/8.13/tei-epidoc.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>

APIS

<?xml-model href="http://www.stoa.org/epidoc/schema/latest/tei-epidoc.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>

DDB

<?xml-model href="http://www.stoa.org/epidoc/schema/8.16/tei-epidoc.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>

Edelweiss commented 7 years ago

https://github.com/DCLP/sosol/commit/b22c45d5a951ac359739e94143d09abfa332c963 https://github.com/DCLP/idp.data/commit/9cd8dfea8dff1e1383197c1de85b634f369b93dd

jcowey commented 7 years ago

@paregorios with the two commits above, the files in DCLP validate against schema 8.23, e.g. <?xml-model href="http://www.stoa.org/epidoc/schema/8.23/tei-epidoc.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
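
For illustration only, a rewrite of this kind could look like the Python sketch below. It is not the code from the commits above. It assumes that both the `oxygen` and the `xml-model` directives should point at the same pinned version, and the names `pin_schema` and `TARGET_VERSION` are hypothetical.

```python
import re

TARGET_VERSION = "8.23"  # the version agreed on in this thread

# Group 1: the directive up to the version segment; group 2: the rest of the URL.
SCHEMA_REF = re.compile(
    r'(<\?(?:oxygen\s+RNGSchema|xml-model\s+href)="http://www\.stoa\.org/epidoc/schema/)'
    r'[^/"]+'
    r'(/tei-epidoc\.rng")'
)


def pin_schema(xml_text, version=TARGET_VERSION):
    """Point every oxygen/xml-model schema directive in xml_text at version."""
    return SCHEMA_REF.sub(lambda m: m.group(1) + version + m.group(2), xml_text)
```

The function works on the file text rather than a parsed tree so that the rest of the file stays byte-for-byte unchanged; running it on an already updated file is a no-op.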

I would be very grateful if you could briefly check on this and then close the issue.

paregorios commented 7 years ago

I have checked. @jcowey or @Edelweiss: someone needs to issue a pull request from the "development" branch on idp.data to the master branch so that the changes in DCLP/idp.data@9cd8dfe get picked up in production, etc.

paregorios commented 7 years ago

Ditto for the SOSOL changes.

jcowey commented 7 years ago

@m-k-r here is the pull request https://github.com/DCLP/idp.data/pull/10

jcowey commented 7 years ago

@m-k-r would you please make sure that the sosol commit mentioned above, https://github.com/DCLP/sosol/commit/b22c45d5a951ac359739e94143d09abfa332c963, is also merged to where it needs to be in master?

ryanfb commented 7 years ago

A number of DCLP files are invalid when validated against 8.23: https://gist.github.com/ryanfb/231b37f0c8ef7910d04541082e95c6b4 (this was run against https://github.com/DCLP/idp.data/commit/9cd8dfe)

I've made a PR here to add the corresponding DCLP validation to Travis (though I can't enable Travis on the DCLP/idp.data repo as I don't have admin rights): https://github.com/DCLP/idp.data/pull/11

jcowey commented 7 years ago

https://github.com/DCLP/idp.data/blob/master/DCLP/62/61916.xml

/DCLP/62/61916.xml:108:554: error: attribute "type" @type="pp" on bibl - as used in commentary and front matter (commentary and front matter were inherited from papyri.info)

https://github.com/DCLP/idp.data/blob/master/DCLP/62/61916.xml validates against 8.16 (which is what is used in https://github.com/papyri/idp.data/tree/master/DDB_EpiDoc_XML)

<?xml-model href="http://www.stoa.org/epidoc/schema/8.16/tei-epidoc.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>

On changing "type" to "unit" in each case, the file validates against 8.23.
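
A rough Python sketch of that attribute rename follows. The element name used in the usage note (`biblScope`) is an assumption, since the comment above says only "on bibl"; pass whichever element the validator actually reports. The function `rename_attribute` is hypothetical. It is text-based so the rest of the file is left untouched, and it assumes no attribute value in the affected tags contains `>` or text that looks like the attribute being renamed.

```python
import re


def rename_attribute(xml_text, element, old, new):
    """Rename attribute old to new on every start tag of element."""
    # The lookahead keeps e.g. "bibl" from also matching "biblScope".
    start_tag = re.compile(r"<" + re.escape(element) + r"(?=[\s/>])[^>]*>")
    # Whitespace before the name keeps "type" from matching inside "subtype".
    attribute = re.compile(r"(\s)" + re.escape(old) + r"(\s*=)")

    def fix(match):
        return attribute.sub(
            lambda m: m.group(1) + new + m.group(2), match.group(0)
        )

    return start_tag.sub(fix, xml_text)
```

For example, `rename_attribute(text, "biblScope", "type", "unit")` would turn `type="pp"` into `unit="pp"` on those elements only.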

paregorios commented 7 years ago

So, what's the recommendation for what to do here?

Edelweiss commented 7 years ago

26 distinct files

idp.data/DCLP/62/61916.xml
idp.data/DCLP/144/143319.xml
idp.data/DCLP/27/26761.xml
idp.data/DCLP/35/34810.xml
idp.data/DCLP/64/63367.xml
idp.data/DCLP/64/63383.xml
idp.data/DCLP/64/63477.xml
idp.data/DCLP/64/63504.xml
idp.data/DCLP/64/63575.xml
idp.data/DCLP/64/63711.xml
idp.data/DCLP/60/59696.xml
idp.data/DCLP/60/59752.xml
idp.data/DCLP/64/63958.xml
idp.data/DCLP/65/64041.xml
idp.data/DCLP/65/64057.xml
idp.data/DCLP/60/59962.xml
idp.data/DCLP/65/64213.xml
idp.data/DCLP/65/64223.xml
idp.data/DCLP/65/64224.xml
idp.data/DCLP/65/64353.xml
idp.data/DCLP/65/64473.xml
idp.data/DCLP/65/64801.xml
idp.data/DCLP/65/64889.xml
idp.data/DCLP/66/65004.xml
idp.data/DCLP/66/65377.xml
idp.data/DCLP/66/65613.xml
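
A list of distinct files like this can be derived from raw validator output. The Python sketch below assumes error lines of the shape quoted earlier in the thread (`path:line:column: error: message`); the function name is hypothetical and this is not the script actually used here.

```python
import re

# Shape taken from the error quoted in this thread:
#   /DCLP/62/61916.xml:108:554: error: attribute "type" ...
ERROR_LINE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+):(?P<col>\d+): error: (?P<message>.*)$"
)


def distinct_files(log_text):
    """Return the sorted paths that have at least one error line in log_text."""
    paths = set()
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match:
            paths.add(match.group("path"))
    return sorted(paths)
```

Lines that do not match the pattern are ignored, so the function can be fed the complete validator log.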

Edelweiss commented 7 years ago

error types

62/61916.xml: attribute "type" not allowed here
65/64223.xml: attribute "type" not allowed here
65/64473.xml: attribute "type" not allowed here
144/143319.xml: attribute "type" not allowed here
64/63383.xml: attribute "type" not allowed here

27/26761.xml: text not allowed here [ok]
35/34810.xml: text not allowed here [ok]
64/63367.xml: text not allowed here [ok]
64/63477.xml: text not allowed here [ok]
64/63504.xml: text not allowed here [ok]
64/63575.xml: text not allowed here [ok]
64/63711.xml: text not allowed here [ok]
64/63958.xml: text not allowed here [ok]
60/59696.xml: text not allowed here [ok]
60/59962.xml: text not allowed here [ok]
65/64213.xml: text not allowed here [ok]
65/64224.xml: text not allowed here [ok]
65/64353.xml: text not allowed here [ok]
65/64801.xml: text not allowed here [ok]
65/64889.xml: text not allowed here [ok]
65/64041.xml: text not allowed here [ok]
65/64057.xml: text not allowed here [ok]
66/65004.xml: text not allowed here [ok]
66/65377.xml: text not allowed here [ok]
66/65613.xml: text not allowed here [ok]
→ https://github.com/DCLP/idp.data/commit/f30b99e2de7d97e10c0a01c16a4c468e975f5d40

60/59752.xml: ID "GR562.02" has already been defined [ok]
60/59752.xml: first occurrence of ID "GR562.02" [ok]
60/59752.xml: ID "GR563.02" has already been defined [ok]
60/59752.xml: first occurrence of ID "GR563.02" [ok]
→ https://github.com/DCLP/idp.data/commit/63df5e64617c711557ec690b65b5f160dc1c9294
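
Duplicate IDs of the kind reported for 60/59752.xml can also be spotted without a schema. The Python sketch below assumes the IDs in question are `xml:id` attributes, which the validator messages do not state explicitly; `duplicate_ids` is a hypothetical helper, not part of the DCLP tooling.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes xml:id under the fixed XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def duplicate_ids(xml_text):
    """Return the xml:id values that occur on more than one element, sorted."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        element.get(XML_ID)
        for element in root.iter()
        if element.get(XML_ID) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)
```

An empty list means every `xml:id` in the file is unique.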

ryanfb commented 7 years ago

FWIW Travis CI should now be enabled for DCLP/idp.data with automated validation runs here: https://travis-ci.org/DCLP/idp.data

jcowey commented 7 years ago

We now have 7 instances in 5 files, see: https://docs.google.com/spreadsheets/d/1v7bMtAXDyzb2folOQ5rdyKzXJaDnxsIrvJGVwcwTDrM/edit#gid=0