Closed randalldfloyd closed 4 years ago
Was discovered while solving #60
I'd be happy to suggest updated xpaths, if you'd like to post fields.xsl here (or the xpaths from it)
@Conal-Tuohy and @mdalmau I just created a PR that adds the aforementioned XSL file. I like the idea of that being part of the repo because its dependency in the entire workflow isn't transparent and isn't tracked anywhere else. I created PR #75 based on a branch that you can checkout and push changes to if you want to have a look. You can see from my discussion above which fields come up empty, and comparing that to the xpaths should give you some idea of what the intention was supposed to be.
I've updated the XSLT in beef81b7978a388ee9775e5396f8f4357afa8948
There were just two changes:
- `xpath-default-namespace`, so that unprefixed element names in XPath expressions in the stylesheet are taken to refer to elements in that namespace (this avoids having to tediously prefix each element name with `tei:` in XPath expressions).
- `msDescription` element's name has changed to `msDesc`, for some reason, so I changed that in the XPath expressions too.

One oddity which remains empty in the stylesheet's output is the `<dc:date>` element. The original XPath was expecting a `date` element contained within the `title` of the `titleStmt`, but I couldn't find a P4 file which had such a `date`, and the P5 versions don't either. I've left that XPath as it is (it would work, if there were data there, since it at least now uses the P5 namespace).
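To illustrate why the unprefixed XPaths came up empty against P5 files, here is a minimal Python sketch, not the stylesheet itself. The sample document is a hypothetical, heavily trimmed P5 file; only the TEI namespace URI is taken from this thread. It shows that a path with no namespace matches nothing, that a namespace-qualified path matches, and that a `date` inside `title` is simply absent.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical, heavily trimmed P5 document, for illustration only.
P5_DOC = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample title</title>
      </titleStmt>
    </fileDesc>
  </teiHeader>
</TEI>"""

root = ET.fromstring(P5_DOC)
ns = {"tei": TEI_NS}

# Unprefixed path: looks for elements in no namespace, so nothing matches.
unqualified = root.find("teiHeader/fileDesc/titleStmt/title")

# Namespace-qualified path: matches the TEI P5 elements.
qualified = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", ns)

# A date nested in the title, as the dc:date XPath expects; absent here.
date_in_title = root.find(
    "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title/tei:date", ns
)

print(unqualified)
print(qualified.text)
print(date_in_title)
```

In XSLT 2.0 the `xpath-default-namespace` attribute plays the role of the `tei:` prefixes used here, so the path expressions in the stylesheet can stay unprefixed.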
I also added a test harness (2abe044cd2abb759261ac5c44b2a75792b404dc5) to the web app, to be able to check the output of the stylesheet conveniently (without having to submit a text to Xubmit), and I've deployed this to the dev server. See the right-hand-column in the table http://carbon.dlib.indiana.edu:8220/p5/
I think my work here is done! ;-)
OMG @Conal-Tuohy this is brilliant. Thank you! @randalldfloyd do you want me to replace the updated fields.xsl file or do you? I can't remember if we have SVN for Xubmit code? It's been so long.
@Conal-Tuohy once we get the updated fields.xsl replaced in Xubmit, could you do that final conversion and push into Xubmit for the P5 files? I will need to coordinate a freeze time with the encoders. Let me know when you think this is feasible and I can get last minute P4 updates in place along with the freeze. And then, your work is really done. Thank you again for helping with the XSL which was not in scope. I appreciate it.
@Conal-Tuohy This is just super! Big +1 on the test for the XSL. @mdalmau I'm not sure what the setup procedure looked like for Xubmit repos but I would imagine it was based around SVN. We should undo that if we have the time. For now the easiest route is to copy that file down in whatever way is easiest for you and just overwrite the one that's there. I'll think about how to make that dependency more transparently linked to the Github repo.
The old overwrite is fine by me! I just didn't want to F stuff up. Will do that later this AM. And then check the test file currently in the repo.
@mdalmau I just copied the new version in - I was already there thinking out how to link it to Github so I just copied it in. I will test to see if it works.
@mdalmau New version of `field.xsl` works perfectly! Do you want to test your own source document before proceeding? The one I've been testing with is probably a bit older than what's being indexed in the webapp.
Sounds like you're all good to go! Great!
@mdalmau
> could you do that final conversion and push into Xubmit for the P5 files?

To be honest I'd really prefer to leave that for someone at Indiana, to avoid having to coordinate a "freeze" halfway around the world and also because I've not really dealt with Xubmit, so far.
It's straightforward, though; when you are ready to make the transition, you need to:
Use the admin UI on carbon, firstly to download the finalised P4 files from Xubmit, and secondly to convert them to P5. Note that the 2 buttons for performing these steps are "greyed out" to indicate the imminent deprecation of the P4 Xubmit repo, and the P4-to-P5 converter. They do still work though!
When you do that, the result will be an updated corpus of P5 docs which you can find in `/srv/services/chymistry-devel-tomcat-8220/newton_chymistry/p5` on carbon's file system, or alternatively you can download them individually from the page http://carbon.dlib.indiana.edu:8220/p5/ (the "view converted TEI P5 file" column). Then just upload those files to the new Xubmit repo.
You could make a staggered transition, too; you could do this now, and if there were a few "straggler" P4 files which people were still working on, you could come back later, when they were all done, and run through the same process, except (obviously) you'd only upload into Xubmit the P5 versions of those specific straggler documents.
Incidentally, I just tried to refresh the P4 on carbon and got an error, which is due to the web-app's failure to handle a document which contains a space in its name in Xubmit: `ALCH00040 (3)`. The app is concatenating the name to create a URL which is invalid because it contains a space. This is I guess a bug in my interface with Xubmit (which I basically reverse-engineered/guessed). Rather than debug the problem, maybe it would be easier to rename the document in Xubmit? I am guessing the `(3)` on the end is probably not intended anyway?
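For reference, the failure mode and one possible fix can be sketched in Python. This is an illustration only, not the web app's actual code: the base URL and both function names are hypothetical, and the real Xubmit endpoint is not given in this thread. The idea is to percent-encode the document name before appending it to the URL.

```python
from urllib.parse import quote

# Hypothetical base URL; the real Xubmit endpoint is not given in this thread.
BASE_URL = "http://xubmit.example.org/newtonchym/"


def naive_url(name: str) -> str:
    """Plain concatenation, which yields an invalid URL if the name has a space."""
    return BASE_URL + name


def encoded_url(name: str) -> str:
    """Percent-encode the document name before appending it to the base URL."""
    return BASE_URL + quote(name, safe="")


print(naive_url("ALCH00040 (3)"))
print(encoded_url("ALCH00040 (3)"))  # the space becomes %20
```

Renaming the document in Xubmit, as suggested above, avoids the problem without any code change; encoding would only matter if names with spaces were expected to recur.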
We need to update the schema location in the P5 output to point to the following, for Xubmit to take the files: `xsi:schemaLocation="http://www.tei-c.org/ns/1.0 http://dlib.indiana.edu/lib/xml/newton/tei_all.xsd"`
I can do that as an interim step after I grab the P5 that Con generates before uploading to Xubmit unless it's something that should be part of @Conal-Tuohy's XSLT?
@mdalmau I couldn't tell if you were asking me this or not. I think that if we are going to rely on the external schema location, then any XSLT involved should probably be responsible for that so it's obvious and transparent, vs a manual step that would have to be documented. The XSLT would make it fail proof.
I added this schema reference into the stylesheet that generates the P5 docs. See f20a959f42419d29883e0f56eab481964a657a85
I tested this locally and it looked good. I haven't run it on carbon, though (or even done the `git pull`).
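As a rough illustration of what the schema reference amounts to, the following Python sketch sets the `xsi:schemaLocation` attribute on the root element of a TEI P5 document. It is not the stylesheet change in the commit above; the function name and the one-line sample document are hypothetical, while the attribute value is the one quoted earlier in this thread.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
SCHEMA_LOCATION = (
    "http://www.tei-c.org/ns/1.0 "
    "http://dlib.indiana.edu/lib/xml/newton/tei_all.xsd"
)

# Keep TEI as the default namespace and use the conventional xsi prefix
# when serialising.
ET.register_namespace("", TEI_NS)
ET.register_namespace("xsi", XSI_NS)


def add_schema_location(xml_text: str) -> str:
    """Return the document with xsi:schemaLocation set on its root element."""
    root = ET.fromstring(xml_text)
    root.set(f"{{{XSI_NS}}}schemaLocation", SCHEMA_LOCATION)
    return ET.tostring(root, encoding="unicode")


# Hypothetical minimal P5 document, for illustration only.
sample = f'<TEI xmlns="{TEI_NS}"><teiHeader/></TEI>'
result = add_schema_location(sample)
print(result)
```

Doing this inside the stylesheet that generates the P5 docs, as described above, keeps the step automatic rather than something to remember before each upload.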
Thank you, @Conal-Tuohy. I was planning on running the transformation today at 1 pm EST (so in 2.5 hours). Would I be able to do this through the admin screen? I have one P4 XML file that I will need to manually transform with your XSLT b/c Xubmit is no longer accepting the P4 version (valid). Xubmit was always fragile, but I think it's ready to retire. :-)
I missed Con's initial report about the ALCH00040 (3) error so I encountered it as I began the conversion process. I am unable to delete this file from the Xubmit interface (get an error). I deleted the file from the server, but it's still in Xubmit (which makes sense b/c it's a versioning system). I have asked @randalldfloyd for help. I can't even start the "one-click" transformation process because of this error. Sigh.
Nick deleted the file from the Exist database so I should be able to run the conversions using the admin tool. @randalldfloyd, you are off the hook!
@Conal-Tuohy I ran the P4 to P5, ran another quick transformation to add the schema pointer for Xubmit and have added all but one xml file to the Xubmit p5 repo. I'd close this issue, but there are bits of code that haven't been updated in production though it may not matter since that's the P4 to P5 bits that will go away from the admin screen.
During submission to Xubmit, an XSL transformation is performed to extract values that will be used to store/track the document. The transformation is collection-specific and stored on the server like: `/opt/xubmit/repositories/newtonchym/xslt/field.xsl`. This XSL file is then referenced inside of the configuration (in eXist) per collection.
For the `newtonchym` collection in Xubmit, the transform is only producing field values in the resulting document for fields that were sent values via parameters. The rest are supposed to be derived from the source document but are not (probably XPath issues). This can be seen with a manual transformation: