TEIC / Roma-Antiqua

This repository houses the code for Roma Antiqua, the web based TEI software for generating customisations.
https://romaantiqua.tei-c.org
20 stars 7 forks source link

When using Roma with P5 3.0.0, att.pointing.attribute.target defines @target to be empty #12

Closed lddubeau closed 6 years ago

lddubeau commented 8 years ago

Version

The Roma web site reports being at version 4.18.

It uses P5 3.0.0 to generate the schema.

Steps to reproduce the issue

  1. Go to the web site for Roma.
  2. Click the option "Reduce: create a new customization by removing elements and modules from the largest possible schema" and then click the "Start" button.
  3. You'll be taken to a new page, but click the "Schema" tab to be taken to the schema page.
  4. Download the compact Relax NG schema by clicking on "Generate". Save the schema somewhere.

    Actual Results

The saved schema contains this:

att.pointing.attribute.target =

  ## specifies the destination of the reference by supplying one or more URI References
  attribute target {
    list { empty }
  }?

Expected Results

The definition should be:

att.pointing.attribute.target =

  ## specifies the destination of the reference by supplying one or more URI References
  attribute target {
    list { data.pointer+ }
  }?

Observations

The end result of the issue above is that @target on tags like ptr cannot contain anything anymore. I discovered the issue when I regenerated a few hours ago with the command line Roma a customization I've been using for years without problem. All of a sudden all my ptr elements turned up invalid.

I don't know of a way to force the web Roma to use an older version of P5 but if I force my command line Roma to use version 2.9.1 of P5, and run in on the customization described above after saving it locally, then I get the expected results. And if I run it with P5 3.0.0, I can reproduce the bad results.

martindholmes commented 8 years ago

Just a point of clarification: Roma is simply handing off ODD-to-schema conversion to OxGarage, so I suspect the problem is with OxGarage.

jamescummings commented 8 years ago

Hi @lddubeau,

This was one of the things we worried about in the 3.0.0 release. But I don't think we do a good enough job of testing Roma/OxGarage before a guidelines and stylesheets release is made. There were significant changes in the stylesheets to deal with the move to Pure ODD and I suspect that this is probably a result of that (but don't know off the top of my head). As @martindholmes has just said while I'm typing this, Roma just acts as an ODD editor (feel like writing a replacement? ... it can be done in just javascript ;-) ) and passes off the conversion of files to OxGarage ... something else that Sebastian was primarily responsible for understanding. This just uses the underlying stylesheets so I suspect that it isn't OxGarage or Roma itself, but the Stylesheets. Specifically the ODD to RelaxNG ones.

lb42 commented 8 years ago

If you compile your ODD to a schema using oXygen you will find everything works correctly. So I think this is not a stylesheet problem, but rather something in the Roma interface which is not picking up the correct Stylesheets release.

jamescummings commented 8 years ago

Ah, so then I'd agree with @martindholmes that it is likely an OxGarage problem since that is where the conversion is done.

martindholmes commented 8 years ago

During the release I raised the question of how OxGarage gets its stylesheets updated, and I still don't know the answer to that. If OxGarage is relying on the debian packages, then its stylesheets are several versions behind, and the move to PureODD with P5 3.0.0 would be expected to break OxGarage's functionality.

lddubeau commented 8 years ago

The website uses OxGarage but as I mentioned in the initial report (in "Observations") I'm able to reproduce the issue with the command line Roma. Is the command line also using OxGarage? (If the answer is yes, then that's news to me.)

lb42 commented 8 years ago

The command line roma is also provided by the debian packages, so it is also out of date I assume. It should have an option for specifying a different location for the stylesheet library however: have you tried downloading and installing the latest Stylesheet release, and then specifying that on the command line?

martindholmes commented 8 years ago

Quote from the OxGarage readme:

"make sure the TEI stylesheets and source are installed at /usr/share/xml/tei using the Debian file hierarchy standard; the distribution file at https://sourceforge.net/projects/tei/files/Stylesheets/ is in the right layout."

This suggests that both P5 and the Stylesheets should be installed on the OxGarage server in the normal places where the Debian packages would put them. This means that Roma is broken until someone can manually update those packages. The packages are built, of course, but they're not signed and released; a sysadmin could I think install unsigned packages if they needed to. This is fairly urgent, I think. Another option would be to try setting up a second OxGarage VM somewhere (perhaps Peter could do it) and making sure it's up-to-date and working, then changing the pointers so that Roma uses it.

lddubeau commented 8 years ago

@lb42 Yes, it is out of date. I've built a .deb for the latest Roma (4.18) and installed that, then I downloaded P5 3.0.0 and version 7.41.0 of the stylesheets and ran:

$ roma --localsource=/home/ldd/tmp/no-backup/tei-3.0.0/xml/tei/odd/p5subset.xml --xsl=/home/ldd/tmp/no-backup/tei-xsl-7.41.0/xml/tei/stylesheet tei_all.xml

And this combination produced this definition:

att.pointing.attribute.target =

  ## specifies the destination of the reference by supplying one or more URI References
  attribute target {
    list { xsd:anyURI+ }
  }?

which is equivalent to what I was expecting (the data.pointer identifier in the "expected results" is defined elsewhere as xsd:anyURI).

The problem is that by default roma will grab the latest P5 from the TEI web site, and will try to process that even if it cannot process it because the XSL it is working with cannot do it. I had set my build system to pass --localsource=... because I find the default behavior problematic but a later change to the build system caused the flag to no longer be passed to roma, which reverted to its default behavior, and I ran into the issue described into the original post.

I've temporarily worked around the issue by copying /usr/bin/roma to a directory inside my project, and editing it to be able to pass a specific version number to download from the TEI site. (--localsource seems to be suffering from a bug in the 7.39.0 version of the XSL that has been fixed in 7.41.0, so I've not pursued this avenue.)

martindholmes commented 8 years ago

The command-line Roma has been deprecated for a long time, in favour of using the "teitornc" and similar commands provided in the Stylesheets bin directory. If you run one of those commands in the bin directory of a fresh copy of the Stylesheets, instead of roma, do you get expected results?

lddubeau commented 8 years ago

@martindholmes So that I know where to look in the future, where should I have looked to learn that roma is deprecated? I do try to keep abreast of such developments, and usually I do a double check before posting an issue. For instance, I took a look at the ChangeLog but that was unhelpful. I also checked the P5 release notes going back to 2.0.0 but did not see something that would indicate roma is deprecated.

Now, teitornc behaves like roma. If I just run it like this:

$ teitornc tei_all.xml

I get the bad definition in my original report, but then again that's using XSL 7.39.0 and having teitornc silently download P5 3.0.0 from the TEI site. (That it is downloading P5 from the web can be ascertained by running with --verbose.)

This gives the same good results as running the roma equivalent that uses the same version of P5 and of the XSL:

$ teitornc --localsource=/home/ldd/tmp/no-backup/tei-3.0.0/xml/tei/odd/p5subset.xml --apphome=/home/ldd/tmp/no-backup/tei-xsl-7.41.0/xml/tei/stylesheet tei_all.xml tei_all_3.0.0_7.41.0.rnc

Ultimately, whenever a combination of P5 and XSL is good with roma, then it is good with teitornc, whenever a combination is bad with roma, then it is bad with teitornc.

martindholmes commented 8 years ago

So that I know where to look in the future, where should I have looked to learn that roma is deprecated?

I don't know the answer to that! I was happily using it until Sebastian told me rather forcefully one day a few years ago that I shouldn't be.

Ultimately, if we can keep the Debian packages current and also make sure they're kept updated on OxGarage, most of these problems will disappear.

jamescummings commented 8 years ago

It isn't so much that commandline roma was deprecated as replaced by the ant driven scripts in the Stylesheets/bin/ folder. These are much more efficient because of our dependence on a variety of java packages. They also give some additional options that may not have been in roma... I can't remember.

martindholmes commented 8 years ago

Those "scripts" are actually all aliases to a single script, which determines its input parameters by parsing the name of the alias that called it. Sebastian was mighty clever. :-)

jamescummings commented 8 years ago

Hi @lddubeau

Could you (Or others!) go try again on Web Roma and see if it is still having the same problem or if it is fixed. I believe I've put the latest Stylesheets in the correct place in the TEI Vault and made them current. I don't know if this fixes the problem in Roma/OxGarage or not.

-James

lb42 commented 8 years ago

You seem to have introduced a new bug. Attempting to process almost any ODD file (I tried 3) via web roma to produce a Relaxng schema now generates an error message "/tmp/a2c199475306b000c312ac0e5b6bbd30.tmp:2:6: fatal: The processing instruction target matching "[xX][mM][lL]" is not allowed. "

lb42 commented 8 years ago

More helpfully perhaps, if you try to generate a RNG version of any ODD you get the following error message from the EGE library:

<error msg="Base URI {} is not an absolute URI" exclass="class pl.psnc.dl.ege.ex
ception.ConverterException" >pl.psnc.dl.ege.exception.ConverterException: Base U
RI {} is not an absolute URI
    at pl.psnc.dl.ege.MultiXslConverter.convert(MultiXslConverter.java:108)
    at pl.psnc.dl.ege.component.NamedConverter.convert(NamedConverter.java:4
4)
    at pl.psnc.dl.ege.ConversionPerformer.run(ConversionPerformer.java:44)
    at java.lang.Thread.run(Thread.java:619)
</error>
jamescummings commented 8 years ago

Interesting. Do you get this is you try the same (on the command line) with the stylesheets on the master branch? All I did was copy them into place. :-(

martindholmes commented 8 years ago

I've seen that bug before, but I can't for the life of me remember when or where it showed up. This is relevant:

http://stackoverflow.com/questions/19889132/error-the-processing-instruction-target-matching-xxmmll-is-not-allowed

and it suggests that somewhere there is unwanted whitespace at the top of an XML file (perhaps the ODD you're processing). Is that the case?

[No, as I say below, this is caused by the format of the error message produced by the EGE processor; it has nothing to do with the ODD file, which is fine when processed by e.g. teitorelaxng LB]

lb42 commented 8 years ago

James's comment above is weird: I (and presumably everyone else though it might be helpful if someone else could confirm) am seeing this error when I use roma on the TEI's website . There's no way I can "try the same" with different stylesheets. Hypothesizing wildly, I suggest that the java error messages mean that some file is inaccessible or unavailable perhaps because a stale file handle is being used. Which means it might be fixed by a restart of tomcat as Martin suggests. What is really daft is that there must be ways of associating the EGE process with particular releases of the Glines/Stylesheets but these are clearly not being used.

lb42 commented 8 years ago

Martin, the complaint about the double XML declaration is caused by the fact that the output from the EGE engine contains an error message (as shown above), which is for some reason preceded by two XML declarations. Try producing a RelaxNG XML syntax schema, or a DTD, and that's what you get. The message I quoted above is because jing tries to parse the error message file when you produce RNC instead of just throwing it at you.

jamescummings commented 8 years ago

I'm sorry you think my comment is weird, it was hastily written on my phone, I could have gone into more detail. All I meant is that do you get any errors when you use the same stylesheets (e.g. master branch) to do teitorelaxng on the command line. i.e. Testing whether this OxGarage/Roma or a problem with the stylesheets. I know it probably isn't, I just wanted to make sure and wasn't in a position to do so myself. I agree it seems a good idea to restart tomcat. I'll ask the TEI-C Sysadmin to do so.

lb42 commented 8 years ago

No, as already noted on this thread, command line processing using the right stylesheets release works fine. Or we wouldn't have been able to make this release would we?

jamescummings commented 8 years ago

Sure... unless someone added something to Master afterwards or something. Sysadmin emailed.

jamescummings commented 8 years ago

Restart didn't seem to solve the problem. In an emergency if people ask who don't need TEI P5 3.0.0 they can use http://tei.it.ox.ac.uk/Roma/ which calls our Oxford OxGarage. This uses the TEI P5 2.8.0 release and stylesheets from that date. So has the following in output:

tei_att.pointing.attribute.target =

  ## specifies the destination of the reference by supplying one or more URI References
  attribute target {
    list { tei_data.pointer+ }
  }?

But of course none of the changes from 2.9.1 or 3.0.0. I'll try to have a look at the tei-c.org server today to see if I can see where it is failing but am not even sure where it is on the server.

lb42 commented 8 years ago

We should perhaps ask Sysadmin to temporarily redirect Roma users to the Oxford site if there's no obvious solution by end of business today. It's surely better to have a functioning Roma without the latest releases than a dead one.

jamescummings commented 8 years ago

Just adding these notes on this issue in case they are useful:

Finding last time this base-uri problem was reported: http://permalink.gmane.org/gmane.text.tei.general/18961 The fix that Sebastian implemented immediately after that dateTime was in the odds/odd2odd.xsl https://github.com/TEIC/Stylesheets/commit/bb2b1ca1cbc7d0fdbd85f7cd8a53845b09e710d8 and seems to try to prevent base-uri being empty. I don't know why that wouldn't still be working.

jamescummings commented 8 years ago

I've temporarily redirected the 'current' symlink back to 7.39.0 release of the stylesheets and left a 'current-broken' symlink there pointing to the 7.41.0 release. This restores the state of play to a working Roma and OxGarage but with the problem that @lddubeau notes that opened this issue. I've done this under the assumption that this is better than a completely non-functional Roma/OxGarage, but let me know if I'm incorrect in that assumption.

I've tested running the conversions on the commandline for both versions on the tei-c.org server and they seemed to work as expected i.e. 7.39.0 had the problem, 7.41.0 had no problem and worked. The compiled odd had a base-uri so I really don't know what oxgarage is complaining about.

martindholmes commented 8 years ago

One place to look might be p5subset.xml, by diffing e.g.

http://www.tei-c.org/Vault/P5/2.8.0/xml/tei/odd/p5subset.xml

against

http://www.tei-c.org/Vault/P5/3.0.0/xml/tei/odd/p5subset.xml

to see if anything obvious jumps out.

jamescummings commented 8 years ago

Well the datatypes have been renamed from data.pointer http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-data.pointer.html (which we still seem to be providing even though nothing uses it) to teidata.pointer http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-teidata.pointer.html maybe something has some name of datatype hard coded somewhere. But the stylesheets work on the commandline so I can't see why that would be a problem.

lb42 commented 8 years ago

The difference between 2.8.0 and 3.0.0 is mostly that (Relaxng) references to data.foo have been changed to (ODD dataRef) references to teidata.foo. The problem originally reported suggests that dataRef elements are not being processed correctly which in turn suggests that the processor is using the wrong stylesheets version. We keep the data.foo definitions so that an ODD which references them directly won't break.

jamescummings commented 8 years ago

@lb42: That fits with my understanding. I've put an issue in https://github.com/TEIC/TEI/issues/1445 but only because we should have a discussion about whether these should be properly deprecated or not.

On the oxgarage/stylesheets problem, to summarise some of what I know: Because we hadn't made a stylesheets release they were being processed with the 7.39.0 release of the stylesheets which didn't handle all the pureodd stuff. If we point the 'current' symlink at the 7.41.0 stylesheets it all goes wrong with that base-uri() error. However, when I run those stylesheets on the commandline it all seems to work fine. And indeed, as you say, we know this to be true because if we go look at: att.pointing.attribute.target in http://www.tei-c.org/Vault/P5/3.0.0/xml/tei/custom/schema/relaxng/tei_all.rnc It has list { xsd:anyURI+ }. All good.
I know that OxGarage looks at the Stylesheets in 'current', since it goes back to this previous behaviour when I change it. I know that the Stylesheets are fine when I run them on the commandline. So it must be something about the way OxGarage is running them? But that is precisely where I have a big whole in my knowledge.

Back in May 2015 Sebastian said "I can confirm that a change to the underlying ODD-processing stylesheets is causing the issue. It only shows up in the web context, which is why the regular tests didn’t catch it". So this makes me think it is indeed a stylesheets problem with poviding the base-uri when run in the oxgarage context because it doesn't seem to be a problem on the commandline. :-(

Any more suggestions?

lb42 commented 8 years ago

I am looking suspiciously at line 221 in the odd2odd stylesheet which reads

  <xsl:value-of select="resolve-uri(string-join(($currentDirectory, $loc), '/'),base-uri($top))"/>

$top has the value "/" which is maybe not the right thing to be using on a webserver? (whistling in the dark)

jamescummings commented 8 years ago

I think you may be correct.

If I log into the tei-c.org server, and edit the 7.41.0 stylesheets odd2odd.xsl, adding in a hard coded source pointing at the released p5subset.xml at http://www.tei-c.org/release/xml/tei/odd/p5subset.xml then it works fine.

To be clear what I did was:

 <xsl:variable name="source">
      <xsl:choose>
        <xsl:when test="starts-with($loc,'file:')">
          <xsl:value-of select="$loc"/>
        </xsl:when>
        <xsl:when test="starts-with($loc,'http:')">
          <xsl:value-of select="$loc"/>
        </xsl:when>
        <xsl:when test="starts-with($loc,'https:')">
          <xsl:value-of select="$loc"/>
        </xsl:when>
<!-- JCC adding this otherwise here to test -->
<xsl:otherwise>http://www.tei-c.org/release/xml/tei/odd/p5subset.xml</xsl:otherwise>
<!-- JCC mucking about on live server!
        <xsl:when test="starts-with($loc,'/')">
          <xsl:value-of select="resolve-uri($loc)"/>
        </xsl:when>
        <xsl:when test="starts-with($loc,'tei:')">
          <xsl:value-of
              select="replace($loc,'tei:',$defaultTEIServer)"/>
          <xsl:text>/xml/tei/odd/p5subset.xml</xsl:text>
        </xsl:when>
        <xsl:when test="base-uri($top)=''">
          <xsl:value-of select="$currentDirectory"/>
          <xsl:value-of select="$loc"/>
        </xsl:when>
        <xsl:when test="$currentDirectory=''">
          <xsl:value-of select="resolve-uri($loc,base-uri($top))"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="resolve-uri(string-join(($currentDirectory, $loc), '/'),base-uri($top))"/>
        </xsl:otherwise>
-->
      </xsl:choose>
    </xsl:variable>

Which I probably shouldn't leave that way. And I've not changed it here on github or anywhere else. (If I do, I suspect this butchery would result in carnage for anyone trying to use it elsewhere.) But this seems to confirm what you suggest, that it is the $source variable not being set right.

I'm assuming what this is doing is coping with the fact that in addition to file/http/https URIs that we get ODDs that have relative URIs or none etc. I've no idea what just commenting this out might break....

I'll leave it commented out as such right now, since this seems to solve the problem on the live site and I doubt anyone is downloading the individual xsl files rather than going to github. Maybe you, @martindholmes @hcayless or others might have some thoughts on what this should be in the long run and we can make the change properly on github and retrofit if necessary. (I'm prob. not going to be able to look at it more in the next couple days.)

lb42 commented 8 years ago

I confirm that the web roma is now behaving itself again. Thanks James! (But why did you put the explicit <otherwise> before all the other <when>s ?)

martindholmes commented 8 years ago

I wonder if what Sebastian did last time was exactly what you did too: hack the released code on the server. But your fix may work fine, so I'd suggest committing it to dev and seeing if anything breaks.

jamescummings commented 8 years ago

@lb42 thanks, but only because of your hint on where the problem was!

@martindholmes No, Sebastian didn't hack the released code ... we can check that since it is all in the vault. If you diff http://www.tei-c.org/Vault/Stylesheets/7.39.0/xml/tei/stylesheet/odds/odd2odd.xsl and http://www.tei-c.org/Vault/Stylesheets/7.41.0/xml/tei/stylesheet/odds/odd2odd.xsl the additions all seem to be about Pure ODD. I'm assuming there are reasons when the $loc might start with '/', 'tei:' or empty (or the current directory is empty) that he was trapping for a reason. I'm just saying we might want to rewrite this to catch whatever situation is actually happening and do something correct with it. I've just failed to find out what is actually happenning to produce the error. ;-)

lddubeau commented 8 years ago

@jamescummings While the modification you made is a step towards finding a solution, I do not think it is the right one to commit to the repo. Here's why.

  1. For consistency with how the tools already work, the URL should be http://www.tei-c.org/Vault/P5/current/xml/tei/odd/p5subset.xml.

    When teitornc or roma are run without options, they download the specs from the TEI site, the URL they use to get the specs is http://www.tei-c.org/Vault/P5/current/xml/tei/odd/p5subset.xml.

    It would be possible to download older versions by replacing current with a version number. I don't think there's an analogue for the http://www.tei-c.org/release/ URL. I mean, if someone who has been using http://www.tei-c.org/release/ wants to get an older version instead, they have to go into the Vault/P5 hiearchy. They cannot just change part of the path with a version number.

    And if it ends up that a modification must be made to add a xsl:when branch that produces a URL to the TEI site, again for consistency, it seems to me it should really be built from defaultTEIServer and defaultTEIVersion like is done in the initialization of DEFAULTSOURCE.

  2. It seems to me there is something else that is preventing source from getting a good value, and that's what should be investigated.

    As I mentioned above, the teitornc and roma tools will download p5subset.xml from the TEI site by default. When no options are passed to the command line tools, they run odd2odd.xsl with the XSL parameter defaultSource left to its default value, which is the empty string, causing the XSL variable DEFAULTSOURCE to be set to http://www.tei-c.org/Vault/P5/current/xml/tei/odd/p5subset.xml. And in the function tei:workOutSource, the variable loc also gets to be set to this URL, and so does the source variable.

    If odd2odd.xsl is run in such a way that source has some other value, then there is something that causes it to be set to that other value. I'd look for what this cause is and alter that rather than alter the code of odd2odd.xsl.

    I don't know all the cases that the section you commented out are meant to handle but the first one in what you commented out is meant to handle the case where people do something like

     teitornc --localsource=/home/ldd/something/or/other [...]

    With version 7.39.0 (and presumably earlier versions) a path that starts with a forward slash like in the example above fails to work. With version 7.40.1 and 7.41.0, it works. So if you were to comment out the first xsl:when that you commented out, it would constitute a regression.

jamescummings commented 8 years ago

@lddubeau I agree with you.

1) Ah, that was just careless cut-and-paste on my part. /release/ is a symlink to /Vault/P5/current/ ... so I agree if putting this fix into the repo then it should be the Vault version.

2) That was precisely the situation I was worrying about. Of course we could decide that all localsources had to be given using a file:// URI syntax.

Yes, the current commenting out is a regression, I'm sure of that. When no one is looking I was going to start by adding back each xsl:when to see which was causing the problem to see if I can track down where oxgarage is having a problem and see if we can come to a compromise. But well, seemed better to try and hack a fix for now on the website. Happy to receive any proposals or patches of course if you or anyone else wants to do this! I'm going to put a release blocking issue in the TEI repo with the 3.1.0 milestone which references this issue so that we know we have to solve it before next release.

lddubeau commented 8 years ago

@jamescummings I'll have to sit on the sidelines for this one. Whenever I've been able to do so, I've provided solutions in one form or another in my issue and bug reports and feature request submitted to the TEI repositories (most of them are on Sourceforge, from back when that was the correct site to report such things) but web-based Roma is a part of the code-base I've never touched before and it uses technologies I've not worked with for years. So it would be a major enterprise for me to tackle this.

jamescummings commented 8 years ago

@lddubeau Don't worry, was just asking on the off-chance! Your bug reports and feature requests have always been well-documented and appreciated. One of these days you should run for TEI Council.

raffazizzi commented 8 years ago

So the problem seems to be that whatever library runs XSLTs in OxGarage gets confused in resolving a URI like /usr/share/xml/tei/odd/p5subset.xml because it's not an absolute URI (it's not). Stand-alone Saxon must be more forgiving.

So specifying the base as file:/// seems to fix the problem: https://github.com/TEIC/Stylesheets/commit/f3044b9e37520fd034b27dc792f948a8cb172538

This seems to work both with Saxon run from oXygen and a local instance of OxGarage. It could be worth trying this on the server before merging the bug-fix branch to the Stylesheets' dev branch.

peterstadler commented 6 years ago

I believe, this issue can be closed?

lddubeau commented 6 years ago

I cannot tell whether there's still an issue to be resolved or not, and I'm currently busy with activities that don't involve using TEI so I don't have the time to look again at this problem. Consequently, I don't object to closing this issue.

peterstadler commented 6 years ago

ok, thanks, closing the issue. If the issue resurfaces, please feel free to reopen.