DCLP / dclpxsltbox

Sandbox for development, testing, and review of XSLT for DCLP
http://dclp.github.io/dclpxsltbox/

Display of HGV alongside DCLP metadata #270

Closed: rla2118 closed this issue 6 years ago

rla2118 commented 7 years ago

This issue concerns http://litpap.info/dclp/64347

DCLP data is not being shown alongside HGV data here, although elsewhere it is (e.g., http://litpap.info/dclp/61247).

paregorios commented 7 years ago

This sounds to me like a problem with the mapping process. @hcayless would you concur? cc @jcowey @ryanfb

paregorios commented 7 years ago

ping @ryanfb @hcayless @wsalesky is this happening in the test environment as well?

jcowey commented 7 years ago

http://litpap.info/hgv/64347 and http://litpap.info/dclp/64347 are two different ways of getting at different metadata sources for the same TM number.

wsalesky commented 7 years ago

@paregorios @jcowey The test environment is experiencing a similar issue. Not all of the related records are being processed. It is unclear to me why. @ryanfb or @hcayless any ideas?

ryanfb commented 7 years ago

Maybe this is happening because the processing/stylesheets associated with DDbDP in pn-indexer pick up and output the relationship, but the fallthrough processes/stylesheets for HGV don't? For example, http://litpap.info/dclp/61247 (which has the DCLP metadata) is from the generated HTML DDB_EpiDoc_XML/bgu/bgu.6/bgu.6.1470.html, while http://litpap.info/dclp/64347 (which doesn't have the DCLP metadata) is from the generated HTML HGV_meta_EpiDoc/HGV65/64347.html.

paregorios commented 7 years ago

A related issue for texts is #306

jcowey commented 7 years ago

For the record, here is another instance of inconsistent metadata display where there are two sources of differing metadata for one piece/text: http://litpap.info/dclp/75082 (O.Ont. Mus. 1 64) displays data from "HGV_meta_EpiDoc", defaulting to https://github.com/DCLP/idp.data/blob/master/HGV_meta_EpiDoc/HGV76/75082.xml rather than https://github.com/DCLP/idp.data/blob/master/DCLP/76/75082.xml. It does not display both.

ryanfb commented 7 years ago

Here's the relevant explanation of the fall-through processing in pn-indexer: https://github.com/DCLP/navigator/blob/master/pn-indexer/src/info/papyri/indexer.clj#L7-L11

And associated code: https://github.com/DCLP/navigator/blob/master/pn-indexer/src/info/papyri/indexer.clj#L964-L981

I suspect that in all cases, as mentioned above, it's simply the HGV processing in pn-indexer not picking up or outputting the DCLP relation.

paregorios commented 7 years ago

@jcowey has asked me to pull together a list of TM numbers for texts for which we have both DCLP files and HGV metadata files, which I can do programmatically.
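A minimal sketch of how such a list could be computed from a local clone of idp.data. It assumes files are laid out like the two paths quoted earlier in this thread (HGV_meta_EpiDoc/HGV76/75082.xml and DCLP/76/75082.xml), i.e. collection/subdirectory/TM.xml, and that the file stem is the TM number; stems that are not purely numeric are skipped. The function names are hypothetical, and like the list posted here it does not filter out texts that DDbDP also treats.

```python
from pathlib import Path, PurePosixPath

def tm_numbers(paths, collection):
    """TM numbers from relative paths shaped like <collection>/<subdir>/<TM>.xml."""
    found = set()
    for p in map(PurePosixPath, paths):
        if (len(p.parts) == 3 and p.parts[0] == collection
                and p.suffix == ".xml" and p.stem.isdigit()):
            found.add(p.stem)
    return found

def hgv_dclp_pairs(paths):
    """TM numbers that have both an HGV metadata file and a DCLP file, numerically sorted."""
    paths = list(paths)
    both = tm_numbers(paths, "HGV_meta_EpiDoc") & tm_numbers(paths, "DCLP")
    return sorted(both, key=int)

def scan_idp_data(root):
    """Relative POSIX paths of all XML files exactly three levels below root."""
    root = Path(root)
    return [p.relative_to(root).as_posix() for p in root.glob("*/*/*.xml")]
```

Usage, assuming the clone lives in ./idp.data: `for tm in hgv_dclp_pairs(scan_idp_data("idp.data")): print(tm)`. Keeping the path logic separate from the filesystem scan makes the matching easy to test without a checkout.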

paregorios commented 7 years ago

I think it is the case that when a particular papyrus is treated in HGV, DCLP, and DDBDP, then the information from all three collections is displayed. But when DDBDP does not treat the papyrus and only HGV and DCLP do, then HGV is the only thing displayed. This observation is consistent with what I think I understand from looking at the code pointed out by @ryanfb above.

paregorios commented 7 years ago

Here's a list of the files represented in both HGV and DCLP. Note that it has not been filtered for those that are also treated by DDBDP (see preceding comment for context).

pairs.txt

paregorios commented 7 years ago

I'm unclear where to go from here. I think we've ascertained that there's a deficiency in how HGV records are handled vis-a-vis potentially matching DCLP records, but I'm unclear whether that's an XSLT failing (in navigator/pn-xslt) or a pn-indexer failing or something else involving the numbers server. Can @hcayless or @ryanfb suggest where I might look next?

ryanfb commented 7 years ago

It looks like relations are built in pn-indexer (https://github.com/DCLP/navigator/blob/master/pn-indexer/src/info/papyri/indexer.clj#L570) by executing the SPARQL in relation-query against the Numbers Server: https://github.com/DCLP/navigator/blob/master/pn-indexer/src/info/papyri/indexer.clj#L368

Here's relation-query for http://papyri.info/ddbdp/bgu;6;1470 (our example that has DCLP metadata correctly in the output): http://litpap.info/sparql?query=prefix%20dct:%20%3Chttp://purl.org/dc/terms/%3E%20select%20?a%20from%20%3Chttp://papyri.info/graph%3E%20where%20{%20%3Chttp://papyri.info/ddbdp/bgu;6;1470/source%3E%20dct:relation%20?a%20filter(!regex(str(?a),%27/images$%27))}

And here's relation-query for http://papyri.info/hgv/64347 (our example that has DCLP metadata incorrectly omitted in the output): http://litpap.info/sparql?query=prefix%20dct:%20%3Chttp://purl.org/dc/terms/%3E%20select%20?a%20from%20%3Chttp://papyri.info/graph%3E%20where%20%7B%20%3Chttp://papyri.info/hgv/64347/source%3E%20dct:relation%20?a%20filter(!regex(str(?a),%27/images$%27))%7D

Both seem to correctly output the DCLP relation. So the problem doesn't seem to be in the Numbers Server. It looks like these are passed as a related parameter to MakeHTML.xsl: https://github.com/DCLP/navigator/blob/master/pn-indexer/src/info/papyri/indexer.clj#L591
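To repeat that check for other records, the query can be rebuilt and sent from a script. A minimal Python sketch, assuming the query text is exactly what the two URLs above decode to (RELATION_QUERY is reconstructed from them, not copied from indexer.clj) and that the endpoint at http://litpap.info/sparql returns standard SPARQL JSON results when asked for application/sparql-results+json. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Reconstructed from the decoded ?query= parameter in the URLs above.
# Doubled braces are literal { } in str.format.
RELATION_QUERY = (
    "prefix dct: <http://purl.org/dc/terms/> "
    "select ?a from <http://papyri.info/graph> "
    "where {{ <{uri}/source> dct:relation ?a "
    "filter(!regex(str(?a),'/images$'))}}"
)

def relation_query_url(endpoint, uri):
    """GET URL asking `endpoint` for the dct:relation targets of `uri`/source."""
    query = RELATION_QUERY.format(uri=uri)
    return endpoint + "?" + urllib.parse.urlencode({"query": query})

def related_uris(results):
    """Values bound to ?a in a SPARQL 1.1 JSON results document."""
    return [b["a"]["value"] for b in results["results"]["bindings"] if "a" in b]

def fetch_related(endpoint, uri):
    """Run the query over HTTP (network access required) and return related URIs."""
    req = urllib.request.Request(
        relation_query_url(endpoint, uri),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return related_uris(json.load(resp))
```

For example, `fetch_related("http://litpap.info/sparql", "http://papyri.info/hgv/64347")` should list the same relations as the second link above, provided the endpoint is still reachable.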

So it looks like the problem is in the XSLT. Looking inside MakeHTML.xsl I see a call that outputs DCLP relations when the collection is DDB: https://github.com/DCLP/navigator/blob/master/pn-xslt/MakeHTML.xsl#L324

But I don't see that in the HGV collection template: https://github.com/DCLP/navigator/blob/master/pn-xslt/MakeHTML.xsl#L378
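The diagnosis above boils down to a branch that only one collection template takes. A hypothetical Python model, not the actual MakeHTML.xsl: the function names and the flag are assumptions that simply encode what is described here (the DDB template outputs related DCLP metadata, the HGV template does not), and the model tracks only DCLP relations.

```python
from urllib.parse import urlsplit

def collection_of(uri):
    """'http://papyri.info/dclp/64347' -> 'dclp' (first path segment of the URI)."""
    return urlsplit(uri).path.strip("/").split("/")[0]

def metadata_sections(primary, related, all_branches_emit_dclp=False):
    """Collections whose metadata would appear on the page for `primary`.

    With the default flag this mimics the reported behaviour: only the
    DDbDP branch adds metadata from related DCLP records. Setting the flag
    mimics the intended behaviour, where every branch does.
    """
    sections = [primary]
    if primary == "ddbdp" or all_branches_emit_dclp:
        for uri in related:
            if collection_of(uri) == "dclp" and "dclp" not in sections:
                sections.append("dclp")
    return sections
```

With related = ["http://papyri.info/dclp/64347"], an HGV-primary page yields only ["hgv"] until the HGV branch also emits the relation, which matches what is seen at http://litpap.info/dclp/64347.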

hcayless commented 7 years ago

Just looking at this in relation to getting browse working and @ryanfb's diagnosis looks spot on. If the DCLP doc is primary, the XSLT just ignores HGV and APIS. The Solr XSLT has the same problem (and I assume the text one as well). Fixing it.

hcayless commented 7 years ago

Should be fixed in DCLP/navigator@d785c8686f92fa53e31d60c0348044ba3094c115.

paregorios commented 7 years ago

@hcayless for some reason that commit hash didn't autolink. Was that a commit to DCLP master?

hcayless commented 7 years ago

@paregorios it was. I forgot y'all keep your issues in a separate repo.

m-k-r commented 7 years ago

The changes to pn-xslt/htm-teibibl.xsl cause the following error during mapping:

Processing Bibliography
Error at xsl:variable on line 46 column 193 of htm-teibibl.xsl:
  XPST0017 XPath syntax error at char 0 on line 46 near {...tr/@target, '/source'), 'xm...}:
  Cannot find a matching 2-argument function named {http://papyri.info/ns}get-docs()
Error at xsl:for-each on line 89 column 189 of htm-teibibl.xsl:
  XPST0017 XPath syntax error at char 0 on line 89 near {...get, '/source'), 'xml')/t:b...}:
  Cannot find a matching 2-argument function named {http://papyri.info/ns}get-docs()
Error at xsl:variable on line 243 column 105 of htm-teibibl.xsl:
  XPST0017 XPath syntax error at char 14 on line 243 near {...cat($link, '/source'), 'xml...}:
  Cannot find a matching 2-argument function named {http://papyri.info/ns}get-filename()
Exception in thread "main" java.lang.RuntimeException: javax.xml.transform.TransformerConfigurationException: Failed to compile stylesheet. 3 errors detected.
    at clojure.lang.Util.runtimeException(Util.java:165)
    at clojure.lang.Compiler.eval(Compiler.java:6476)
    at clojure.lang.Compiler.eval(Compiler.java:6455)
    at clojure.lang.Compiler.load(Compiler.java:6902)
    at clojure.lang.Compiler.loadFile(Compiler.java:6863)
    at clojure.main$load_script.invoke(main.clj:282)
    at clojure.main$init_opt.invoke(main.clj:287)
    at clojure.main$initialize.invoke(main.clj:315)
    at clojure.main$null_opt.invoke(main.clj:348)
    at clojure.main$main.doInvoke(main.clj:426)
    at clojure.lang.RestFn.invoke(RestFn.java:421)
    at clojure.lang.Var.invoke(Var.java:405)
    at clojure.lang.AFn.applyToHelper(AFn.java:163)
    at clojure.lang.Var.applyTo(Var.java:518)
    at clojure.main.main(main.java:37)
Caused by: javax.xml.transform.TransformerConfigurationException: Failed to compile stylesheet. 3 errors detected.
    at net.sf.saxon.PreparedStylesheet.prepare(PreparedStylesheet.java:220)
    at net.sf.saxon.PreparedStylesheet.compile(PreparedStylesheet.java:106)
    at info.papyri.map$init_xslt$fn__41.invoke(map.clj:118)
    at clojure.lang.AFn.call(AFn.java:18)
    at clojure.lang.LockingTransaction.run(LockingTransaction.java:263)
    at clojure.lang.LockingTransaction.runInTransaction(LockingTransaction.java:231)
    at info.papyri.map$init_xslt.invoke(map.clj:118)
    at info.papyri.map$load_map.invoke(map.clj:386)
    at info.papyri.map$_mapAll.invoke(map.clj:445)
    at info.papyri.map$_main.doInvoke(map.clj:468)
    at clojure.lang.RestFn.invoke(RestFn.java:408)
    at clojure.lang.Var.invoke(Var.java:401)
    at user$eval5.invoke(form-init3265076432784098882.clj:1)
    at clojure.lang.Compiler.eval(Compiler.java:6465)
    ... 13 more
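XPST0017 is the static XPath error raised when a function call's name and number of arguments match no function signature in scope, so Saxon refuses to compile the stylesheet. A rough pre-flight check is sketched below. It is a hypothetical helper, not part of pn-xslt, and it assumes the prefix pi is bound to http://papyri.info/ns (the error message shows only the namespace). It compares names only, ignoring arity, and does not follow xsl:include/xsl:import: you pass the declaring stylesheets in yourself.

```python
import re
import xml.etree.ElementTree as ET

XSL = "{http://www.w3.org/1999/XSL/Transform}"

def declared_functions(xslt_text):
    """Map each xsl:function name (e.g. 'pi:get-docs') to the arities declared for it."""
    root = ET.fromstring(xslt_text)
    found = {}
    for fn in root.iter(XSL + "function"):
        arity = len(fn.findall(XSL + "param"))  # direct xsl:param children only
        found.setdefault(fn.get("name"), set()).add(arity)
    return found

def called_functions(xslt_text, prefix="pi"):
    """Names of prefix:name( calls found in attribute values (select, test, AVTs, ...)."""
    root = ET.fromstring(xslt_text)
    call = re.compile(r"(?<![\w-])" + re.escape(prefix) + r":([\w.-]+)\s*\(")
    names = set()
    for el in root.iter():
        for value in el.attrib.values():
            names.update(f"{prefix}:{name}" for name in call.findall(value))
    return names

def undeclared_calls(using_text, *declaring_texts, prefix="pi"):
    """Calls in using_text with no xsl:function of that name in any given stylesheet."""
    declared = set()
    for text in (using_text, *declaring_texts):
        declared |= set(declared_functions(text))
    return sorted(called_functions(using_text, prefix) - declared)
```

Run against htm-teibibl.xsl plus the stylesheet(s) that should provide pi:get-docs and pi:get-filename, an empty result means every called name is declared somewhere in that set; it says nothing about whether the files are actually wired together.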

hcayless commented 7 years ago

@m-k-r: Should be fixed in DCLP/navigator@572f8d32880dc967813900cb9b602609ebb1f405.

m-k-r commented 7 years ago

This error is fixed now. But now it ends with

16:02:53 WARN BaseXMLWriter :: Namespace prefix 'j.1' is reserved by Jena.
16:02:53 WARN BaseXMLWriter :: Namespace prefix 'j.3' is reserved by Jena.
16:02:53 WARN BaseXMLWriter :: Namespace prefix 'j.2' is reserved by Jena.
16:02:55 WARN BaseXMLWriter :: Namespace prefix 'j.1' is reserved by Jena.
16:02:55 WARN BaseXMLWriter :: Namespace prefix 'j.3' is reserved by Jena.
16:02:55 WARN BaseXMLWriter :: Namespace prefix 'j.2' is reserved by Jena.
16:02:58 WARN BaseXMLWriter :: Namespace prefix 'j.1' is reserved by Jena.
16:02:58 WARN BaseXMLWriter :: Namespace prefix 'j.3' is reserved by Jena.
16:02:58 WARN BaseXMLWriter :: Namespace prefix 'j.2' is reserved by Jena.
16:02:59 WARN BaseXMLWriter :: Namespace prefix 'j.1' is reserved by Jena.
16:02:59 WARN BaseXMLWriter :: Namespace prefix 'j.3' is reserved by Jena.
16:02:59 WARN BaseXMLWriter :: Namespace prefix 'j.2' is reserved by Jena.
16:03:01 WARN BaseXMLWriter :: Namespace prefix 'j.1' is reserved by Jena.
16:03:01 WARN BaseXMLWriter :: Namespace prefix 'j.3' is reserved by Jena.
16:03:01 WARN BaseXMLWriter :: Namespace prefix 'j.2' is reserved by Jena.
16:03:03 WARN BaseXMLWriter :: Namespace prefix 'j.1' is reserved by Jena.
16:03:03 WARN BaseXMLWriter :: Namespace prefix 'j.3' is reserved by Jena.
16:03:03 WARN BaseXMLWriter :: Namespace prefix 'j.2' is reserved by Jena.
16:03:06 WARN BaseXMLWriter :: Namespace prefix 'j.1' is reserved by Jena.
16:03:06 WARN BaseXMLWriter :: Namespace prefix 'j.3' is reserved by Jena.
16:03:06 WARN BaseXMLWriter :: Namespace prefix 'j.2' is reserved by Jena.
16:03:07 WARN BaseXMLWriter :: Namespace prefix 'j.1' is reserved by Jena.
16:03:07 WARN BaseXMLWriter :: Namespace prefix 'j.3' is reserved by Jena.
16:03:07 WARN BaseXMLWriter :: Namespace prefix 'j.2' is reserved by Jena.
16:03:10 WARN BaseXMLWriter :: Namespace prefix 'j.1' is reserved by Jena.
16:03:10 WARN BaseXMLWriter :: Namespace prefix 'j.3' is reserved by Jena.
16:03:10 WARN BaseXMLWriter :: Namespace prefix 'j.2' is reserved by Jena.
16:03:11 WARN BaseXMLWriter :: Namespace prefix 'j.1' is reserved by Jena.
16:03:11 WARN BaseXMLWriter :: Namespace prefix 'j.3' is reserved by Jena.
16:03:11 WARN BaseXMLWriter :: Namespace prefix 'j.2' is reserved by Jena.
16:03:13 WARN BaseXMLWriter :: Namespace prefix 'j.1' is reserved by Jena.
16:03:13 WARN BaseXMLWriter :: Namespace prefix 'j.3' is reserved by Jena.
16:03:13 WARN BaseXMLWriter :: Namespace prefix 'j.2' is reserved by Jena.

and then, after 1 or 2 hours, it fails with a Java heap error.

The version before this commit is working.

Reverting "" doesn't change anything.

The old version of the navigator works both before and after updating the schema in idp.data.

hcayless commented 7 years ago

Those warnings don't mean anything. I've got another set of updates queued up to push once I've finished testing that may help, but you may also have to give the process more memory or see what else is running on that machine that might be crowding it. FWIW it runs successfully in a few minutes on my laptop.

hcayless commented 7 years ago

Latest set of commits should help with overall correctness, but if it's taking that long and running out of memory, that suggests an under-resourced environment rather than a bug to me.

ryanfb commented 7 years ago

You should be able to increase the memory for Fuseki with e.g. JVM_ARGS=-Xmx4096M (or higher) in your environment before running fuseki-server.
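A minimal sketch of the same setting from Python, e.g. for a mapping script that launches Fuseki itself. JVM_ARGS and -Xmx4096M come from the comment above; the helper names are hypothetical, start_fuseki assumes it is run from the Fuseki directory, and the fuseki-server arguments are left to whatever your setup already uses, since they are not given in this thread. Note that fuseki_env replaces any existing JVM_ARGS rather than appending to it.

```python
import os
import re
import subprocess

def heap_mb(jvm_args):
    """Return the -Xmx heap size in MB from a JVM_ARGS string, or None if unset."""
    m = re.search(r"-Xmx(\d+)([kKmMgG]?)", jvm_args)
    if m is None:
        return None
    multiplier = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}[m.group(2).lower()]
    return int(m.group(1)) * multiplier // 1024**2

def fuseki_env(heap="4096M", base=None):
    """Copy of the environment with JVM_ARGS set to -Xmx<heap> (replaces any existing JVM_ARGS)."""
    env = dict(os.environ if base is None else base)
    env["JVM_ARGS"] = f"-Xmx{heap}"
    return env

def start_fuseki(server_args, heap="4096M"):
    """Launch ./fuseki-server with the larger heap; server_args are your usual arguments."""
    return subprocess.Popen(["./fuseki-server", *server_args], env=fuseki_env(heap))
```

heap_mb is handy for sanity-checking a configured value against the machine's RAM before starting the server; 4G and 4096M both come out as 4096 MB.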

m-k-r commented 7 years ago

@ryanfb that worked.

Indexing ran without problems: http://litpap.info/dclp/64347

These warnings should show where in the process the problem occurred. With these changes, does Fuseki generally need more resources, or only during mapping? Until now the default 1200 MB was sufficient. How much RAM should we plan for Fuseki? 4 GB is half of the RAM of the virtual machine.

Edelweiss commented 6 years ago

Fixed the navigator side of things; the SoSOL editor is still pending.