Closed: rla2118 closed this issue 6 years ago
This sounds to me like a problem with the mapping process. @hcayless would you concur? cc @jcowey @ryanfb
ping @ryanfb @hcayless @wsalesky is this happening in the test environment as well?
http://litpap.info/hgv/64347 and http://litpap.info/dclp/64347 are two different ways of getting at different metadata sources for the same TM number
@paregorios @jcowey The test environment is experiencing a similar issue. Not all of the related records are being processed. It is unclear to me why. @ryanfb or @hcayless any ideas?
Maybe this is happening because the processing/stylesheets associated with DDbDP in pn-indexer pick up and output the relationship, but the fallthrough processes/stylesheets for HGV don't? For example, http://litpap.info/dclp/61247 (which has the DCLP metadata) is from the generated HTML DDB_EpiDoc_XML/bgu/bgu.6/bgu.6.1470.html, while http://litpap.info/dclp/64347 (which doesn't have the DCLP metadata) is from the generated HTML HGV_meta_EpiDoc/HGV65/64347.html.
A related issue for texts is #306
Recording another instance of the inconsistent display of metadata where there are two sources of differing metadata for the same text: http://litpap.info/dclp/75082 (O.Ont. Mus. 1 64) displays data from "HGV_meta_EpiDoc" only. It defaults to https://github.com/DCLP/idp.data/blob/master/HGV_meta_EpiDoc/HGV76/75082.xml rather than https://github.com/DCLP/idp.data/blob/master/DCLP/76/75082.xml and does not display both.
Here's the relevant explanation of the fall-through processing in pn-indexer: https://github.com/DCLP/navigator/blob/master/pn-indexer/src/info/papyri/indexer.clj#L7-L11
And associated code: https://github.com/DCLP/navigator/blob/master/pn-indexer/src/info/papyri/indexer.clj#L964-L981
I suspect that in all cases, as mentioned above, it's simply the HGV processing in pn-indexer not picking up or outputting the DCLP relation.
@jcowey has asked me to pull together a list of TMnumbers for texts for which we have both DCLP files and HGV metadata files, which I can do programmatically.
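For anyone wanting to reproduce such a list, here is a minimal sketch of one way to do it against a local checkout of idp.data. It assumes the layout seen in the links above (`HGV_meta_EpiDoc/HGV76/75082.xml` and `DCLP/76/75082.xml`) and that the file name without `.xml` is the TM number; HGV files whose names carry a letter suffix would not match under that assumption. The function names and the checkout path are hypothetical, not taken from any existing script.

```python
from pathlib import Path


def xml_stems(collection_dir):
    """Collect the file names (minus '.xml') of every XML file below a directory."""
    return {p.stem for p in Path(collection_dir).rglob("*.xml")}


def in_both_hgv_and_dclp(idp_data_root):
    """Return the sorted identifiers that have a file in both collections.

    Assumption: the file stem is the TM number, so '75082.xml' in
    HGV_meta_EpiDoc and '75082.xml' in DCLP describe the same text.
    """
    root = Path(idp_data_root)
    hgv = xml_stems(root / "HGV_meta_EpiDoc")
    dclp = xml_stems(root / "DCLP")
    return sorted(hgv & dclp)
```

Calling `in_both_hgv_and_dclp("path/to/idp.data")` returns a list of strings. It does not filter out texts that DDbDP also covers, so a further pass would be needed for that.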
I think it is the case that when a particular papyrus is treated in HGV, DCLP, and DDBDP, then the information from all three collections is displayed. But when DDBDP does not treat the papyrus and only HGV and DCLP do, then HGV is the only thing displayed. This observation is consistent with what I think I understand from looking at the code pointed out by @ryanfb above.
Here's a list of the files represented in both HGV and DCLP. Note that it has not been filtered for those that are also treated by DDBDP (see preceding comment for context).
I'm unclear where to go from here. I think we've ascertained that there's a deficiency in how HGV records are handled vis-a-vis potentially matching DCLP records, but I'm unclear whether that's an XSLT failing (in navigator/pn-xslt) or a pn-indexer failing or something else involving the numbers server. Can @hcayless or @ryanfb suggest where I might look next?
It looks like relations are built in pn-indexer (https://github.com/DCLP/navigator/blob/master/pn-indexer/src/info/papyri/indexer.clj#L570) by executing the SPARQL in relation-query against the Numbers Server: https://github.com/DCLP/navigator/blob/master/pn-indexer/src/info/papyri/indexer.clj#L368
Here's relation-query for http://papyri.info/ddbdp/bgu;6;1470 (our example that has DCLP metadata correctly in the output):
http://litpap.info/sparql?query=prefix%20dct:%20%3Chttp://purl.org/dc/terms/%3E%20select%20?a%20from%20%3Chttp://papyri.info/graph%3E%20where%20{%20%3Chttp://papyri.info/ddbdp/bgu;6;1470/source%3E%20dct:relation%20?a%20filter(!regex(str(?a),%27/images$%27))}
And here's relation-query for http://papyri.info/hgv/64347 (our example that has DCLP metadata incorrectly omitted in the output):
http://litpap.info/sparql?query=prefix%20dct:%20%3Chttp://purl.org/dc/terms/%3E%20select%20?a%20from%20%3Chttp://papyri.info/graph%3E%20where%20%7B%20%3Chttp://papyri.info/hgv/64347/source%3E%20dct:relation%20?a%20filter(!regex(str(?a),%27/images$%27))%7D
Both seem to correctly output the DCLP relation. So the problem doesn't seem to be in the Numbers Server. It looks like these are passed as a related parameter to MakeHTML.xsl: https://github.com/DCLP/navigator/blob/master/pn-indexer/src/info/papyri/indexer.clj#L591
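To repeat the Numbers Server check for other identifiers, the two query URLs above can be generated rather than hand-edited. This is only a sketch that mirrors the query text visible in those URLs. It is not the Clojure code in pn-indexer, and the function names and the default endpoint argument are assumptions made for illustration.

```python
from urllib.parse import urlencode


def relation_query(item_url):
    """Build the SPARQL text seen in the URLs above for one item.

    item_url is an identifier such as 'http://papyri.info/hgv/64347';
    '/source' is appended, as in the example queries.
    """
    return (
        "prefix dct: <http://purl.org/dc/terms/> "
        "select ?a from <http://papyri.info/graph> "
        f"where {{ <{item_url}/source> dct:relation ?a "
        "filter(!regex(str(?a),'/images$'))}"
    )


def relation_query_url(item_url, endpoint="http://litpap.info/sparql"):
    """Return a GET URL for the query, percent-encoding the query string."""
    return endpoint + "?" + urlencode({"query": relation_query(item_url)})
```

`relation_query_url("http://papyri.info/hgv/64347")` gives a URL equivalent to the second one above, apart from spaces being encoded as `+` rather than `%20`. Pasting it into a browser should list the related resources while filtering out `/images` entries.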
So it looks like the problem is in the XSLT. Looking inside MakeHTML.xsl I see a call that outputs DCLP relations when the collection is DDB: https://github.com/DCLP/navigator/blob/master/pn-xslt/MakeHTML.xsl#L324
But I don't see that in the HGV collection template: https://github.com/DCLP/navigator/blob/master/pn-xslt/MakeHTML.xsl#L378
Just looking at this in relation to getting browse working and @ryanfb's diagnosis looks spot on. If the DCLP doc is primary, the XSLT just ignores HGV and APIS. The Solr XSLT has the same problem (and I assume the text one as well). Fixing it.
Should be fixed in DCLP/navigator@d785c8686f92fa53e31d60c0348044ba3094c115.
@hcayless for some reason that commit hash didn't autolink. Was that a commit to DCLP master?
@paregorios it was. I forgot y'all keep your issues in a separate repo.
The changes to pn-xslt/htm-teibibl.xsl cause the following error during mapping:
Processing Bibliography
Error at xsl:variable on line 46 column 193 of htm-teibibl.xsl:
  XPST0017 XPath syntax error at char 0 on line 46 near {...tr/@target, '/source'), 'xm...}:
    Cannot find a matching 2-argument function named {http://papyri.info/ns}get-docs()
Error at xsl:for-each on line 89 column 189 of htm-teibibl.xsl:
  XPST0017 XPath syntax error at char 0 on line 89 near {...get, '/source'), 'xml')/t:b...}:
    Cannot find a matching 2-argument function named {http://papyri.info/ns}get-docs()
Error at xsl:variable on line 243 column 105 of htm-teibibl.xsl:
  XPST0017 XPath syntax error at char 14 on line 243 near {...cat($link, '/source'), 'xml...}:
    Cannot find a matching 2-argument function named {http://papyri.info/ns}get-filename()
Exception in thread "main" java.lang.RuntimeException: javax.xml.transform.TransformerConfigurationException: Failed to compile stylesheet. 3 errors detected.
    at clojure.lang.Util.runtimeException(Util.java:165)
    at clojure.lang.Compiler.eval(Compiler.java:6476)
    at clojure.lang.Compiler.eval(Compiler.java:6455)
    at clojure.lang.Compiler.load(Compiler.java:6902)
    at clojure.lang.Compiler.loadFile(Compiler.java:6863)
    at clojure.main$load_script.invoke(main.clj:282)
    at clojure.main$init_opt.invoke(main.clj:287)
    at clojure.main$initialize.invoke(main.clj:315)
    at clojure.main$null_opt.invoke(main.clj:348)
    at clojure.main$main.doInvoke(main.clj:426)
    at clojure.lang.RestFn.invoke(RestFn.java:421)
    at clojure.lang.Var.invoke(Var.java:405)
    at clojure.lang.AFn.applyToHelper(AFn.java:163)
    at clojure.lang.Var.applyTo(Var.java:518)
    at clojure.main.main(main.java:37)
Caused by: javax.xml.transform.TransformerConfigurationException: Failed to compile stylesheet. 3 errors detected.
    at net.sf.saxon.PreparedStylesheet.prepare(PreparedStylesheet.java:220)
    at net.sf.saxon.PreparedStylesheet.compile(PreparedStylesheet.java:106)
    at info.papyri.map$init_xslt$fn__41.invoke(map.clj:118)
    at clojure.lang.AFn.call(AFn.java:18)
    at clojure.lang.LockingTransaction.run(LockingTransaction.java:263)
    at clojure.lang.LockingTransaction.runInTransaction(LockingTransaction.java:231)
    at info.papyri.map$init_xslt.invoke(map.clj:118)
    at info.papyri.map$load_map.invoke(map.clj:386)
    at info.papyri.map$_mapAll.invoke(map.clj:445)
    at info.papyri.map$_main.doInvoke(map.clj:468)
    at clojure.lang.RestFn.invoke(RestFn.java:408)
    at clojure.lang.Var.invoke(Var.java:401)
    at user$eval5.invoke(form-init3265076432784098882.clj:1)
    at clojure.lang.Compiler.eval(Compiler.java:6465)
    ... 13 more
@m-k-r: Should be fixed in DCLP/navigator@572f8d32880dc967813900cb9b602609ebb1f405.
This error is fixed now. But the mapping now ends with
16:02:53 WARN BaseXMLWriter :: Namespace prefix 'j.1' is reserved by Jena.
16:02:53 WARN BaseXMLWriter :: Namespace prefix 'j.3' is reserved by Jena.
16:02:53 WARN BaseXMLWriter :: Namespace prefix 'j.2' is reserved by Jena.
16:02:55 WARN BaseXMLWriter :: Namespace prefix 'j.1' is reserved by Jena.
16:02:55 WARN BaseXMLWriter :: Namespace prefix 'j.3' is reserved by Jena.
16:02:55 WARN BaseXMLWriter :: Namespace prefix 'j.2' is reserved by Jena.
16:02:58 WARN BaseXMLWriter :: Namespace prefix 'j.1' is reserved by Jena.
16:02:58 WARN BaseXMLWriter :: Namespace prefix 'j.3' is reserved by Jena.
16:02:58 WARN BaseXMLWriter :: Namespace prefix 'j.2' is reserved by Jena.
16:02:59 WARN BaseXMLWriter :: Namespace prefix 'j.1' is reserved by Jena.
16:02:59 WARN BaseXMLWriter :: Namespace prefix 'j.3' is reserved by Jena.
16:02:59 WARN BaseXMLWriter :: Namespace prefix 'j.2' is reserved by Jena.
16:03:01 WARN BaseXMLWriter :: Namespace prefix 'j.1' is reserved by Jena.
16:03:01 WARN BaseXMLWriter :: Namespace prefix 'j.3' is reserved by Jena.
16:03:01 WARN BaseXMLWriter :: Namespace prefix 'j.2' is reserved by Jena.
16:03:03 WARN BaseXMLWriter :: Namespace prefix 'j.1' is reserved by Jena.
16:03:03 WARN BaseXMLWriter :: Namespace prefix 'j.3' is reserved by Jena.
16:03:03 WARN BaseXMLWriter :: Namespace prefix 'j.2' is reserved by Jena.
16:03:06 WARN BaseXMLWriter :: Namespace prefix 'j.1' is reserved by Jena.
16:03:06 WARN BaseXMLWriter :: Namespace prefix 'j.3' is reserved by Jena.
16:03:06 WARN BaseXMLWriter :: Namespace prefix 'j.2' is reserved by Jena.
16:03:07 WARN BaseXMLWriter :: Namespace prefix 'j.1' is reserved by Jena.
16:03:07 WARN BaseXMLWriter :: Namespace prefix 'j.3' is reserved by Jena.
16:03:07 WARN BaseXMLWriter :: Namespace prefix 'j.2' is reserved by Jena.
16:03:10 WARN BaseXMLWriter :: Namespace prefix 'j.1' is reserved by Jena.
16:03:10 WARN BaseXMLWriter :: Namespace prefix 'j.3' is reserved by Jena.
16:03:10 WARN BaseXMLWriter :: Namespace prefix 'j.2' is reserved by Jena.
16:03:11 WARN BaseXMLWriter :: Namespace prefix 'j.1' is reserved by Jena.
16:03:11 WARN BaseXMLWriter :: Namespace prefix 'j.3' is reserved by Jena.
16:03:11 WARN BaseXMLWriter :: Namespace prefix 'j.2' is reserved by Jena.
16:03:13 WARN BaseXMLWriter :: Namespace prefix 'j.1' is reserved by Jena.
16:03:13 WARN BaseXMLWriter :: Namespace prefix 'j.3' is reserved by Jena.
16:03:13 WARN BaseXMLWriter :: Namespace prefix 'j.2' is reserved by Jena.
and then, after 1 or 2 hours, in a Java heap error.
The version before this commit is working.
Reverting "
The old version of the navigator works both before and after updating the schema in idp.data.
Those warnings don't mean anything. I've got another set of updates queued up to push once I've finished testing that may help, but you may also have to give the process more memory or see what else is running on that machine that might be crowding it. FWIW it runs successfully in a few minutes on my laptop.
Latest set of commits should help with overall correctness, but if it's taking that long and running out of memory, that suggests an under-resourced environment rather than a bug to me.
You should be able to increase the memory for Fuseki with e.g. JVM_ARGS=-Xmx4096M (or higher) in your environment before running fuseki-server.
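If the server is started from a script rather than an interactive shell, the same setting can be applied to the child process only. The following is a sketch under assumptions: `fuseki_env` is a hypothetical helper, and it relies solely on the `JVM_ARGS` variable mentioned above.

```python
import os


def fuseki_env(max_heap="4096M", base_env=None):
    """Return a copy of the environment with JVM_ARGS set to the given heap limit.

    max_heap uses the JVM's -Xmx notation, e.g. '4096M'. The caller's own
    environment is left untouched; only the returned copy is changed.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["JVM_ARGS"] = f"-Xmx{max_heap}"
    return env
```

The result can be handed to `subprocess.run([...], env=fuseki_env("4096M"))` together with whatever fuseki-server command line is already in use, so other processes on the machine keep their default settings.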
@ryanfb that worked.
Indexing ran without problems. http://litpap.info/dclp/64347
These warnings should show where in the process the problem occurred. With these changes, does Fuseki generally need more resources, or only during the mapping? Until now the default 1200 MB was sufficient. How much RAM should we consider for Fuseki? 4 GB is half of the RAM for the virtual machine.
Fixed the navigator side of things; the SoSOL editor is still pending.
This issue concerns http://litpap.info/dclp/64347
DCLP data is not being shown alongside HGV data. Elsewhere it is (e.g., http://litpap.info/dclp/61247).