DCLP Browse/search not working on Atlantides

hcayless commented 7 years ago

I'm trying to set up DCLP on the ISAW Atlantides server, with mixed success. The pages are getting created, and at least partially indexed, but browsing doesn't work at all. The Apache config points at pages that were removed a few weeks ago. See https://github.com/DCLP/navigator/commit/bfa0219173ab655c7d4c34b8643e14fe7926aabc, but https://github.com/DCLP/navigator/commit/3f8d55a1f048c066098044285bc93de718edf903. It looks like maybe https://github.com/DCLP/navigator/blob/master/pn-scripts/exist-update.sh is meant to generate those files and also do something with eXist? How and when is this supposed to be run?

paregorios commented 7 years ago

@m-k-r can you provide answers to the above questions in this ticket? We are blocked on papyri.info integration testing until we can get them answered. Thanks.

cc @rla2118 @rogerbagnall @jcowey @HolgerEssler @Edelweiss

hcayless commented 7 years ago

Think I figured it out. The script doesn't seem to work, but I was able to extract the Java command from it and run that to generate the files. Are we sure this is what we want for DCLP browsing?

jcowey commented 7 years ago

The drop-down menu is what we have used. It is not perfect, and it would be better if it could be made to work in the same way as DDbDP. The categories series, TM number, and authors and works are definitely wanted. Does that help and is it clear?

paregorios commented 7 years ago

@rla2118 @rogerbagnall @HolgerEssler please advise as to whether you want the drop-down menu for browsing or would prefer something more like ddbdp. Please note that there may not be time/resources available to implement anything different from what is already in place on litpap.info, but I think we should try to capture the preference in any case.

m-k-r commented 7 years ago

I ran into this problem myself last week. The script assumes that Saxon is exported in the classpath, but if it is exported, WEBrick doesn't work. A possible solution would be to provide Saxon and call it by absolute path.
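A minimal sketch of the absolute-path idea, assuming Saxon's standard command-line entry point (`net.sf.saxon.Transform` with `-s:`, `-xsl:`, and `-o:`). The jar location and the input and output file names are hypothetical; the thread does not say where Saxon is installed or what the transform reads and writes. The point is only that the jar is named with `-cp` for this one call, so nothing has to be exported globally.

```python
import shlex
from pathlib import PurePosixPath

# Hypothetical install location; not taken from the thread.
SAXON_JAR = "/opt/saxon/saxon9he.jar"


def saxon_command(source, stylesheet, output, saxon_jar=SAXON_JAR):
    """Build a Saxon invocation that names the jar explicitly via -cp,
    so the CLASSPATH environment variable does not need to be exported."""
    jar = PurePosixPath(saxon_jar)
    if not jar.is_absolute():
        raise ValueError(f"expected an absolute path to the Saxon jar, got {jar}")
    return [
        "java", "-cp", str(jar), "net.sf.saxon.Transform",
        f"-s:{source}", f"-xsl:{stylesheet}", f"-o:{output}",
    ]


if __name__ == "__main__":
    # File names below are placeholders for illustration only.
    cmd = saxon_command("input.xml", "pn-scripts/generateCorpusOverview.xsl", "output.html")
    print(shlex.join(cmd))
```

The list returned by `saxon_command` can be handed to `subprocess.run(cmd, check=True)` without going through a shell, which also avoids quoting problems in paths.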

The Navigator works for DCLP the same way as for DDbDP, HGV, or APIS. The drop-down list is just an additional way to sort and categorize.

m-k-r commented 7 years ago

Saxon is already in SoSOL. Could that copy be used for the Navigator, or is it better not to have cross-dependencies?

The exist-update script is called by pn-sync. This way the DCLP part is separated from papyri and can be replaced if another solution is found, such as moving the eXist database to Solr.

If it is not documented, the application used in existdb can be found here.

Should I rewrite this Guide with our additions?

hcayless commented 7 years ago

Cross-dependencies aren't really the issue. SoSOL and the PN run in different containers, and pretty much have to because they have quite different resource usage profiles. The Editor is much more resource-intensive (because of JRuby its heap is basically a giant hairball of HashMaps), but the Navigator gets much more use and is much leaner, and we wouldn't want to have the Editor bring the whole thing down if (when) it starts running out of memory.

If this ends up being the way we do things, I'd move the XSLT transform part into pn-indexer, which already does a lot of XSLT (and has Saxon available), and call it directly within pn-sync without having to resort to the shell script. I don't like having the script overloaded like this. It's obscure.

It sounds like the plan is to leave the eXist setup at Heidelberg, so the script (minus the DCLP browse-building part) is probably fine. Maybe it could even just be triggered by a cron job.

paregorios commented 7 years ago

So what are the next steps here?

paregorios commented 7 years ago

Per @rla2118 and @jcowey, the preference is to have DDB/HGV/APIS-style browse rather than the drop-down if at all possible, with the caveat that browse by authors+works, TM, and editions is essential.

paregorios commented 7 years ago

Pinging @m-k-r and @hcayless RE my question above about what the tasks on this are now and also making reference to my most recent comment. Note priority upgrade.

m-k-r commented 7 years ago

The DDB/HGV/APIS-style browse is already working.

paregorios commented 7 years ago

Thanks @m-k-r . So I'm confused: what's all this about a drop-down and how it's different from the regular-style browse? And what are the other matters involved in this ticket (if any)?

@hcayless @m-k-r @jcowey can any of you help clarify?

m-k-r commented 7 years ago

The dropdown menu was originally a dirty fix when the regular browsing wasn't working yet. We kept the dropdown menu because the search picks up DCLP and DDBDP together and the dropdown menu only lists DCLP.

ryanfb commented 7 years ago

I think one next step would be to move the pn-scripts/generateCorpusOverview.xsl call out of the exist-update.sh script and into pn-indexer/src/info/papyri/indexer.clj (this may also require moving the XSL file itself into pn-xslt/).

Edit: Also, the top-level DCLP browse at http://litpap.info/browse/dclp/ seems to be fine without exist-update.sh/generateCorpusOverview.xsl being called, because it's generated by pn-dispatcher. However, DCLP TM, "by series", and authors+works are generated by that XSLT. Is "by series" (http://litpap.info/browse/dclp/series) also crucial? It seems to be the same as the hierarchical top-level view at http://litpap.info/browse/dclp/, just flattened. Maybe this is different because the end result of using the hierarchical view is a Solr search by series which doesn't differentiate between DCLP and DDbDP?
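To make the last question concrete, here is a rough sketch of how a series search could be restricted to one collection using Solr's standard `q` and `fq` parameters. The field names `series` and `collection`, and the value `dclp`, are assumptions for illustration; the actual Solr schema used by the Navigator is not described in this thread.

```python
from urllib.parse import urlencode


def series_query(series, collection=None):
    """Return a Solr query string for a series, optionally narrowed
    to a single collection with a filter query (fq).

    Field names are hypothetical; quotes inside `series` are not escaped.
    """
    params = [("q", f'series:"{series}"')]
    if collection is not None:
        # fq narrows the result set without changing the main query.
        params.append(("fq", f"collection:{collection}"))
    return urlencode(params)


if __name__ == "__main__":
    print(series_query("p.oxy"))           # matches every collection
    print(series_query("p.oxy", "dclp"))   # restricted to one collection
```

Without the filter, the same series query would return records from every collection that shares the series name, which is the behavior the flattened "by series" page seems to work around.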

paregorios commented 7 years ago

@jcowey, @rla2118, and @rogerbagnall can you please comment on the observations and questions implied in @ryanfb's comment, immediately preceding? Thanks.

rogerbagnall commented 7 years ago

I don't have any comment on the technical questions at stake, but it's not obvious to me why the series view is essential. It may have value for someone else that I'm not aware of.

paregorios commented 7 years ago

@jcowey and I discussed this on Skype. I'll be providing more clarity shortly.

paregorios commented 7 years ago

I think there are two related issues being discussed here:

1. As @hcayless put it above: "Are we sure this is what we want for DCLP browsing?"
2. How and where should the corresponding page generation code be implemented and executed?

I think this warrants two separate tickets. Accordingly, I am now proclaiming the current ticket (#288) to be about "how and where" (i.e., the technical aspects). I have created a new ticket (#303) for the "what do we want" discussion. I am marking the technical ticket (this one) as "blocked" until we get #303 sorted.

paregorios commented 7 years ago

Questions on #303 are now sorted, so this issue is no longer blocked. Over to @hcayless to decide disposition of this ticket in light of what he's working on.
