Esri / geoportal-server

Geoportal Server is a standards-based, open source product that enables discovery and use of geospatial resources including data and services.
https://gptogc.esri.com/geoportal
Apache License 2.0

Geoportal Facets app and DCAT in 1.2.9 #313

Open cybersea opened 5 years ago

cybersea commented 5 years ago

I have deployed the geoportal facets application (solr v.4.1.0) to index a geoportal v.1.2.9 database. It is not indexing all the records, and it appears to be missing new records harvested from a DCAT source. I followed the instructions from the wiki here: https://github.com/Esri/geoportal-server/wiki/Geoportal-Facets-using-Apache-Solr

Any suggestions for debugging this issue? Is there a configuration that I'm missing?

mhogeweg commented 5 years ago

Hi, I recommend switching to Geoportal Server 2.x. This is a new 'generation' of Geoportal Server, based on Elasticsearch and implementing configurable faceted search as its starting point. You can find the application in its own GitHub repository.

cybersea commented 5 years ago

Thanks @mhogeweg

We would like to switch, but we have a front-end application that is dependent upon the v.1.2.x architecture including the (optional) solr index. At this time we do not have the resources to rewrite this front-end app, so I need to try to make the latest 1.2.x version work for now. Hopefully we can migrate the app in the future to your new architecture.

Any assistance with debugging this issue or pointers is greatly appreciated.

mhogeweg commented 5 years ago

is your site public by chance?

cybersea commented 5 years ago

The front-end production website is here: http://portal.westcoastoceans.org/discover/

If you mean the upgraded geoportal 1.2.9 site with solr that I'm trying to debug, I'm experimenting with that on our dev server and it's not yet hooked up to the front-end:

http://207.141.116.172/geoportal129/catalog/main/home.page
http://207.141.116.172/gc129/ (this Geoportal-Solr webpage is not working for some reason, so I am viewing the solr index from our Tomcat Manager app, but that requires a login)

mhogeweg commented 5 years ago

I checked out the gc129 site and can see filters and apply them:

[image]

mhogeweg commented 5 years ago

What seems to break is the link to the XML. For example, for the first entry on the page above, the XML link points to 127.0.0.1, which would be my machine.

Also, the url.metadata_s link in the solrjson response points to 127.0.0.1:8080.

I suggest checking the configuration to see what gpt.instance.url is set to.
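One way to see how many indexed records carry a loopback link is to scan the documents of a Solr JSON response. The following is a minimal sketch, not part of the Geoportal tooling: the `url.metadata_s` field name comes from this thread, while the `id` key, the sample documents, and their URL paths are made up for illustration.

```python
from urllib.parse import urlparse

# Hosts that only resolve on the machine making the request.
LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}


def loopback_links(docs, field="url.metadata_s"):
    """Return (doc id, url) pairs whose URL host is a loopback address."""
    hits = []
    for doc in docs:
        url = doc.get(field)
        if not url:
            continue  # document has no metadata link at all
        if urlparse(url).hostname in LOOPBACK_HOSTS:
            hits.append((doc.get("id"), url))
    return hits


# Hypothetical documents shaped like entries in a Solr JSON response.
sample_docs = [
    {"id": "a1", "url.metadata_s": "http://127.0.0.1:8080/example/a1"},
    {"id": "b2", "url.metadata_s": "http://207.141.116.172/example/b2"},
    {"id": "c3"},
]

print(loopback_links(sample_docs))
```

If every document shows up in the result, the base URL in the configuration is the likely cause rather than individual records.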

I see this page has about 1600 items, while the vanilla gpt site has some 2100. did you follow step 7 and have the GcService web app deployed?

cybersea commented 5 years ago

Thanks @mhogeweg. Glad to see gc129 site is working on your end -- I'll try to do some more debugging on this end.

I copied the configuration from our existing site, and noticed that the URLs are set up to point to localhost, which I assumed was intended. I'm not too worried about the links not working since we are not using that aspect, but I could change it to the main (dev) URL: http://207.141.116.172

The discrepancy that you see between the Geoportal-Solr page and the vanilla gpt site is what I'm trying to debug. That difference is equal to the number of records that were pulled from the DCAT source: http://geo.wa.gov/data.json (WA Geospatial Open Data Portal)

I followed step 7 and deployed a new gc service web app to go with this geoportal instance, naming it gc129 (instead of GcService). It has been working as of yesterday, and as it spun up I could see the count of indexed files increasing until it hit 1585.

mhogeweg commented 5 years ago

This app was created before we harvested DCAT. The app takes metadata and applies an xslt transformation. That transformation did not include support for DCAT as a structure. I'm making some updates and will share shortly.

mhogeweg commented 5 years ago

attached are two xslt that should replace the corresponding files in the folder: ...\GcService\WEB-INF\classes\gc-config\xmltypes

These transformations take the metadata in the geoportal server index and prepare them for solr. The DCAT items were not indexed as the xslt did not know how to deal with the format yet.

xmltypes.zip

please check with these and see if the DCAT items do get indexed (may require tomcat stop/start)
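To check which document structures a stylesheet handles, it can help to list the `match` patterns of its templates. This is a rough sketch using only the Python standard library. The sample stylesheet below is invented for illustration and is not the content of the attached files, so the `/rdf:RDF` pattern is an assumption about what a DCAT-aware template might look like.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def template_matches(xslt_text):
    """Return the match pattern of every xsl:template that has one."""
    root = ET.fromstring(xslt_text)
    return [
        template.get("match")
        for template in root.iter(f"{{{XSL_NS}}}template")
        if template.get("match")
    ]


# Hypothetical stylesheet, NOT taken from xmltypes.zip.
SAMPLE_XSLT = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <xsl:template match="/rdf:RDF">
    <doc/>
  </xsl:template>
  <xsl:template name="helper"/>
</xsl:stylesheet>"""

print(template_matches(SAMPLE_XSLT))
```

Running the same function against the deployed dc-base-toSolr.xslt would show whether the file Tomcat actually loads contains the new patterns.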

cybersea commented 5 years ago

Thank you very much @mhogeweg! This is a big help to us.

I have installed the new config files and restarted tomcat, but haven't seen a change in the indexed files. Is there a way to trigger the indexing manually? I know it is scheduled via one of the config files to run in the middle of the night.

cybersea commented 5 years ago

I did not detect any changes between the dc-toSolr.xslt you provided in the zip file and the one from the existing repo. Should there be changes in that file, or just in dc-base-toSolr.xslt?
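A quick way to confirm whether two versions of a stylesheet differ is a line-by-line diff. Here is a small sketch with Python's standard `difflib`. The two text snippets are placeholders, not the real file contents. In practice the strings would be read from the existing and the provided files.

```python
import difflib


def changed_lines(old_text, new_text):
    """Return only the added ('+ ') and removed ('- ') lines of a diff."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]


# Placeholder contents standing in for two versions of one file.
existing = "line one\nline two\nline three"
provided = "line one\nline 2\nline three"

for line in changed_lines(existing, provided):
    print(line)

# An empty result means the two versions are identical line for line.
print(changed_lines(existing, existing))
```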

mhogeweg commented 5 years ago

it is just the dc-base one. the other one imports this one, so you may keep the existing one. I included it as they 'go together'. I'll check on forcing solr to reindex the content.

cybersea commented 5 years ago

I stopped tomcat, deleted all the files from the solr index (data folder), restarted tomcat and watched the solr index repopulate from 0 records and stop at 1585 again. So, unfortunately, this .xslt file change does not appear to be working for me.
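For watching the index repopulate, the document count can be read from Solr's standard select handler with `q=*:*` and `rows=0`, which returns a `numFound` value without fetching any documents. A minimal sketch follows. The core URL is a hypothetical placeholder, since the actual Solr location on this server is not given in the thread, and the response body is a hand-written sample.

```python
import json
from urllib.parse import urlencode


def count_query_url(core_url):
    """Build a select URL that matches all documents but returns none."""
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{core_url.rstrip('/')}/select?{params}"


def parse_num_found(body):
    """Extract the total hit count from a Solr JSON response body."""
    return json.loads(body)["response"]["numFound"]


# Hypothetical core URL; replace with the real Solr location.
print(count_query_url("http://localhost:8983/solr/collection1/"))

# Hand-written sample of what the response body looks like.
sample_body = '{"responseHeader": {"status": 0}, "response": {"numFound": 1585, "start": 0, "docs": []}}'
print(parse_num_found(sample_body))
```

Fetching the URL at intervals (for example with `urllib.request.urlopen`) and printing the parsed count would show whether indexing stalls at the same record each time.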

cybersea commented 4 years ago

@mhogeweg I added your xslt files to my Geoportal Facets app for DCAT, and solr is still not indexing the DCAT entries:

http://207.141.116.172/geoportal129/catalog/main/home.page (2129 results)
http://207.141.116.172/gc129/ (1585 results)

Any suggestions?

mhogeweg commented 4 years ago

I'm going to look into this a bit more. I harvested the geoportal129 site into our geoportal 2 sandbox: http://geoss.esri.com/geoportal2/#. If you open the 'source of origin' facet, you'll see your ip address listed with 1477 documents. My harvest indicated that 651 docs failed to publish (total 2128 retrieved).

I'll try to understand why so many failed (likely a validation issue).
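The counts quoted in this thread can be cross-checked with a little arithmetic. The sketch below only restates numbers already given (2128 retrieved, 1477 published, 651 failed, and 2129 catalog results versus 1585 Solr results). The function name is illustrative.

```python
def harvest_tally(retrieved, published, failed):
    """Summarize a harvest run from its three reported counts."""
    return {
        # Records neither published nor reported as failed.
        "unaccounted": retrieved - published - failed,
        "failed_percent": round(100 * failed / retrieved, 1),
    }


# Counts quoted in the thread.
tally = harvest_tally(retrieved=2128, published=1477, failed=651)
index_gap = 2129 - 1585  # catalog results minus Solr results

print(tally)
print(index_gap)
```

The harvest counts add up exactly, and the Solr gap (544) is a different number from the harvest failures (651), so the two sets of missing records may not be the same records.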

Do you see any errors in your solr logs?

cybersea commented 4 years ago

Thanks @mhogeweg.

There are no solr errors in the tomcat (Catalina) logs. Is there another set of logs I should check?
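Scanning a log by hand is easy to get wrong, so a small filter can help. This is a sketch under assumptions: it treats lines containing SEVERE, ERROR, or Exception as problems and keeps those that also mention a keyword. The sample log lines are made up and the function name is illustrative.

```python
import re

# Markers that commonly indicate a problem in Java application logs.
ERROR_PATTERN = re.compile(r"SEVERE|ERROR|Exception")


def suspicious_lines(log_text, keyword="solr"):
    """Return (line number, text) for error-like lines mentioning keyword."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if ERROR_PATTERN.search(line) and keyword.lower() in line.lower():
            hits.append((number, line.strip()))
    return hits


# Made-up log excerpt for illustration only.
sample_log = """INFO: Server startup in 5321 ms
SEVERE: org.apache.solr.common.SolrException: example failure
WARNING: something unrelated"""

for number, text in suspicious_lines(sample_log):
    print(number, text)
```

Depending on the logging configuration, messages from the indexing web app may be written to a file other than the Catalina log, so it may be worth running the same filter over every file in the Tomcat logs folder.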

We have customized our site to loosen validation a bit, so maybe that's the reason validation is failing on your side. (?)