glos / myglos

Repository for MyGLOS (GLOS Data Portal)
GNU General Public License v3.0

Obs not displaying on portal #184

Closed kkoch closed 6 years ago

kkoch commented 6 years ago

Noticed at the DMAC meeting that the portal is not displaying obs. Kelly checked, and the obs cache was empty. Greg and Cheryl verified that this was due to the partition on Michigan being at 100% capacity. They cleared off some files, got it back down to 85%, and obs were showing data again (verified by looking at data on dev).
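
For context, a minimal Python sketch (not GLOS's actual monitoring; the path and 90% threshold are illustrative assumptions) of the kind of disk check that would catch this before the obs cache empties:

    # Hedged sketch: warn when the partition holding the obs cache fills up.
    # "/" and the 90% threshold are assumptions for illustration.
    # Requires Python 3.3+ for shutil.disk_usage.
    import shutil

    def partition_usage_pct(path="/"):
        # Used-space percentage for the filesystem containing `path`.
        usage = shutil.disk_usage(path)
        return 100.0 * usage.used / usage.total

    pct = partition_usage_pct("/")
    if pct >= 90:
        print("WARNING: partition at {:.0f}% capacity".format(pct))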

HOWEVER, @kknee @gcutrell @cheryldmorse, I waited a while, since I figured our normal latency might delay things, and still have not seen the buoys/stations showing up on the production portal as of 6pm (even though, as mentioned above, they are showing on dev). Does something need to be rebooted, or do we have another issue?

kknee commented 6 years ago

@Bobfrat can you take a look? This is all I see for obs in the portal:

[screenshot of the portal's obs view]

Bobfrat commented 6 years ago

@kknee @kkoch We're getting an error retrieving the CSW records from GeoNetwork using Python owslib. Has anything changed recently with the GeoNetwork instance? I'll try to dig in from my end.

  File "/Users/bobfratantonio/Documents/Dev/virtenvs/data-catalog/lib/python2.7/site-packages/owslib/csw.py", line 399, in getrecords2
    self._parserecords(outputschema, esn)
  File "/Users/bobfratantonio/Documents/Dev/virtenvs/data-catalog/lib/python2.7/site-packages/owslib/csw.py", line 549, in _parserecords
    self.records[identifier] = MD_Metadata(i)
  File "/Users/bobfratantonio/Documents/Dev/virtenvs/data-catalog/lib/python2.7/site-packages/owslib/iso.py", line 146, in __init__
    self.contentinfo.append(MD_FeatureCatalogueDescription(contentinfo))
  File "/Users/bobfratantonio/Documents/Dev/virtenvs/data-catalog/lib/python2.7/site-packages/owslib/iso.py", line 961, in __init__
    val = i.attrib['uuidref']
  File "src/lxml/lxml.etree.pyx", line 2467, in lxml.etree._Attrib.__getitem__ (src/lxml/lxml.etree.c:70664)
KeyError: 'uuidref'
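
For what it's worth, a minimal sketch of how this failure can be reproduced with owslib (the library in the traceback); the endpoint URL comes from later comments in this thread, and maxrecords=300 is an assumption chosen to exceed the ~270-record threshold mentioned below:

    # Hedged reproduction sketch: with the bad record in the result set,
    # owslib's ISO parser raises KeyError: 'uuidref'.
    from owslib.csw import CatalogueServiceWeb

    CSW_URL = "http://data.glos.us/metadata/srv/eng/csw"

    csw = CatalogueServiceWeb(CSW_URL)
    try:
        csw.getrecords2(
            esn="full",
            outputschema="http://www.isotc211.org/2005/gmd",  # ISO 19139
            maxrecords=300,  # assumption: enough to include the bad record
        )
        print("parsed {} records".format(len(csw.records)))
    except KeyError as err:
        # owslib assumes a 'uuidref' attribute that one record lacks
        print("failed while parsing a record: missing attribute {}".format(err))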
Bobfrat commented 6 years ago

Ok, after some investigation it appears that this error has been occurring since 2017-12-21, which was the day the system went down for maintenance.

The dev version was hanging onto the TOC entries because it doesn't remove entries even if new hourly catalogs drop records. On a restart you'd see them disappear, which is the case now on dev.

tslawecki commented 6 years ago

GeoNetwork instance is up and running AFAIK: http://data.glos.us/metadata/srv/eng/main.home



kkoch commented 6 years ago

Yes, GN is running and has been. The only change made was to the help.xml this week, and I restarted GN a few times to have some xsl changes take effect. But it's been running the whole time, and definitely nothing changed around the maintenance downtime.

tslawecki commented 6 years ago

Is there an intermediary process sitting between GN and the portal that might not have been restarted? I noticed that the VMs named "myglos3" and "web4" are currently turned off; would one of those be hosting a translation or similar service?

Bobfrat commented 6 years ago

I'm not sure about any intermediary services, but I am able to successfully query GN for the ISO records. However, when I ask for more than 270 records in a single request, the response fails. There must be some error messages in the logs. @tslawecki, are you able to take a quick look at the logs? I'm not sure where to look.
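
One way to isolate which record breaks parsing is to page through the catalog in small chunks with owslib's startposition/maxrecords; this is a hedged sketch, not what the portal actually runs, and the page size of 10 is arbitrary:

    # Hedged sketch: walk the catalog page by page until a page fails to
    # parse, narrowing down where the problem record sits.
    from owslib.csw import CatalogueServiceWeb

    CSW_URL = "http://data.glos.us/metadata/srv/eng/csw"
    ISO = "http://www.isotc211.org/2005/gmd"

    csw = CatalogueServiceWeb(CSW_URL)
    start, page = 1, 10
    while True:
        try:
            csw.getrecords2(esn="full", outputschema=ISO,
                            startposition=start, maxrecords=page)
        except Exception as err:
            print("parse failure in records {}-{}: {}".format(
                start, start + page - 1, err))
            break
        if not csw.records:  # past the end of the result set
            break
        start += page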

kkoch commented 6 years ago

@Bobfrat can you post the query string you are using? I'm also going to be checking the GN logs to see if anything funky is showing up there.

Bobfrat commented 6 years ago

Yeah it's a POST request to http://data.glos.us/metadata/srv/eng/csw with the following body content (application/xml):

(The XML request body was stripped by GitHub's renderer; it asked for an ElementSetName of full.) If I set maxRecords to something under 270, it works.
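
Since the original XML was lost, here is a hedged reconstruction of what a CSW 2.0.2 GetRecords POST of this shape looks like, sent via Python requests; treat the body as illustrative, not the exact request from this comment:

    # Hedged sketch of the GetRecords POST; the XML body is a generic
    # CSW 2.0.2 request, not the (lost) original from this comment.
    import requests

    CSW_URL = "http://data.glos.us/metadata/srv/eng/csw"
    BODY = """<?xml version="1.0" encoding="UTF-8"?>
    <csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
        service="CSW" version="2.0.2" resultType="results"
        startPosition="1" maxRecords="300"
        outputSchema="http://www.isotc211.org/2005/gmd">
      <csw:Query typeNames="csw:Record">
        <csw:ElementSetName>full</csw:ElementSetName>
      </csw:Query>
    </csw:GetRecords>"""

    resp = requests.post(CSW_URL, data=BODY,
                         headers={"Content-Type": "application/xml"})
    print("{} ({} bytes)".format(resp.status_code, len(resp.content)))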
kkoch commented 6 years ago

There is a place in GN to test CSW calls (http://data.glos.us/metadata/srv/eng/test.csw), and it sure looks like I can pull up all the records; it shows a count of 866, which is the right number.

To troubleshoot, I plugged in your POST request above and it didn't work. So I then started putting in the parameters that differed between yours and the sample they provided. I don't know enough about CSW, but the issue seems to be:

csw:ElementSetName with a value of full. Pretty sure it should be ... full (the corrected markup was stripped by GitHub's renderer, so the exact difference is lost here). Although I'm not sure why that would make a difference on max records, so it might just have been a typo when you were creating this ticket and have nothing to do with the actual problem.

Otherwise it works, and if I use their sample for getting summary results as above, it shows the right number of records > ... and a snip of the results is ... .

Given that... we did have one time where there was a bad GN record where, on an XML import, the namespaces were defined in the child elements, and Luke was able to isolate the bad record (see issue #95). But that was from an XML import, and I haven't done anything like that in recent times. Not to mention that for any updates I've done, I've verified they pushed through to the production portal after the update in GN. So while not impossible, I tend to think that's not the issue here. Plus, that issue only kept new records from showing up, not everything.
benjwadams commented 6 years ago

The issue does not appear to be GeoNetwork-related (when I checked). It arises when an error occurs while fetching a resource pointed to by the CSW records. For example, this record, which gives a 500 error on a getMap request, caused problems: http://tds.glos.us/thredds/wms/SM/LakeMichiganSM-Agg?request=getMap . Trying to grab this file through OPeNDAP indicates there's some kind of file size truncation issue going on. Regardless, the sane behavior would be to log an error rather than bombing out, so I'm updating some code related to this and hope to have a fix shortly.
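
The "log, don't crash" behavior described above might look like this minimal sketch; process_record is a hypothetical helper standing in for whatever the portal actually does per record:

    # Hedged sketch of per-record error handling: log and skip bad records
    # instead of letting one failure take down the whole harvest.
    import logging

    log = logging.getLogger("catalog")

    def harvest(records):
        good = []
        for record in records:
            try:
                good.append(process_record(record))  # hypothetical helper
            except Exception:
                log.exception("skipping record %s",
                              getattr(record, "identifier", "?"))
        return good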

kkoch commented 6 years ago

I also looked at the GN logs. Around December 10th we started getting some errors.

It also started warning about several metadata records, saying "metadata not found or invalid schema". These are very old (pre-me) records and look to be in FGDC, not ISO, format. It's somewhat odd that they all of a sudden started being an issue.

However, I'm pretty sure the portal was working even after those errors, so I'm not sure whether these are related.

2017-12-10 23:29:41,720 ERROR [geonetwork.search] - Errors occurred when trying to parse a filter:
2017-12-10 23:29:41,720 ERROR [geonetwork.search] - ----------------------------------------------
2017-12-10 23:29:41,720 ERROR [geonetwork.search] - org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 54; cvc-complex-type.2.4.b: The content of element 'ogc:Filter' is not complete. One of '{"http://www.opengis.net/ogc":spatialOps, "http://www.opengis.net/ogc":comparisonOps, "http://www.opengis.net/ogc":logicOps, "http://www.opengis.net/ogc":_Id}' is expected.
2017-12-10 23:29:41,720 ERROR [geonetwork.search] - ----------------------------------------------
2017-12-10 23:29:47,555 WARN [jeeves.webapp.csw] - SearchController : Metadata not found or invalid schema : 216
2017-12-10 23:29:52,738 WARN [jeeves.webapp.csw] - SearchController : Metadata not found or invalid schema : 97
2017-12-10 23:29:53,050 WARN [jeeves.webapp.csw] - SearchController : Metadata not found or invalid schema : 141
2017-12-10 23:29:53,272 WARN [jeeves.webapp.csw] - SearchController : Metadata not found or invalid schema : 168
2017-12-10 23:29:53,538 WARN [jeeves.webapp.csw] - SearchController : Metadata not found or invalid schema : 205
2017-12-10 23:29:53,982 WARN [jeeves.webapp.csw] - SearchController : Metadata not found or invalid schema : 25
2017-12-10 23:29:54,141 WARN [jeeves.webapp.csw] - SearchController : Metadata not found or invalid schema : 38
2017-12-10 23:29:54,150 WARN [jeeves.webapp.csw] - SearchController : Metadata not found or invalid schema : 34
2017-12-10 23:29:54,270 WARN [jeeves.webapp.csw] - SearchController : Metadata not found or invalid schema : 45
2017-12-10 23:29:54,278 WARN [jeeves.webapp.csw] - SearchController : Metadata not found or invalid schema : 51
2017-12-10 23:30:00,323 WARN [jeeves.webapp.csw] - SearchController : Metadata not found or invalid schema : 149
2017-12-10 23:30:00,376 WARN [jeeves.webapp.csw] - SearchController : Metadata not found or invalid schema : 210
2017-12-10 23:30:08,106 WARN [jeeves.webapp.csw] - SearchController : Metadata not found or invalid schema : 209
2017-12-10 23:30:08,602 INFO [jeeves.service] - -> dispatching to output for : csw
2017-12-10 23:30:08,602 INFO [jeeves.service] - -> writing xml for : csw
2017-12-10 23:30:11,154 INFO [jeeves.service] - -> output ended for : csw
2017-12-10 23:30:11,154 INFO [jeeves.service] - -> dispatch ended for : csw

Also, I checked, and the last update I did in GN was 12/22 at 2:56pm. That would have been right before I headed out for the holiday, so I did not verify that it showed up on production.

Is there a possibility that, with the migration of the servers at about that day/time, a permission or ownership got changed? I'm seeing no problems within GN itself (knock on wood) at this point in time.

benjwadams commented 6 years ago

@Bobfrat should have pushed some changes to handle more error cases.

There were also a couple of bad files behind http://tds.glos.us/thredds/dodsC/SM/LakeMichiganSM-Agg.html that were causing the aggregation and its associated data access endpoints to not function properly.

Since they were corrupting the aggregation, I moved them out of the THREDDS data directory to the folder /root/MTRI-SM/michigan. You may want to replace them with non-corrupted versions (a quick way to scan for such files is sketched after the list below).

The affected Lake Michigan suspended minerals files are:

4326_201705051900.nc
4326_201705071850.nc
4326_201707311910.nc
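
As referenced above, a hedged sketch for spotting corrupted NetCDF files before THREDDS aggregates them; it assumes the netCDF4 package, and the data directory path is illustrative, not the server's real layout:

    # Hedged sketch: try to open each .nc file and flag the unreadable ones.
    import glob
    from netCDF4 import Dataset

    DATA_DIR = "/path/to/thredds/SM/michigan"  # illustrative path

    for path in sorted(glob.glob(DATA_DIR + "/*.nc")):
        try:
            with Dataset(path) as nc:
                nc.variables  # force a header read
        except (OSError, RuntimeError) as err:
            print("corrupt or unreadable: {} ({})".format(path, err))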
kkoch commented 6 years ago

Closing this issue and opening a new one for fixing the data corruption issue.