Open forman opened 6 years ago
Added label "external" because resolution requires a new ODP web service.
Here is some example JSON that could be returned by the health-check service:
{
  "services": {
    "CSW": {
      "status": "OK"
    },
    "WCS": {
      "status": "OK"
    },
    "ESGF": {
      "status": "OK"
    },
    "OPENDAP": {
      "status": "SLOW",
      "reason": "..."
    },
    "HTTP": {
      "status": "OK"
    },
    "FTP": {
      "status": "DOWN",
      "reason": "..."
    }
  },
  "announcements": [
    {
      "published": "2018-12-06T10:20:13",
      "status": "DOWNTIME",
      "services": ["CSW"],
      "period": ["2019-01-01", "2019-01-03"],
      "title": "Catalogue Service Downtime",
      "description": "The ODP CSW will be down from 2019-01-01 to 2019-01-03 for maintenance reasons."
    },
    {
      "published": "2018-11-23T14:06:31",
      "status": "LOWBANDWIDTH",
      "services": ["OPENDAP", "CSW", "WCS", "ESGF"],
      "period": ["2019-02-10", "2019-02-12"],
      "title": "Service Migration",
      "description": "All ODP services will be moved to new infrastructure. From 2019-02-10 to 2019-02-12 you may observe low bandwidth."
    }
  ]
}
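To illustrate how a client such as Cate might reduce a payload like this to something it can show to users, here is a minimal Python sketch. It assumes the response has the shape of the example above; the helper names `degraded_services` and `announcements_on` and the trimmed sample data are hypothetical and not part of any existing ODP or Cate API.

```python
from datetime import date


def degraded_services(services):
    """Return {name: (status, reason)} for every service whose status is not "OK"."""
    return {
        name: (info["status"], info.get("reason", ""))
        for name, info in services.items()
        if info["status"] != "OK"
    }


def announcements_on(announcements, day):
    """Return the announcements whose [start, end] period includes `day`."""
    hits = []
    for item in announcements:
        start, end = (date.fromisoformat(d) for d in item["period"])
        if start <= day <= end:
            hits.append(item)
    return hits


# Trimmed sample data in the shape of the example payload (illustrative only).
services = {
    "CSW": {"status": "OK"},
    "OPENDAP": {"status": "SLOW", "reason": "..."},
    "FTP": {"status": "DOWN", "reason": "..."},
}
announcements = [
    {
        "status": "DOWNTIME",
        "services": ["CSW"],
        "period": ["2019-01-01", "2019-01-03"],
        "title": "Catalogue Service Downtime",
    },
]

print(degraded_services(services))
print([a["title"] for a in announcements_on(announcements, date(2019, 1, 2))])
```

Keeping the two lookups separate means the GUI could show current problems and scheduled downtime in different places.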
Is the services section meant to be populated as a result of polling the origin servers? If so, then:
Our aim is to have a RESTful meta-service API that we can use from the CCI Toolbox. Again, we don't care how this is implemented on the server side. Client-side timeouts can have various causes, so we want to know what the status is on the server side.
For example we just received a mail from Alison saying
Just to let you know that there was an issue with the ESGF update that we deployed yesterday, and to fix it, the OPeNDAP (and other ESGF access e.g. HTTP, WMS) will need to be taken offline this afternoon. I’ll let you know as soon as it’s back up and running, but it may be down all afternoon unfortunately. The portal front end and anonymous ftp download should be unaffected.
This is the stuff that we would like to pass over to our users in advance.
I still don't understand how you expect the services section to be updated? If you want to know the status on the server side, that suggests either a manual update, which as I've mentioned before won't be workable for unscheduled outages, or some on-site integration with the OPeNDAP servers.
I still don't understand how you expect the services section to be updated?
I don't know. I expect some experts will find a solution.
E.g. using https://www.nagios.org/
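One way the services section could be filled without manual updates is a periodic probe that times a request to each service, which is roughly the kind of check that monitoring tools such as Nagios automate. The Python sketch below is only an illustration: the thresholds, the names `classify` and `probe`, and the idea of mapping response time to OK/SLOW/DOWN are assumptions, not an agreed design.

```python
import time
import urllib.request

SLOW_AFTER_SECONDS = 2.0  # assumed threshold for reporting "SLOW"
TIMEOUT_SECONDS = 10.0    # assumed upper bound before giving up


def classify(elapsed, error=None):
    """Map a probe result to a status entry like those in the example JSON."""
    if error is not None:
        return {"status": "DOWN", "reason": str(error)}
    if elapsed > SLOW_AFTER_SECONDS:
        return {"status": "SLOW", "reason": f"response took {elapsed:.1f} s"}
    return {"status": "OK"}


def probe(url, fetch=urllib.request.urlopen):
    """Time a single request to `url` and classify the outcome."""
    start = time.monotonic()
    try:
        with fetch(url, timeout=TIMEOUT_SECONDS) as response:
            response.read(1)  # make sure the service actually sends data
    except OSError as exc:
        # URLError, timeouts, and connection errors are all OSError subclasses.
        return classify(0.0, error=exc)
    return classify(time.monotonic() - start)
```

A probe like this only sees what is reachable from wherever it runs, so it would complement, not replace, announcements written by the service operators.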
Just to chime in on this discussion a bit. Here are a few examples of how widely used and well-known services convey status information to their users:
http://status.gandi.net/timeline
https://status.twitterstat.us/#
https://status.status.io/
How exactly the status of a particular system of a particular service is determined and updated is of course specific to each system. From the users' perspective, however, a trusted, machine readable channel is provided.
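From the client side, consuming such a channel could be as small as one HTTP GET plus JSON decoding. The following Python sketch assumes a hypothetical endpoint URL (no such ODP endpoint is confirmed in this thread) and a helper name, `fetch_health`, invented here for illustration.

```python
import json
import urllib.request

# Hypothetical endpoint, used only as a placeholder.
HEALTH_URL = "https://example.org/odp/health"


def fetch_health(url=HEALTH_URL, timeout=5.0, opener=urllib.request.urlopen):
    """Return the decoded status document, or None if it cannot be fetched or parsed."""
    try:
        with opener(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers URLError and timeouts; ValueError covers invalid JSON or encoding.
        return None
```

Returning `None` instead of raising lets the caller treat "status unknown" as its own state, distinct from a reported outage.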
Thanks @JanisGailis !
Interesting. The gandi.net one illustrates a couple of points that I'm trying to make above.
I'm going to address this now by separating network errors from others, so the GUI can show a different error dialog.
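A rough sketch of what such a separation could look like in Python is shown here. It is not Cate's actual implementation: the `NETWORK_ERRORS` tuple, the `error_kind` helper, and the dialog titles are all assumptions made for illustration.

```python
import socket
import urllib.error

# Exception types treated as connectivity problems (an assumed selection).
# Note that urllib.error.HTTPError is a subclass of URLError, so HTTP error
# responses are also counted as network errors here.
NETWORK_ERRORS = (urllib.error.URLError, socket.timeout, ConnectionError)


def error_kind(exc):
    """Return "network" for connectivity problems and "other" for everything else."""
    return "network" if isinstance(exc, NETWORK_ERRORS) else "other"


def dialog_title(exc):
    """Pick a dialog title so that users can tell service outages from other failures."""
    if error_kind(exc) == "network":
        return "Network or data service problem"
    return "Error"
```

For example, `dialog_title(ConnectionResetError())` selects the network title, while `dialog_title(ValueError("bad input"))` falls through to the generic one.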
Now showing the following error dialogs:
Expected behavior
Cate Desktop should "know" whether the CCI ODP service is available and should clearly display its health status to users.
Actual behavior
Users receive error messages when downloading and accessing data (usually connection time-out errors). To users it appears as if Cate was not working correctly.
Steps to reproduce the problem
Download or access ODP data sources when ODP services are down.
Specifications
Cate 1.0 - 2.0.dev20