Ouranosinc / Magpie

AuthN/AuthZ services
https://pavics-magpie.readthedocs.io
Apache License 2.0

[BUG] ncWMS2 unaccessible while it is fully open to anonymous group #468

Closed tlvu closed 3 years ago

tlvu commented 3 years ago

Describe the bug

The ncWMS2 service is inaccessible even though it is fully open to the anonymous group.

This started after we recovered from a power outage, without any config change. The service had always worked fine before.

The impact is that the Canarie monitoring page (https://pavics.ouranos.ca/canarie/) is not working, so we appear to be down to Canarie.

Error for the canarie monitoring:

[2021-09-24 20:49:01 +0000] [29] [CRITICAL] WORKER TIMEOUT (pid:1548)
[2021-09-24 20:49:01,632] [1548] [ERROR] app_object : Exception occurs while trying to check status of node.ncWMS2-public
[2021-09-24 20:49:01,632] ERROR in monitoring: Exception occurs while trying to check status of node.ncWMS2-public
[2021-09-24 20:49:01 +0000] [1548] [INFO] Worker exiting (pid: 1548)

To Reproduce

Steps to reproduce the behavior:

  1. PAVICS https://github.com/bird-house/birdhouse-deploy/tree/1.15.2 (Magpie 3.14.0)
  2. ncWMS2 fully open for anon group

[Two screenshots of the Magpie UI, taken 2021-09-24]

  3. But ncWMS2 is inaccessible:
    
    $ curl --include https://pavics.ouranos.ca/twitcher/ows/proxy/ncWMS2
    HTTP/1.1 400 Bad Request
    Server: nginx/1.13.6
    Date: Fri, 24 Sep 2021 21:00:29 GMT
    Content-Type: text/xml; charset=UTF-8
    Content-Length: 597
    Connection: keep-alive

    <?xml version="1.0" encoding="utf-8"?>
    <ExceptionReport version="1.0.0" xmlns="http://www.opengis.net/ows/1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/ows/1.1 http://schemas.opengis.net/ows/1.1.0/owsExceptionReport.xsd">

    Missing or unknown 'Permission' inferred from OWS 'request' parameter: [None]. Unable to resolve the requested access for service: [ncWMS2].

fmigneault commented 3 years ago

This error is not caused by the service being inaccessible. It indicates that the 'request' query parameter is missing. The following works while unauthenticated: https://hirondelle.crim.ca/twitcher/ows/proxy/ncWMS2?request=getcapabilities

tlvu commented 3 years ago

Oh, this is odd: I can curl the "internal" URL fine without any request param. But you have a point here. This might not be a real bug after all.

$ curl --include --silent http://pavics.ouranos.ca:8080/ncWMS2/ | head -20
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Cache-Control: no-cache
Pragma: no-cache
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Content-Length: 1329
Date: Mon, 27 Sep 2021 02:08:36 GMT

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<link rel=StyleSheet href="css/ncWMS.css" type="text/css" />
<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
        <title>Birdhouse ncWMS2 Server</title>
    </head>
    <body>

    <h1>Birdhouse ncWMS2 Server</h1>
    <h3>Running ncWMS v2.0.4</h3>

fmigneault commented 3 years ago

@tlvu It is partly due to Magpie. The service definition specifies that 'request' is an "expected" parameter, needed to determine which permission applies to the request. Since it is not provided, Magpie cannot figure out whether you should be allowed access or not. The Magpie service could be updated to use GetCapabilities by default when 'request' is omitted, since ncWMS2 itself seems to allow that.
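To illustrate the idea, here is a minimal sketch of inferring a permission from the 'request' query parameter, with an optional fallback when it is omitted. This is not Magpie's actual code: the function name `infer_permission`, the `default` argument, and the `KNOWN_PERMISSIONS` set are assumptions made up for this example.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical permission names for illustration only;
# this is not Magpie's real service definition.
KNOWN_PERMISSIONS = {
    "getcapabilities",
    "getmap",
    "getfeatureinfo",
    "getlegendgraphic",
    "getmetadata",
}


def infer_permission(url, default=None):
    """Return the lower-cased permission inferred from the 'request' parameter.

    When 'request' is absent, `default` is used if given; otherwise the
    lookup fails, mirroring the 400 response reported in this issue.
    """
    query = parse_qs(urlsplit(url).query)
    # OWS parameter names are case-insensitive, so normalize the keys.
    params = {key.lower(): values[0] for key, values in query.items()}
    requested = params.get("request", default)
    if requested is None or requested.lower() not in KNOWN_PERMISSIONS:
        raise ValueError(
            f"Missing or unknown permission inferred from 'request': [{requested}]"
        )
    return requested.lower()
```

With `default=None`, a URL without 'request' raises an error, which matches the current behavior. Passing `default="GetCapabilities"` would resolve the same URL to the GetCapabilities permission, which is the fallback suggested above.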

tlvu commented 3 years ago

Error for the canarie monitoring:

[2021-09-24 20:49:01 +0000] [29] [CRITICAL] WORKER TIMEOUT (pid:1548)
[2021-09-24 20:49:01,632] [1548] [ERROR] app_object : Exception occurs while trying to check status of node.ncWMS2-public
[2021-09-24 20:49:01,632] ERROR in monitoring: Exception occurs while trying to check status of node.ncWMS2-public
[2021-09-24 20:49:01 +0000] [1548] [INFO] Worker exiting (pid: 1548)

The Canarie monitoring failure of ncWMS2-public above, which prompted this issue, turned out to be caused by a failing DNS server after our power outage. We removed that bad DNS server from the list and everything seems to be back to normal.

Sorry for the false positive.
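For anyone hitting a similar monitoring timeout, a quick way to tell a name-resolution problem from an HTTP-level one is to check the lookup on its own. The sketch below is only an assumption about how one might do that with the Python standard library; `diagnose_host` and its `resolver` argument are hypothetical names, not part of the Canarie monitoring code or of Magpie.

```python
import socket
from urllib.parse import urlsplit


def diagnose_host(url, resolver=socket.getaddrinfo):
    """Report whether the host in `url` resolves.

    `resolver` defaults to socket.getaddrinfo and can be swapped out,
    for example to test without touching the network.
    """
    host = urlsplit(url).hostname
    try:
        records = resolver(host, None)
    except socket.gaierror as exc:
        # Name resolution failed before any HTTP request could be made.
        return f"DNS lookup failed for {host}: {exc}"
    # Each record is (family, type, proto, canonname, sockaddr);
    # the address is the first item of sockaddr.
    addresses = sorted({record[4][0] for record in records})
    return f"{host} resolves to {', '.join(addresses)}"
```

Calling `diagnose_host("https://pavics.ouranos.ca/canarie/")` from the monitoring host would show whether the lookup itself fails, independently of any Twitcher or Magpie response.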