Refactor BiG-CZ Back-end

rajadain commented 7 years ago

Currently, the back-end is structured like this:

                                  +------------+
                            +---> | cinergi    | +---+
                            |     +------------+     |
           +----------+     |                        |      +------------+
 request   | generic  |     |     +------------+     |      | common     |  response
+--------> | search   | +-------> | hydroshare | +--------> | serializer | +--------->
           | endpoint |     |     +------------+     |      |            |
           +----------+     |                        |      +------------+
                            |     +------------+     |
                            +---> | cuahsi     | +---+
                                  +------------+

This was based on the premise that unifying these APIs would make them more accessible and easier to navigate. However, each service has its own quirks and perks, most of which are lost in this unification. Thus, we need to switch to a different pattern, probably like this:

           +------------+------------+------------+
 request   | cinergi    | cinergi    | cinergi    |  response
+--------> | endpoint   | processor  | serializer | +--------->
           +------------+------------+------------+

           +------------+------------+------------+
 request   | hydroshare | hydroshare | hydroshare |  response
+--------> | endpoint   | processor  | serializer | +--------->
           +------------+------------+------------+

           +------------+------------+------------+
 request   | cuahsi     | cuahsi     | cuahsi     |  response
+--------> | endpoint   | processor  | serializer | +--------->
           +------------+------------+------------+

Here the endpoints behave similarly, but the outputs can be quite different.

Instead of the search request being like:

GET /api/bigcz/search?catalog=hydroshare&from=...

we should do:

GET /api/bigcz/hydroshare/search?from=...
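To make the two request shapes concrete, here is a minimal sketch that extracts the catalog from either URL style using only the Python standard library. The function name `catalog_from_url` is hypothetical and does not exist in the code base; the actual routing would live in the web framework's URL configuration.

```python
from urllib.parse import urlparse, parse_qs


def catalog_from_url(url):
    """Return the catalog named in either URL style, or None if absent."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]

    # Path style: /api/bigcz/<catalog>/search
    if len(parts) == 4 and parts[:2] == ["api", "bigcz"] and parts[3] == "search":
        return parts[2]

    # Query style: /api/bigcz/search?catalog=<catalog>
    if parts == ["api", "bigcz", "search"]:
        values = parse_qs(parsed.query).get("catalog")
        return values[0] if values else None

    return None
```

With the path style the catalog is always present, whereas with the query style it can be omitted, which is what makes it usable as an optional filter.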

rajadain commented 7 years ago

Regarding @mmcfarland's comments that users may want to search without specifying a catalog: this is not currently supported. However, since the catalog would then be an optional filter, it should be a query parameter rather than a path parameter.

The alternative that keeps things uniform is to have a common base format and extend it with custom fields / serializers for each catalog. Thus, I propose the following:

API Endpoint

Currently, a sample query may be made at the following endpoint:

https://staging.portal.bigcz.org/api/bigcz/search?catalog=cuahsi&query=water&bbox=-75.2585830259156,39.876054698521,-75.159759422944,40.0310812816138

and the output is like this:

{
  "api_url": null,
  "catalog": "cuahsi",
  "count": 344,
  "results": [
    {
      "author": null,
      "created_at": "2015-06-05T00:00:00Z",
      "description": "The USGS National Water Information System (NWIS) provides access to millions of sites measuring streamflow, groundwater levels, and water quality. This web service provides methods for retrieving Ground Water data from NWIS. For more information about NWIS, see the NWIS home page at http://waterdata.usgs.gov/nwis",
      "geom": {
        "coordinates": [
          -75.2329444,
          39.95130556
        ],
        "type": "Point"
      },
      "id": "NWISGW:395705075135901",
      "links": [
        {
          "href": "http://hiscentral.cuahsi.org/pub_network.aspx?n=8",
          "type": "service"
        }
      ],
      "title": "PH  1061",
      "updated_at": null
    },
    ...
  ]
}

We will keep the request path as it is. The new result format would be like this:

[
  {
    "api_url": null,
    "catalog": "cuahsi",
    "count": 344,
    "results": [
      {
        "author": null,
        "created_at": "2015-06-05T00:00:00Z",
        "description": "The USGS National Water Information System (NWIS) provides access to millions of sites measuring streamflow, groundwater levels, and water quality. This web service provides methods for retrieving Ground Water data from NWIS. For more information about NWIS, see the NWIS home page at http://waterdata.usgs.gov/nwis",
        "geom": {
          "coordinates": [
            -75.2329444,
            39.95130556
          ],
          "type": "Point"
        },
        "id": "NWISGW:395705075135901",
        "links": [
          {
            "href": "http://hiscentral.cuahsi.org/pub_network.aspx?n=8",
            "type": "service"
          }
        ],
        "title": "PH  1061",
        "updated_at": null,
        "customField1": "some value",
        "customField2": "another value",
        ...
      },
      ...
    ]
  }
]

The differences are:

- The new fields will be merged into each result after the base fields.
- The output of the /search endpoint will be an array, each element of which corresponds to a result set from one catalog.

When we move to supporting multiple catalogs in the future, the output may look like:

[
  {
    "api_url": "http://132.249.238.169:8080/geoportal/opensearch?q=water&bbox=-75.2585830259156%2C39.876054698521%2C-75.159759422944%2C40.0310812816138&f=json",
    "catalog": "cinergi",
    "count": 10,
    "results": [ ... ]
  },
  {
    "api_url": null,
    "catalog": "cuahsi",
    "count": 344,
    "results": [ ... ]
  }
]

The results of each catalog may include catalog-specific fields, but all of them will share a common subset of fields that is always present.
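As a rough illustration of that guarantee, the sketch below checks an array-shaped response for results that lack any of the base fields. The set of base fields is taken from the sample output above; the function name `missing_base_fields` is made up for this example.

```python
# Base fields, as they appear in the sample CUAHSI result above.
BASE_FIELDS = {
    "id", "title", "description", "author",
    "links", "geom", "created_at", "updated_at",
}


def missing_base_fields(response):
    """Map each catalog name to the base fields missing from any of its results."""
    problems = {}
    for result_set in response:
        missing = set()
        for result in result_set["results"]:
            # Custom fields are ignored; only absent base fields are collected.
            missing |= BASE_FIELDS - set(result)
        if missing:
            problems[result_set["catalog"]] = sorted(missing)
    return problems
```

A client could rely on this common subset for rendering a generic result list and only touch the catalog-specific fields where it knows which catalog it is dealing with.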

Implementation

In order to support different fields for each catalog, the common ResourceSerializer will be extended by each catalog to add specific fields. ResourceListSerializer will be parameterized to be given an implementation of ResourceSerializer on instantiation, so it can support different result sets per catalog.
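The shape of that design could look roughly like the following. This is a framework-free sketch in plain Python, not the project's actual serializer code: only the names `ResourceSerializer` and `ResourceListSerializer` come from this proposal, while `CuahsiResourceSerializer`, the `to_dict` methods, and the `sample_mediums` field are invented for illustration.

```python
from types import SimpleNamespace


class ResourceSerializer:
    """Base serializer: emits the fields every catalog shares."""
    fields = ("id", "title", "description", "author",
              "links", "geom", "created_at", "updated_at")

    def to_dict(self, resource):
        # Missing attributes serialize as None, like the nulls in the sample output.
        return {name: getattr(resource, name, None) for name in self.fields}


class CuahsiResourceSerializer(ResourceSerializer):
    """Hypothetical catalog serializer: base fields followed by custom ones."""
    fields = ResourceSerializer.fields + ("sample_mediums",)


class ResourceListSerializer:
    """Serializes one catalog's result set with the serializer it was given."""

    def __init__(self, resource_serializer):
        self.resource_serializer = resource_serializer

    def to_dict(self, catalog, api_url, resources):
        return {
            "api_url": api_url,
            "catalog": catalog,
            # Simplification: the real count may exceed the results returned.
            "count": len(resources),
            "results": [self.resource_serializer.to_dict(r) for r in resources],
        }


# Usage: one list serializer per catalog, each with its own resource serializer.
resource = SimpleNamespace(id="NWISGW:395705075135901", title="PH  1061",
                           sample_mediums=["Groundwater"])
payload = ResourceListSerializer(CuahsiResourceSerializer()).to_dict(
    "cuahsi", None, [resource])
```

Because each subclass appends to the base `fields` tuple, the custom fields naturally come after the base fields in every result.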

In the front-end, the Result will have to be similarly extended with the expected custom fields of each catalog.

Considerations

Given that in the future we may be paginating through all results before returning them to the client (see #1850), this could result in a fair bit of server-side processing for each API call. Depending on how often the underlying data is updated, we should look into caching the responses, especially for WKAoIs, as is done for the geoprocessing results elsewhere (see #1892).
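A minimal sketch of what such caching might look like, keyed on the search parameters and expiring after a fixed time. This is an in-memory illustration only; a real implementation would presumably go through the project's existing cache back-end, and the class name `SearchCache`, its method, and the TTL approach are all assumptions rather than anything decided here.

```python
import time


class SearchCache:
    """Tiny in-memory cache keyed by search parameters, with a time-to-live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}

    def get_or_compute(self, catalog, query, bbox, compute):
        """Return a fresh cached response, or call compute() and cache its result."""
        key = (catalog, query, bbox)
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        value = compute()
        self._store[key] = (now, value)
        return value
```

The right TTL depends on how often each catalog's data changes, which is the open question raised above.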

mmcfarland commented 7 years ago

I didn't realize that, in effect, there was no unified endpoint since the querystring is separating the catalog queries. Since we've already backed away from the "single -> search all" direction, your original might still be fine. Having a common + extended set of values for each type in your update seems good, too.

WikiWatershed / model-my-watershed