b-cube / semantics-preprocessing

initial text preprocessors for the triplestore and feature classification

Finish identifier class #24

Closed roomthily closed 9 years ago

roomthily commented 9 years ago

Merging some issues into one comprehensible thing.

Remaining tasks:

There is some known wonkiness in the YAML configuration (list vs. dict), so that should also be added to the list.

Regarding the CSW situation (and this also applies to OAI-PMH at a minimum), we may need to switch from a binary check based on the ordering of the services in the config to some scoring function, e.g. "67% likely to be Service A". I think we're good for now as long as we're careful about the filters, and the secondary "is dataset?" or "is metadata?" checks can help mitigate this problem.
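A rough sketch of what such a scoring function could look like, in place of the order-based binary check. Everything here is an assumption for illustration: the function names, the signal strings, the weights, and the 0.5 threshold are hypothetical and are not taken from the project's config or code.

```python
# Hypothetical sketch: names, signals, weights, and threshold are illustrative only.
def score_services(content, signals_by_service):
    """Return {service: score}, where each score is the matched weight
    divided by the total weight of that service's signals."""
    scores = {}
    for service, signals in signals_by_service.items():
        total = sum(weight for _, weight in signals)
        matched = sum(weight for text, weight in signals if text in content)
        scores[service] = matched / total if total else 0.0
    return scores


def best_match(scores, threshold=0.5):
    """Pick the highest-scoring service, or None if nothing clears the threshold."""
    service = max(scores, key=scores.get, default=None)
    if service is None or scores[service] < threshold:
        return None
    return service


# Assumed signals: (substring to look for, weight).
signals = {
    "CSW": [("http://www.opengis.net/cat/csw", 2), ("GetRecords", 1)],
    "OAI-PMH": [("http://www.openarchives.org/OAI/2.0/", 2), ("ListRecords", 1)],
}
response = (
    '<csw:Capabilities xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" '
    'version="2.0.2"/>'
)
scores = score_services(response, signals)
```

With these made-up weights the response above matches two of CSW's three weight units, so `best_match(scores)` picks CSW regardless of where it sits in the config ordering.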

roomthily commented 9 years ago

On Version Identification

In some cases, the service response will have an explicit version (root.attrib['version'] for many OGC responses, for example). Sometimes we don't have that, or we only have an implied version (the OpenSearch namespace URI or a THREDDS catalog URI). So we have two definitions: a default value for the URI-type situations (if match, version='1.1') and a pull from the response.

  versions:
      defaults:
        ors:
          - type: simple
            object: content
            value: 'http://a9.com/-/spec/opensearch/1.1/'
            text: '1.1'

  versions:
      checks:
        ors:
          - type: xpath
            # fully qualified xpath which is lovely and short here
            value: '@version'
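A minimal sketch of how the two definitions could be applied together, assuming the explicit check is tried first and the default is the fallback (the precedence is my assumption, not something stated here). The dict layout mirrors the config fragments above, but `identify_version`, `VERSION_DEFAULTS`, and `VERSION_CHECKS` are hypothetical names and this is not the project's implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory equivalent of the two config fragments.
VERSION_DEFAULTS = [
    {"type": "simple", "object": "content",
     "value": "http://a9.com/-/spec/opensearch/1.1/", "text": "1.1"},
]
VERSION_CHECKS = [
    {"type": "xpath", "value": "@version"},
]


def identify_version(content):
    """Return an explicit version if a check finds one, else a matching
    default, else None."""
    root = ET.fromstring(content)
    for check in VERSION_CHECKS:
        # ElementTree's limited XPath cannot select attribute values, so this
        # sketch only handles the '@name' form against the root element.
        if check["type"] == "xpath" and check["value"].startswith("@"):
            version = root.attrib.get(check["value"][1:])
            if version:
                return version
    for default in VERSION_DEFAULTS:
        # 'simple' is treated here as a plain substring match on the raw content.
        if default["type"] == "simple" and default["value"] in content:
            return default["text"]
    return None
```

For example, a root element carrying `version="1.3.0"` yields that value directly, while an OpenSearch description document without a version attribute falls back to the default `'1.1'`.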

Related commits: https://github.com/b-cube/semantics-preprocessing/commit/e48dc058bb3e52ef841dcf3c56cfa0af57bc7cbf

https://github.com/b-cube/semantics-preprocessing/commit/f89d8c44f0b1b8f78e72986f40ede874990129b1

roomthily commented 9 years ago

On URNs

Based on the OSGeo/geopython/ESIP Discovery work.

consider

Of course, this all assumes that you have some service response that is well-contained, which we do not. WxS GetCapabilities responses describe enough to get at the data service, so they could be classified as both service and dataset (specific to this project, mind).

roomthily commented 9 years ago

For the OpenSearch vs. ATOM with OS elements:

<feed xmlns='http://www.w3.org/2005/Atom' 
    xmlns:georss='http://www.georss.org/georss' 
    xmlns:opensearch='http://a9.com/-/spec/opensearch/1.1/'>

Here we're just checking for the namespace URI without understanding its role in the XML, i.e. it is not the default namespace and other namespaces are present.
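One way to tell the two cases apart would be to look at the namespace of the root element instead of searching the whole document for the URI. The sketch below assumes that approach; `root_namespace`, `classify`, and the returned labels are hypothetical and not part of the existing identifier.

```python
import xml.etree.ElementTree as ET

OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"
ATOM_NS = "http://www.w3.org/2005/Atom"


def root_namespace(content):
    """Return the namespace URI of the root element, or '' if it has none."""
    tag = ET.fromstring(content).tag
    # ElementTree writes qualified tags as '{namespace-uri}localname'.
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def classify(content):
    """Label a response by its root namespace (labels are illustrative)."""
    ns = root_namespace(content)
    if ns == OPENSEARCH_NS:
        return "opensearch"
    if ns == ATOM_NS:
        # The OpenSearch URI may still be declared for elements inside the feed.
        return "atom+opensearch" if OPENSEARCH_NS in content else "atom"
    return "unknown"
```

Under this sketch, an Atom feed that merely declares the `opensearch` prefix is reported as Atom with OpenSearch elements, and only a document whose root element lives in the OpenSearch namespace is reported as OpenSearch.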

roomthily commented 9 years ago

It is catching OpenSearch errors :+1:

roomthily commented 9 years ago

High priority services are running reasonably. Everything else will be a discrete bug. Cheers.