codeforamerica / ohana-api

The open source API directory of community social services.
http://ohana-api-demo.herokuapp.com/api
BSD 3-Clause "New" or "Revised" License

Consider looking into HAL as a standard way to include multiple taxonomies in one API #349

Open ckoppelman opened 9 years ago

ckoppelman commented 9 years ago

While perusing the Open Supporter Data Interface standard, I learned how they deal with a federated data standard - that is, how they deal with different taxonomies in one API.

They use a standard called HAL (Hypertext Application Language, or, more formally, the IETF Internet-Draft). Basically, it's a way to "curie" different API standards into one JSON or XML document.

We've discussed how to deal with different service and eligibility taxonomies in the past, and this may be a viable solution (at least on a JSON and XML level).

What do you all think?

greggish commented 9 years ago

Thanks for sharing, Charles. We might want to edit the title to more directly describe the use case... would it be something like: "Consider using HAL to link different taxonomies through one API" ?

monfresh commented 9 years ago

Could you please explain in more detail, and preferably with some code examples, what the problem is (with a specific use case), and how HAL can solve that problem?

The way I understand it, HAL's purpose is to make it easier for clients that consume an API (such as Ohana Web Search) to discover the available resources and how to interact with them. From the stateless.co site:

HAL is designed for building APIs in which clients navigate around the resources by following links.

Links are identified by link relations. Link relations are the lifeblood of a hypermedia API: they are how you tell client developers about what resources are available and how they can be interacted with, and they are how the code they write will select which link to traverse.

Link relations are not just an identifying string in HAL, though. They are actually URLs, which developers can follow in order to read the documentation for a given link. This is what is known as "discoverability". The idea is that a developer can enter into your API, read through documentation for the available links, and then follow-their-nose through the API.

Ohana API already has some hypermedia support via link relations, implemented following GitHub's API patterns of using Link Headers for pagination, and RFC 6570 URI templates at the root endpoint. We also have a Ruby client that makes it easy for client developers to interact with the API and follow those links.
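
For reference, here's a rough sketch of what those two mechanisms look like in practice; the URLs and parameter names below are illustrative, not copied from a live deployment. Pagination links travel in a Link response header, GitHub-style:

Link: <https://api.example.org/search?keyword=food&page=2>; rel="next", <https://api.example.org/search?keyword=food&page=9>; rel="last"

And the root endpoint returns RFC 6570 URI templates that clients can expand:

{
  "organizations_url": "https://api.example.org/organizations{?page,per_page}",
  "search_url": "https://api.example.org/search{?keyword,category,location,page,per_page}"
}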

ckoppelman commented 9 years ago

The specific issue is declaring different eligibility and service taxonomies. If, going forward, we allow the use of both the LA 2-1-1 and Open Eligibility taxonomies, this is a way to declare (at a machine level) which taxonomies are being used. The particularly interesting part of the spec in this regard is curies. If the HAL spec's explanation doesn't help (it kinda doesn't), check out OSDI's explanation.

Curies also provide well-defined methods of extension for vendors and system developers. As OSDI puts it:

Vendors who add their own vendor-specific relationships must define their own curie and preface their relationships with their own curie namespace. For example,

"_links": {
  "curies": [
      { "name": "osdi", "href": "http://api.opensupporter.org/docs/v1/{rel}", "templated": true },
      { "name": "fb", "href": "http://facebook.com/docs/v1/{rel}", "templated": true }
  ],
  "self": {
      "href": "http://api.opensupporter.org/api/v1/question_answers/46"
  },
  "osdi:question": {
      "href": "http://api.opensupporter.org/api/v1/questions"
  },
  "fb:profile": {
      "href": "http://facebook.com/profiles/1234"
  }
}

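Adapting that pattern to our case, a rough sketch might look something like the following. The curie names, documentation URLs, and link relations here are purely illustrative, not part of any existing spec:

"_links": {
  "curies": [
      { "name": "oe", "href": "http://example.org/open-eligibility/docs/{rel}", "templated": true },
      { "name": "la211", "href": "http://example.org/211la-taxonomy/docs/{rel}", "templated": true }
  ],
  "self": {
      "href": "http://api.example.org/api/locations/123/services/456"
  },
  "oe:category": {
      "href": "http://api.example.org/api/categories/emergency-food"
  },
  "la211:category": {
      "href": "http://api.example.org/api/categories/bd-1800"
  }
}

The curie prefix tells a client, at a machine level, which taxonomy each category link comes from, and the documentation for each relation is discoverable by expanding the template.
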
monfresh commented 9 years ago

Thanks for the additional details. Currently, Ohana API only supports one taxonomy at a time. If your instance happened to contain data from multiple sources, each with its own taxonomy, you would need to merge everything into your own custom taxonomy. The current database table that houses the taxonomy terms has a uniqueness constraint on the category name, so if you had multiple categories called "Emergency", you would not be able to include them all at once while specifying that this one belongs to Open Eligibility and another belongs to some other taxonomy.
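
In other words, the current schema can't hold both of the following at the same time; this is just an illustration of the name collision, not a proposed format:

{ "name": "Emergency", "taxonomy": "Open Eligibility" }
{ "name": "Emergency", "taxonomy": "211 LA County" }

Supporting that would presumably mean scoping the uniqueness constraint by taxonomy, which is a schema change, not just an API change.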

Being able to use more than one taxonomy at a time would require more than just an API change, and also implies a few things:

As far as I know, none of the above hold true at the moment, so until there is an actual need for this, and if the need is widespread enough to justify including it in this generic repo, I wouldn't put any effort into this.

Charles, could you provide a specific example of when a particular instance of Ohana API would require using more than one taxonomy at a time? Similarly, could you provide an example of a client application that would require knowing which taxonomy is being used?

ckoppelman commented 9 years ago

The only kind of system I can see where the taxonomy is not an issue is a closed system. Even there it still matters; it's just not an open question. As soon as you start using an API to communicate between systems, you need to know which taxonomy you are using.

I agree that you can do this without using HAL. It's simply a suggested way of solving the problem that already has some tooling built on top of it and some Internet standards in development. Standards are much more robust when built on top of other standards.

Two use cases:

  1. Imagine you are writing an application that helps help-seekers or case managers search for help in nearby cities. Now imagine a number of different 2-1-1s that host an OpenReferral/Ohana API. These 2-1-1 systems use several different taxonomies across several geographies. In writing this application, you'd like to let the user search by type of service or by eligibility criteria. By requiring what is effectively a namespace in the service and eligibility taxonomies, you can do many things programmatically that would otherwise have to be hardcoded (including defining mappings from one taxonomy to another).
  2. Imagine you run a very specific type of referral agency. I'll use Polaris, the anti-trafficking organization, as an example, since I know the domain. You would like to use 2-1-1 data and supplement it. Some of the fields you care about are not captured in 2-1-1 data. You work with other anti-trafficking organizations that would benefit from these fields, but since they are not defined in the OpenReferral spec, there is no formal definition of them. If you add these fields to your data, their names may collide with field names in other OpenReferral APIs. Instead, you namespace the fields and include discoverable documentation about them (see the sketch below). This way, as the data creeps its way through the network, it is exportable, understandable, and ignorable. The fields no longer add confusion; they add data.
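
Here is a rough sketch of the second case, with entirely made-up field names and documentation URLs (none of this is part of the OpenReferral spec):

{
  "name": "Example Drop-In Center",
  "description": "Emergency shelter and case management.",
  "polaris:survivor_services": ["emergency housing", "legal advocacy"],
  "_links": {
      "curies": [
          { "name": "polaris", "href": "http://example.org/polaris/docs/{rel}", "templated": true }
      ],
      "polaris:survivor_services": {
          "href": "http://example.org/polaris/docs/survivor_services"
      }
  }
}

A consumer that doesn't recognize the polaris: prefix can safely ignore those fields; one that does can follow the curie to read their documentation.
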
monfresh commented 9 years ago

If I understand the first use case correctly, the problem is when a client app needs to interact with more than one Ohana API, and when each API has its own single taxonomy. Is that correct? If so, then the problem is not about how one single instance of Ohana API can itself use multiple taxonomies, right?

If the former is correct, then it's only an API change. Feel free to submit a pull request.

For some background, this repo is currently only maintained by the community on a volunteer basis, so I personally prioritize the issues I work on based on the need. Bugs get the highest priority, and feature requests are based on popularity and actual usage.

Based on your comments in this issue, it sounds like the examples are still theoretical, right? Or are there really 2-1-1 entities with live Ohana API deployments, along with a client app that is consuming those multiple APIs?

monfresh commented 9 years ago

One more thing I forgot to ask. To make sure this feature is well documented, could you specify exactly how you'd like the API to expose the taxonomy information, how the client app would consume that information, and for what purpose?

An example would be something like this:

I'm building a website where help seekers can search for services, and on the details page of a particular location, I'd like to display the types of services (aka categories in Ohana API parlance) so that users can click on them and be able to find other locations that provide those same types of services. The API is already giving me the names of the categories, including their relationship to other categories, which allows me to easily display them in hierarchical format, as shown on the right in this screenshot:

[Screenshot: Ohana Web Search categories]

Ohana API also already allows me to perform a service type search using the category parameter. However, Ohana API is not returning the name of the taxonomy that the categories are based on (such as "Open Eligibility"), and I need to have that data so that the user can ___.

The more detail you provide about a feature request, the better chance it has of being implemented, and I find that this kind of detail can only come from actual interaction with the API by building something. You might find that the API already provides everything you need, or it might not, but either way you'll be in a better position to provide very specific feedback.

My specific question is: how does adding a taxonomy namespace allow you to build a better search for help seekers (in your first use case above)? Help seekers don't care whether a particular term comes from the Open Eligibility taxonomy or some other taxonomy, as long as the term is understandable, right? The Ohana API category search is also taxonomy-agnostic: it finds results that have been tagged with a particular term, regardless of where the term came from.
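
For example, a category search today is simply something like this (the host and value are illustrative):

GET https://api.example.org/search?category=Emergency

The same request works whether "Emergency" was imported from Open Eligibility or any other taxonomy.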

greggish commented 9 years ago

Assuming I’ve followed this correctly, I’ll reaffirm (a modified version of) @monfresh's scenario: yes, we should assume that different APIs (Ohana or otherwise) will operate with different taxonomies. And I'd also reaffirm @ckoppelman's point that both people and machines would benefit from ways to navigate through such variability.

monfresh’s point about prioritization (especially w/r/t his own time) is well-taken; this issue should be taken up when there's a specific, actionable need/opportunity to test such a feature.

That said, in the meantime, it's also worth checking (again) a couple of the assumptions that seem to have been at play in Ohana's original conception. For one, we shouldn't assume that resource directory data can or should be produced within a closed system. Following from that: while we know that a help-seeker will certainly not care about which taxonomy codes are assigned to which services, we should also recognize that this issue nevertheless affects the quality and usability of resource data in various ways -- such as provenance; variability of profiles among subdomains of services; relevance that might be ascertained more by logical conditions than by a service's raw-text description; and even deliverability (as systems might use taxonomy codes to determine which fields are presented to users and how). Taxonomy codes may be irrelevant to help-seekers, but they're quite relevant to the database administrators who maintain the information help-seekers seek, the service providers who help them, and even researchers and analysts (whose needs shouldn't take priority over help-seekers and service providers, but also shouldn't be forgotten). This is one of the major technical challenges we need to address on the path to success.