codeforamerica / ohana-api

The open source API directory of community social services.
http://ohana-api-demo.herokuapp.com/api
BSD 3-Clause "New" or "Revised" License

All service areas from all services at location inherited when using 'keyword' search. #365

Open cderenburger opened 9 years ago

cderenburger commented 9 years ago

When performing a keyword= search and setting a service_area= parameter, the results page includes all service areas for all services at a location, rather than the service area for the matching service. This behavior is not present when setting a category= parameter.

This issue is perhaps better presented with some examples.

When performing a search for "Food Pantries" the following results are given. https://win211-web-search.herokuapp.com/locations?utf8=%E2%9C%93&keyword=Food+Pantries&service_area=Wahkiakum&location=&org_name=

keyword_search

If doing the same search but changing keyword= for category[]= the results are a correct match between the category[]= and service_area=

https://win211-web-search.herokuapp.com/locations?utf8=%E2%9C%93&category[]=Food+Pantries&service_area=Wahkiakum&location=&org_name=

category_search

"Puget Sound Labor Agency", "Lifelong", and "Northwest Harvest" all include at least one service that includes the taxonomy term "Food Pantries", but that service does not contain the service area currently selected in the example. These locations include other services which contain the service area, but not the taxonomy term.

https://win211-web-search.herokuapp.com/locations/puget-sound-labor-agency?keyword=Food+Pantries&location=&org_name=&service_area=Wahkiakum&utf8=%E2%9C%93

details

The Puget Sound Labor Agency is highlighted here because it is the simplest example: it has one program, "Food Pantry", that carries the taxonomy term "Food Pantries" and serves only "King" County. This location nevertheless appears in the "Food Pantries" search results for every county (service_area) in the state, because its other program, "Wheelchair Ramps", lists all 39 counties in its service area.
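The behavior described above can be reproduced with a minimal plain-Ruby sketch. This is not Ohana's code: the `Service` and `Location` structs and both helper methods are hypothetical, the category on the second service is a placeholder, and the 39 counties are abbreviated to two.

```ruby
# Hypothetical stand-ins for illustration only; not the Ohana models.
Service  = Struct.new(:name, :categories, :service_areas)
Location = Struct.new(:name, :services)

PSLA = Location.new(
  'Puget Sound Labor Agency',
  [
    Service.new('Food Pantry', ['Food Pantries'], ['King']),
    # Placeholder category; the real record lists all 39 counties.
    Service.new('Wheelchair Ramps', ['Wheelchair Ramps'], %w[King Wahkiakum])
  ]
)

# Location-level match: the term and the area may come from different services.
def location_level_match?(location, term, area)
  location.services.any? { |s| s.categories.include?(term) } &&
    location.services.any? { |s| s.service_areas.include?(area) }
end

# Service-level match: a single service must satisfy both conditions.
def service_level_match?(location, term, area)
  location.services.any? do |s|
    s.categories.include?(term) && s.service_areas.include?(area)
  end
end

puts location_level_match?(PSLA, 'Food Pantries', 'Wahkiakum') # true
puts service_level_match?(PSLA, 'Food Pantries', 'Wahkiakum')  # false
```

The first check passes because each condition is satisfied by a different service at the same location, which is the effect reported in this issue.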

Is there any way to have keyword= searches return the correct service areas, as category= searches do?

monfresh commented 9 years ago

@cderenburger That is working as designed. The keyword search performs a full-text search that looks for the terms in various places, like the location name and description, the organization name, and the service's name, description, keywords, and categories. This is explained in the search documentation.

To achieve your desired results in this scenario, the category parameter is the correct one to use since you're looking for a particular type of service as opposed to asking for anything that might include this term. By using the category parameter, you are essentially limiting the search to service attributes.

Another way to do this would be to add a new "service_keyword" parameter to your API that only searches within services fields.

This could also be due to the way your data is organized. According to the PSLA's website, their "Food Bank is located in the historic Labor Temple in Belltown". This might be better organized as a separate Location under the PSLA Organization, in which case it won't show up in the keyword search results.

cderenburger commented 9 years ago

In the Excel workbook I use to convert my data, I've been matching services to locations; perhaps I need to flip that and match locations to services instead. This would create a separate location (even if it contains the same info) for every service. It may be worth a try to see if it produces the desired results.

In ohana-web-search I'd like to keep the user input to a single search field, which is great because our own DB provider unfortunately requires different input fields for different types of searches. If I understand correctly, when the keyword search finds a match in a category, it returns those matches (ranked higher, I think) in the search results while also performing a full-text search. Is it possible to have the keyword not perform a full text search, or rather return only category results if the string it is passed is an -exact- match for a category term?

For example a place I have some odd results is a keyword search for "Furniture". I have a category term "BM-3000.2000 - Furniture", but in my keyword search results I also receive matches for "TE-7900 - Street Furniture" (road signs, street lights) because "furniture" is in the taxonomy term, but not an exact match.

Is it possible for the keyword search to do a category search if the string it is supplied is an -exact- match for a category, else do a full text search?

md5 commented 9 years ago

This sounds to me like more evidence that Ohana should be providing a service-centric search functionality, not a location-centric one.

Having to create duplicate locations to make things work the way users expect seems wrong to me.

monfresh commented 9 years ago

Is it possible to have the keyword not perform a full text search, or rather return only category results if the string it is passed is an -exact- match for a category term?

It's probably possible but it wouldn't make sense because it would mean performing two queries every time. First, it would perform a query using the category parameter, and return those results if it found any. Otherwise, it would do a full-text search query.
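The fallback described here can be sketched in plain Ruby. This is only an illustration under stated assumptions: `CATEGORY_NAMES`, `exact_category_or_full_text`, and the two callables are hypothetical names, not part of the Ohana API.

```ruby
# Hypothetical taxonomy; the names are taken from the examples in this thread.
CATEGORY_NAMES = ['Furniture', 'Street Furniture'].freeze

# Returns [strategy, results]. `category_search` and `full_text_search` are
# any callables that take a term and return an array of results.
def exact_category_or_full_text(term, category_search:, full_text_search:)
  exact = CATEGORY_NAMES.find { |name| name.casecmp?(term.strip) }
  if exact
    results = category_search.call(exact) # first query
    return [:category, results] unless results.empty?
  end
  [:full_text, full_text_search.call(term)] # second query, used as a fallback
end

by_category = ->(name) { name == 'Furniture' ? ['Furniture Bank'] : [] }
by_text     = ->(term) { ["anything mentioning #{term}"] }

p exact_category_or_full_text('furniture',
                              category_search: by_category,
                              full_text_search: by_text)
```

When an exact category match returns nothing, both queries run, which is the extra cost this comment refers to.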

If your goal is to find an exact match for a category term, then you should use the category parameter, not the keyword parameter. The API provides various different filters that you can use for different purposes.

If you want the keyword field to only search inside service attributes, then you'll have to customize your API to do that. The reason why the keyword field does a full-text search in various places is to maximize the chances of getting results because the quality of the data will vary from one place to another.

The quality of your results will depend on the quality of your data. It's very hard for the API to provide a solution that fits everyone's needs out of the box. If you are confident that your services data is rich enough that a full-text search that is limited to service attributes will return relevant results, then you can modify the keyword search to only look there.

The behavior of the keyword search field will also depend on who will be using the ohana-web-search site. If it's residents in need, they're not going to know the exact taxonomy terms and will not search for them via a text box. They'll need a menu or links to choose from to help narrow their search. If the site will most often be used by social workers, then a search field that performs a category search might make sense.

For example a place I have some odd results is a keyword search for "Furniture". I have a category term "BM-3000.2000 - Furniture", but in my keyword search results I also receive matches for "TE-7900 - Street Furniture" (road signs, street lights) because "furniture" is in the taxonomy term.

This is expected as explained earlier. To get the desired results here, you would either need to change the name of the "Street Furniture" category to something that makes sense to the general population, or use the category parameter. Using a single search field for all scenarios is going to be very hard and not user friendly if the user is a resident in need. Providing filters to narrow down search results is very helpful.

@md5 You should never have to create duplicate locations. I think what is going on here is mostly due to expecting search to work in a way that it was not intended for out of the box, and also due to the data itself. All of the issues brought up here can be solved by either using the category search or modifying the keyword search to only look in services attributes.

My point about the Food Pantry location was that the service is provided in a specific physical location that is different from the location of the Organization's headquarters. Most services are provided at a specific physical location. When someone looks up a service, they will want to know where they can go to receive that service. If an organization provides 2 different services, and each service is provided in a different physical location, then the data should be organized such that Organization A has 2 Locations B and C, with Services D and E, as opposed to 1 Location B with Services D and E, where Service E is actually not offered at Location B.

cderenburger commented 9 years ago

If an organization provides 2 different services, and each service is provided in a different physical location, then the data should be organized such that Organization A has 2 Locations B and C, with Services D and E, as opposed to 1 Location B with Services D and E, where Service E is actually not offered at Location B.

There are many instances where there are multiple types of services, offered out of a single location. For example the PSLA's headquarters is their location for food distribution, but also the statewide wheelchair ramp program. The food bank in this case is in the same location, not at a separate location.

There are also locations that house many different programs, such as 2-1-1s, food banks, child care, mental health, housing case management, and more, under a single organization and a single roof. I've heard it described as "service siloing". Usually these services share the same service_areas at a location, but sometimes the service areas differ among the services.

Our data is structured so that in an organization there may be many locations and many services. Each location may have multiple services, and each service may have multiple locations. Below is an example to help visualize and provide context to the record organization we use.

untitled

Each service is connected to a corresponding location by a link record. This link record contains the service_area, phones, hours, and filters.

It's very hard for the API to provide a solution that fits everyone's needs out of the box.

I don't doubt it, I was delighted to see that it supports taxonomy (even custom ones) out of the box.

I do really like the full text search, it is powerful and much easier for the public to use and navigate than our current search provider. Full text allows users to search "Dental Care with Medicaid" and get relevant results without having to apply search filters. This is also more in line with how users expect to search. In many ways it is already better than what we have had for years.

Perhaps where I need to focus is on the ohana-web-search, maybe putting together an autocomplete that will provide taxonomy term suggestions based on the user input. That or add a service_area drop down to the main search page at https://win211-web-search.herokuapp.com and have the categories links do category[] searches rather than keyword searches. I'll play around with it, or I'll stick with what I have as I think users will be fine with the results as-is.
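A rough sketch of the autocomplete idea, in plain Ruby. The `TAXONOMY_TERMS` list and the `suggest` method are hypothetical; a real implementation would load the terms from the API and run in the ohana-web-search front end.

```ruby
# Hypothetical term list, using taxonomy names mentioned in this thread.
TAXONOMY_TERMS = ['Food Pantries', 'Furniture', 'Street Furniture',
                  'Emergency Shelter'].freeze

# Terms that start with the input rank ahead of terms that merely contain it.
def suggest(input, terms = TAXONOMY_TERMS, limit: 5)
  needle = input.strip.downcase
  return [] if needle.empty?

  matches = terms.select { |t| t.downcase.include?(needle) }
  matches.sort_by { |t| [t.downcase.start_with?(needle) ? 0 : 1, t] }.first(limit)
end

p suggest('furn') # ["Furniture", "Street Furniture"]
```

Selecting a suggestion could then issue a category[] search instead of a keyword search.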

Thanks!

monfresh commented 9 years ago

Sounds good. Taxonomy-based search is something we've wanted to include in ohana-web-search but haven't figured out the best way to do it. One thing you could try is to enable the category links on the location details page by removing line 91, indenting line 92 back one stop under the span, and removing line 93. This would allow visitors to perform a category-based search by clicking those taxonomy terms. The only issue is that there currently is not a corresponding UI on the search results page to signify that a category filter has been applied.

cderenburger commented 9 years ago

I think I'm finally understanding more of how the text searches are performed. Please let me know if I understand this correctly.

  1. A GET request is sent with keyword=foo, and the search term (foo) is passed along to search.rb.
  2. On search.rb line 56 def search(params = {}) does a text_search unless the params sent include both a keyword and a service area.
  3. On line 29 search.rb def keyword(query) looks for (foo) within the tsv_body created by add_search_vector_to_locations.rb (ohana-api/db/migrate/20140505011725_add_search_vector_to_locations.rb).
  4. Line 32 on add_search_vector_to_locations.rb created a copy of the text of all the taxonomy terms, which is put into the tsv_body for the location.
  5. (foo) is found in the tsv_body as "Foo Assistance" in several locations' vectors; these locations are returned from keyword(query) to search(params = {}), which returns them as res.

Am I following along? If so I now understand how and why I get the results in the original post.

What I am hung up on:

A. Line 32 on add_search_vector_to_locations.rb references service_categories.name, and "service_categories" comes from Line 20. The lines before Line 20 state what fields to get from which tables, with a matching location_id. How does Line 20 string_agg(categories.name, ' ') as name into service_categories from locations know what categories are on the location?

B. In def text_search(params = {}) in search.rb what is "relation", I don't know what it does. Is it combining all the values from the scopes?

C. On search.rb def search Line 63 res.select("locations.*, #{rank_for(params[:keyword])}") what is "locations.*"? Is this short-hand for something?

monfresh commented 9 years ago

On search.rb line 56 def search(params = {}) does a text_search unless the params sent include both a keyword and a service area.

Not quite. When the search method is called, it performs all of the chained queries and assigns the result to the res variable (short for "result"). Once we have the result, we return it as is if the params sent don't include both keyword and service_area. Otherwise, we have to add a SQL SELECT query that selects all the columns from the Locations table as well as the full-text search rank. locations.* is SQL syntax for "all columns in the locations table". This additional SELECT is required because the service_area query uses uniq, and when you use uniq (DISTINCT in SQL) in conjunction with the full-text search rank, you have to select the rank. Otherwise, you get a SQL error.
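The conditional described above can be sketched as a plain-Ruby function that only builds the SELECT list. This is an assumption-laden illustration, not the project's code: `rank_sql` and `select_list` are hypothetical names, and the real `rank_for` in search.rb may build a different expression. The error being avoided is presumably PostgreSQL's rule that, with SELECT DISTINCT, ORDER BY expressions must appear in the select list.

```ruby
# Hypothetical helper; the real rank_for in search.rb may differ.
def rank_sql(keyword)
  sanitized = keyword.gsub("'", "''") # naive quoting, for illustration only
  "ts_rank(locations.tsv_body, plainto_tsquery('english', '#{sanitized}')) AS rank"
end

# Select the rank explicitly only when both parameters are present, because
# that is the case where DISTINCT and rank ordering are combined.
def select_list(params)
  return 'locations.*' unless params[:keyword] && params[:service_area]

  "locations.*, #{rank_sql(params[:keyword])}"
end

puts select_list(keyword: 'food')
puts select_list(keyword: 'food', service_area: 'Wahkiakum')
```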

On line 29 search.rb def keyword(query) looks for (foo) within the tsv_body created by add_search_vector_to_locations.rb.

Yes, but add_search_vector_to_locations.rb represents one of the many DB migrations that happened along the way. This particular migration was updated later. To see the latest state of the DB, you'll need to look in https://github.com/codeforamerica/ohana-api/blob/master/db/structure.sql. You can read about migrations in the Rails guides.

Line 32 on add_search_vector_to_locations.rb created a copy of the text of all the taxonomy terms, which is put into the tsv_body for the location.

Yes, but it's not a straight copy of all the text as is. It's more complicated than that. It stores portions of words, which allows it to match the plural version of a word when the query contains the singular version, and vice versa, to name one of the features. You can read about full-text search here:

http://altoros.github.io/2013/implementing-and-improving-postgresql-fulltext-search/
http://blog.lostpropertyhq.com/postgres-full-text-search-is-good-enough/
http://shisaa.jp/postset/postgresql-full-text-search-part-1.html
http://railscasts.com/episodes/343-full-text-search-in-postgresql?view=asciicast
http://linuxgazette.net/164/sephton.html
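To illustrate the idea of storing normalized word forms rather than raw text, here is a toy sketch in plain Ruby. It is not how PostgreSQL works internally: PostgreSQL uses real stemming dictionaries, while the hypothetical `toy_lexeme`, `toy_vector`, and `toy_match?` methods below use crude suffix stripping that only mimics the effect.

```ruby
# Toy normalizer, for illustration only.
def toy_lexeme(word)
  w = word.downcase
  return w.sub(/ies\z/, 'y') if w.end_with?('ies')
  return w.chomp('s') if w.end_with?('s') && !w.end_with?('ss')

  w
end

# The "vector" is the set of normalized words found in the text.
def toy_vector(text)
  text.scan(/[[:alpha:]]+/).map { |w| toy_lexeme(w) }.uniq
end

# A query matches when every normalized query word is in the text's vector.
def toy_match?(text, query)
  (toy_vector(query) - toy_vector(text)).empty?
end

p toy_vector('Food Pantries')             # ["food", "pantry"]
p toy_match?('Food Pantries', 'pantry')   # true
p toy_match?('Street Furniture', 'furniture') # true
```

The last line shows why a search for "Furniture" also finds "Street Furniture": the match is on individual normalized words, not on the whole category name.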

How does Line 20 string_agg(categories.name, ' ') as name into service_categories from locations know what categories are on the location?

Categories are on a service, not location. The SQL joins on lines 21-23 allow you to fetch the category names. If you're not familiar with SQL, I encourage you to read up on it, or try these courses (the first one is free):

https://www.codeschool.com/courses/try-sql
https://www.codeschool.com/courses/the-sequel-to-sql

In def text_search(params = {}) in search.rb what is "relation", I don't know what it does. Is it combining all the values from the scopes?

relation refers to self that was passed to reduce, which in this context is the Location model. This allows you to combine various search methods, as opposed to having to keep chaining methods in the search method, which would look like this mess:

res =  keyword(params[:keyword]).
       category(params[:category]).
       language(params[:language]).
       org_name(params[:org_name]).
       service_area(params[:service_area]).
       status(params[:status]).
       with_email(params[:email]).
       is_near(params[:location], params[:lat_lng], params[:radius])

You can read about Ruby's reduce method here: http://ruby-doc.org/core-2.2.0/Enumerable.html#method-i-reduce

For more info about how querying works in Rails, read this: http://guides.rubyonrails.org/active_record_querying.html
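The reduce pattern can be tried outside Rails with a small stand-in class. This is a sketch in the spirit of the explanation above, not the actual search.rb: `FakeRelation` and its two filters are hypothetical, and a real ActiveRecord relation would build SQL instead of filtering an array.

```ruby
# A tiny stand-in for an ActiveRecord relation: each filter returns a new
# FakeRelation, so filters can be chained or folded with reduce.
class FakeRelation
  attr_reader :rows

  def initialize(rows)
    @rows = rows
  end

  def keyword(term)
    return self if term.nil?

    FakeRelation.new(rows.select { |r| r[:name].downcase.include?(term.downcase) })
  end

  def language(lang)
    return self if lang.nil?

    FakeRelation.new(rows.select { |r| r[:languages].include?(lang) })
  end

  # Fold every filter over the starting relation (self).
  def text_search(params = {})
    %i[keyword language].reduce(self) do |relation, filter|
      relation.public_send(filter, params[filter])
    end
  end
end

ROWS = [
  { name: 'Food Pantry', languages: ['English'] },
  { name: 'Food Bank', languages: %w[English Spanish] }
].freeze

result = FakeRelation.new(ROWS).text_search(keyword: 'food', language: 'Spanish')
p result.rows.map { |r| r[:name] } # ["Food Bank"]
```

Adding another filter then means adding one symbol to the list rather than one more link in a long method chain.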

cderenburger commented 9 years ago

Thanks again for all your help @monfresh. I may try to build up the service_keyword search as an exercise as I continue to teach myself Ruby.

Taxonomy-based search is something we've wanted to include in ohana-web-search but haven't figured out the best way to do it.

As noted in another issue, https://github.com/codeforamerica/ohana-web-search/issues/486, reconciling keyword and category searches is not an easy task. I've seen a number of database systems: they either use the keyword search to narrow a user down to a selection of taxonomy terms, or they are moving towards full-text search. It's a mixed bag. I can't think of a system where both have been implemented through a single keyword search box without either dismissing categories or forcing the user to choose a category.

The current trend seems to be moving to offering both guided search and keyword search. You can see a document from AIRS, Good Practices for Online Resource Databases, which suggests offering both search options. The category-guided searches are 'exclusive', while keyword searches are more 'inclusive'.

What I see mentioned in the document is the addition of 'Use References' to the keyword search. I'm not aware of whether OE has use references. Essentially they are synonyms for search topics or phrases. For example, for the taxonomy term "Emergency Shelter", use references include "Emergency Housing", "Temporary Accommodation", and "Temporary Shelter". Is this something that might be of interest to the project?

monfresh commented 9 years ago

You're welcome. I was partially wrong in my original assessment of the keyword search. Changing the current keyword search to only look in the Services table will not solve the problem because the tsvector column is on the Locations table. It will still return Locations where it finds a match for either the keyword or the service area. I'm not a SQL expert, so I don't know off hand what the solution is, if there is one. However, whether or not this is worth pursuing would depend on the percentage of Locations that provide multiple services, and have different service areas for each service.

As for the use references, I don't think Open Eligibility has them, but they have Human Situations. It's not quite the same thing. However, for mapping taxonomy terms to similar search phrases, or even common misspellings, Ohana provides the "keywords" field in the Services table. It's a freeform field where you can add as many keywords as you see fit. This field is included in the full-text search, so let's say you look at the Google Analytics for your site, and you notice that a lot of people are searching for "help pay for housing", but because that combination of words doesn't exist in your data, they don't get any results. To fix that, you can add that phrase as a keyword on all Services that fit that description.

To do that programmatically, you can do something like this from the Rails console (rails c):

# Find every service tagged with the category.
services = Service.joins(:categories).where(categories: { name: 'Emergency Shelter' })

services.each do |service|
  # Build a new array instead of mutating in place, and skip duplicates.
  new_keywords = (service.keywords + ['Emergency Housing']).uniq
  service.update!(keywords: new_keywords)
end

cderenburger commented 9 years ago

Changing the current keyword search to only look in the Services table will not solve the problem because the tsvector column is on the Locations table.

I presumed that a tsvector column would have to be created on the services table for a service-based search to work.

At a basic level, doesn't ohana-api model the information as 'Locations have many Services'? Wouldn't switching that model around to 'Services have many Locations' accomplish this? I would think this would require a significant rewrite and a different endpoint. Not that I'm asking to have this done, just thinking it through. I haven't looked through the code that determines which locations/services are nearby or in the search area, or whether the API is in fact modeled around Locations to allow for location-based searching.

greggish commented 9 years ago

FYI @cderenburger we have semi-regular video-chats among other people working with Ohana and compliant systems -- here's a relevant one for example. We're currently scheduling our next 'Assembly' for sometime in the upcoming weeks. These calls don't get technical unless we deliberately set aside time for that (which is possible, if you'd like) but it might be a good opportunity one way or another to discuss some of these considerations with others.

Here's the schedule survey in which we're picking a time: https://doodle.com/poll/dxqv79kxiabrsua7

Meanwhile, I have some other materials that you might be interested in. What's the best way to reach you?

cderenburger commented 9 years ago

Hi @greggish, sorry I've not been very responsive. I would like to get some time to chat with you sometime soon, I'll send you an email from work tomorrow.

Thanks for the video, great discussion going on in the group. I would be happy to join a group chat if I can arrange to be available for a meeting.

cderenburger commented 9 years ago

@monfresh and @md5

There has been discussion over searching Locations vs Services. Perhaps there is another option that may be of interest.

In the database that I use, the relationship between Locations and Services is not direct and not many-to-many. My database has a meta record called a Link record that connects each Service to the Location where the service is offered.

Location --><-- Link --><-- Service

The Link record has a 1-to-1 relationship between the Link and Location, and 1-to-1 relationship between the Link and Service. In the case of our current db provider this Link record also stores information specific to the Service at the Location (or Location with Service) including phones, contact person, hours of service, area served, and features (filters).

The Link record could connect all the data from both records that is desired or needed for the search. Instead of searching the Locations or Services, this would search the connection between the two.
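A minimal sketch of searching the connection rather than either side, in plain Ruby. Everything here is hypothetical: `ServiceLink`, `LINKS`, and `search_links` are illustrative names, and the field list is trimmed to what the search needs.

```ruby
# Hypothetical link record: one Location, one Service, plus the details that
# apply only to that pairing.
ServiceLink = Struct.new(:location, :service, :categories, :service_areas)

LINKS = [
  ServiceLink.new('PSLA', 'Food Pantry', ['Food Pantries'], ['King']),
  # Placeholder category; service areas abbreviated from all 39 counties.
  ServiceLink.new('PSLA', 'Wheelchair Ramps', ['Wheelchair Ramps'], %w[King Wahkiakum])
].freeze

# Both conditions are tested against the same link, so a service area from
# one service can never satisfy a search that matched another service.
def search_links(links, category:, service_area:)
  links.select do |link|
    link.categories.include?(category) && link.service_areas.include?(service_area)
  end
end

p search_links(LINKS, category: 'Food Pantries', service_area: 'Wahkiakum').map(&:service)
p search_links(LINKS, category: 'Food Pantries', service_area: 'King').map(&:service)
```

The first search returns no links and the second returns only the Food Pantry link, which is the result the original post was looking for.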

cderenburger commented 9 years ago

It will still return Locations where it finds a match for either the keyword or the service area.

The services.csv file contains both a service_id and location_id. Perhaps these two columns could create a separate 'relation' table and the full-text search done from this table instead.

monfresh commented 9 years ago

I'm open to other solutions, but changes to the schema would have to be compatible with the Human Services Data Specification.

Currently, this particular issue is isolated to when the keyword parameter is combined with the service_area parameter. The only time this would give more results than expected is when a Location has multiple services and one service matches the keyword but not the service area. How often is that the case, and how often are the differing service areas legitimate, as opposed to a data issue?

If it's a significant percentage across all Ohana datasets out there that warrants changing the search architecture for everyone using this project, then I'd be happy to review any pull requests.

For example, I did a search for "shelter" in Wahkiakum, and the first result that was "wrong" was the 12th one (most people only look at the first few results according to this usability study): WA Dept of VA - Seattle. It has a Service called "Information & Assistance" that includes all service areas, which is why this Location appeared in the results. All the services in this location have the same phone number, which according to the VA website is the King County Call Center. That looks to me to be a data issue rather than a search issue.

cderenburger commented 9 years ago

I went through my data this morning to see how many locations are affected. Of 9900 locations my data has 1046 locations (10.6%) where the service area differs in at least one service at each location.
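A check like this could be scripted instead of done by hand. The sketch below is plain Ruby with made-up sample data; `locations_with_differing_areas` and `percentage` are hypothetical helpers, and the input format (location name mapped to one list of service areas per service) is an assumption.

```ruby
# locations: hash of location name => array of service-area lists,
# one list per service offered at that location.
def locations_with_differing_areas(locations)
  differing = locations.select do |_name, areas_per_service|
    areas_per_service.map(&:sort).uniq.size > 1
  end
  differing.keys
end

def percentage(part, whole)
  (part * 100.0 / whole).round(1)
end

sample = {
  'PSLA'      => [['King'], %w[King Wahkiakum]],
  'Food Bank' => [['King'], ['King']]
}

p locations_with_differing_areas(sample) # ["PSLA"]
puts percentage(1046, 9900)              # 10.6
```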

I'll have to see if Excel will let me create a copy of a location for each different set of service areas it contains.

I am hoping to get some time to write up some advice on using Excel to convert data, handle fields larger than 255 characters, and export as proper UTF-8 to fit the fields required for ohana-api. Would this be of interest to the project, or would this be beyond the scope of the docs?

monfresh commented 9 years ago

Any such insight and guides would be welcome. Feel free to create a new Wiki article. I believe Wiki editing is open to everyone.

For the search issue, I wouldn't spend too much time trying to examine or modify your data. I would first wait to see what kind of searches actual visitors are performing, and then you can have a better idea about the severity of the issue and whether or not your data needs to be improved. For example, if no one uses both the keyword and service area filters at the same time, then there's no need for you to waste time on this. Or, if people do use both filters, you can check to see which ones are the most popular, then examine the search results and see if any are "wrong" and whether or not they are due to "bad" data.