WCGA / West-Coast-Ocean-Data-Portal

bugs and fixes for the geoportal back end and UI front end of the WCODP

Add Sources names to search keywords text matching targets? #35

Open emiliom opened 9 years ago

emiliom commented 9 years ago

The "Search Keywords" box is apparently not hooked up to any of the "facet" collection tags (Categories, Issues or Sources). I've been thinking specifically that it'd be helpful to hook it up to Sources. Currently, if you type "IOOS", only 2 of the 9 records in the "West Coast IOOS" source are selected. Ditto for "NANOOS" (2 out of the 5 records in the "NANOOS" source are selected). From a user perspective, that behavior seems unexpected and undesirable.

Of course, it'd be great if every record in any given source also used the name of that source somewhere in the title, abstract or other attributes; then this would be a non-issue. But that requires asking every metadata provider to be very diligent, and can be a lot of work. Then again, one could also argue that the WCODP search should be hitting the "keywords" actually placed in the metadata records (I don't believe they're being used, so in effect we're not leveraging that effort from providers), but that's another topic.

I have no idea how difficult this would be. And there's also the broader question of whether "Categories" and "Issues" should also be matched by the search box. I suspect that could get a lot trickier, though, given the long list of words (some very generic) used in those facets.

cybersea commented 9 years ago

I did some testing a while ago, and I believe the search is actually using the formal keywords, but it doesn't seem to be using information in the Creator/Publisher/Contact entries.

It would be helpful to have some documentation clearly describing what components the search is using, so that we can guide our data partners when they are creating their metadata.

emiliom commented 9 years ago

Thanks, @cybersea! I thought I had done some small tests regarding formal keywords (i.e., keywords in the ISO metadata), but that was over a year ago and my memory is hazy. It's great to learn that formal keywords in metadata records are used, even though they're not exposed directly to users on the WCODP.

Totally agree that it'd be very helpful to have this documented.

tchaddad commented 9 years ago

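I'm not an expert, but I believe the Search box is hooked to a Solr index built upon the following fields:

  • Title, Abstract, Keywords
  • and a few other locations we should enumerate (all directly in the metadata XML contents)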

In contrast, the browse-able facets in the UI are the result of manual tagging of the metadata records in the Geoportal backend. They are fast, because we also have Solr indexes built on the tags, but they are in a different index than the one mentioned above.

Interestingly, this helps you discover different problems with metadata content. For example, only one record that comes from NANOOS actually mentions NANOOS in the fields named above. Which is somewhat shocking because as source authors, and from a funding credit perspective alone, you would think at least a keyword would hit.

Conversely, there are a few additional records that mention NANOOS in their metadata content, that do not actually come from NANOOS as a source, which you discover when you search for the term NANOOS. This is also interesting, and is nice to see credit coming from others (I would hope).
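The two mismatches described above (records tagged with a source that never mention it, and records that mention it without being tagged) can be illustrated with a small sketch. This is only an in-memory illustration, not the Geoportal or Solr implementation: the record structure, the field names, and the `audit_source` helper are all hypothetical.

```python
# Hypothetical records: field names and values are illustrative only,
# not the actual WCODP metadata schema.
records = [
    {"id": "r1", "source": "NANOOS", "title": "NANOOS buoy observations",
     "abstract": "", "keywords": []},
    {"id": "r2", "source": "NANOOS", "title": "Sea surface temperature",
     "abstract": "", "keywords": ["SST"]},
    {"id": "r3", "source": "Other", "title": "Glider tracks",
     "abstract": "Funded in part by NANOOS.", "keywords": []},
]


def mentions(record, term):
    """True if the term appears in the text fields assumed to be indexed."""
    text = " ".join([record["title"], record["abstract"], *record["keywords"]])
    return term.lower() in text.lower()


def audit_source(records, source):
    """Compare the manual source tag against a text search for the same name."""
    in_source = {r["id"] for r in records if r["source"] == source}
    by_term = {r["id"] for r in records if mentions(r, source)}
    return {
        "tagged_but_not_mentioned": sorted(in_source - by_term),
        "mentioned_but_not_tagged": sorted(by_term - in_source),
    }


report = audit_source(records, "NANOOS")
```

A report like this, run per source, is one way the distinction between the source and the term could stay visible to portal managers even if the search box itself became more global.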

I agree with the general point that maybe we should have search use both indexes, and we should ask P97 if there was a reason things were intentionally segregated. However, I do think that if we make this change, we will mask the current behavior, and that there is some value in exposing the difference between the source NANOOS and the term NANOOS that will be lost.

Basically, the fix here could be to make search more global, but an additional fix could be to help contributors understand which fields are most important for discovery searching (as other systems like data.gov use the same metadata and you can't always change someone else's search system).

Food for thought...

cybersea commented 9 years ago

I'm chewing on your food for thought.....

For a simple addition, perhaps the Creator/Publisher fields could get added to the search. This is where the organizational association usually gets made, but I have actually tried to search those fields and it does not appear to be currently searching them.

From my understanding, this would not require searching of the browsable Facets (Location/Categories/Sources/Issues), because that information is directly from the metadata.
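As a rough sketch of what adding Creator/Publisher to the search might look like on the query side: if the search request goes through Solr's eDisMax query parser (an assumption; the actual handler configuration is not known here), the set of searched fields is controlled by the `qf` parameter. The field names `title`, `abstract`, `keywords`, `creator` and `publisher` below are hypothetical placeholders for whatever the index schema actually calls them, and the fields would also need to be populated in the index.

```python
from urllib.parse import urlencode

# Hypothetical field names; the real index schema is not known here.
CURRENT_FIELDS = ["title", "abstract", "keywords"]
PROPOSED_FIELDS = CURRENT_FIELDS + ["creator", "publisher"]


def build_search_params(user_text, fields):
    """Build Solr query parameters that search the given fields (eDisMax)."""
    return {
        "defType": "edismax",     # assumes the eDisMax query parser is in use
        "q": user_text,
        "qf": " ".join(fields),   # space-separated list of fields to search
        "wt": "json",
    }


params = build_search_params("NANOOS", PROPOSED_FIELDS)
query_string = urlencode(params)
```

The resulting `query_string` would be appended to the select URL of whichever Solr core backs the search box; that URL is deliberately left out because it is not given in this thread.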

emiliom commented 9 years ago

I'm chewing on your food for thought.....

Me too. It'll take me a while to digest (sorry for the over-extension of the metaphor ...). @tchaddad, thanks so much for your terrific assessment and comments!

+1 on @cybersea's suggestions, too.

emiliom commented 9 years ago

Some comments on Tanya's food for thought and Allison's additional thoughts.

I believe the Search box is hooked to a Solr index built upon the following fields:

  • Title, Abstract, Keywords
  • and a few other locations we should enumerate (all directly in the metadata XML contents)

A good first step would be to ask P97 to tell us what those additional metadata fields are (besides Title, Abstract, Keywords) and list them here for reference. I agree with Allison that the Creator/Publisher fields should be indexed, as according to her testing, they don't appear to be.

Tanya, great observations about the complementary value (and source) of the Search box vs the browse-able facets. I totally agree with you about the value of that distinction to WCODP managers/admins, and also in their/our work with data providers. We should strive not to lose that value. But I don't think end users (people looking for data) will or should care about that distinction; from their perspective, the Search box should be as all-encompassing as possible. Maybe, ideally, there could be a setting or less prominent switch that specifies whether the Search box only hits information indexed from the metadata records, vs that + the manual facets; but that's obviously more complicated, as a new capability.
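To make the "switch" idea concrete, here is a minimal in-memory sketch, assuming each record carries both its metadata text and its manually assigned facet tags. The `search` function, the `include_facets` flag, and the record layout are all hypothetical; the real portal would do this against its Solr indexes rather than a Python list.

```python
# Hypothetical records: "facet_tags" stands in for the manual
# Categories/Issues/Sources tagging done in the Geoportal.
records = [
    {"id": "a", "title": "NANOOS buoy observations", "abstract": "",
     "keywords": [], "facet_tags": ["NANOOS"]},
    {"id": "b", "title": "Sea surface temperature", "abstract": "",
     "keywords": ["SST"], "facet_tags": ["NANOOS"]},
    {"id": "c", "title": "Kelp canopy extent", "abstract": "",
     "keywords": ["kelp"], "facet_tags": ["Habitat"]},
]


def search(records, text, include_facets=False):
    """Return ids of records matching text; optionally also match facet tags."""
    needle = text.lower()
    hits = []
    for rec in records:
        haystack = [rec["title"], rec["abstract"], *rec["keywords"]]
        if include_facets:
            haystack.extend(rec["facet_tags"])
        if any(needle in value.lower() for value in haystack):
            hits.append(rec["id"])
    return hits


metadata_only = search(records, "NANOOS")
with_facets = search(records, "NANOOS", include_facets=True)
```

With the flag off, only the record whose metadata mentions the term is returned; with it on, every record tagged with that source is returned as well, which is the behavior an end user would likely expect.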

Thanks also for the observations and examples from NANOOS! I know how those things came about, but I won't clutter this discussion with the idiosyncratic details.

Finally:

Basically, the fix here could be to make search more global, but an additional fix could be to help contributors understand which fields are most important for discovery searching (as other systems like data.gov use the same metadata and you can't always change someone else's search system).

The "additional fix" should be part of our ongoing, never-ending education of data contributors. But I like your emphasis on "which fields are most important for discovery searching", both within the WCODP and beyond.

tchaddad commented 9 years ago

Want to bump this one to remind ourselves to track down what search is currently based on, and look into how difficult it would be to add additional locations to that index...