USDA / USDA-APIs

Do you have feedback, ideas, or questions for USDA APIs? Use this repository's Issue Tracker to join the discussion.
www.usda.gov/developer

sortDirection doesn't seem to work #83

Closed SteveCEms closed 4 years ago

SteveCEms commented 4 years ago

I can't get sortDirection to work in a GET request with either asc or desc. How does it determine the sort direction, given that the search seems to cover every field? Try searching for pepsi with either sortDirection: the first item description is SODASTREAM CAPS, PEPSI SODA MIX in both cases.

Here is my GET request: https://api.nal.usda.gov/fdc/v1/search?api_key=myapikey&generalSearchInput=pepsi&sortDirection=desc&includeDataTypeList=Branded&requireAllWords=true&pageNumber=1

littlebunch commented 4 years ago

@SteveCEms I believe you have to specify a sortField. From the documentation:

The name of the field by which to sort. Possible sorting options: lowercaseDescription.keyword, dataType.keyword, publishedDate, fdcId.

I've requested that sort fields match document field names, as it's not reasonable to expect users to know the internal fields used for sorting, e.g. lowercaseDescription.keyword. Until that changes, you'll need to use the fields listed in the documentation.
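As a rough illustration of the point above, here is a minimal Python sketch that builds the GET URL with both sortField and sortDirection set. The helper name `build_search_url` and the `SORT_FIELDS` check are hypothetical; the parameter names and the list of sort fields are taken from this thread, and the sketch only builds the URL without sending it.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.nal.usda.gov/fdc/v1/search"

# Sort fields quoted from the documentation earlier in this thread.
SORT_FIELDS = {
    "lowercaseDescription.keyword",
    "dataType.keyword",
    "publishedDate",
    "fdcId",
}


def build_search_url(api_key, query, sort_field, sort_direction="asc", page_number=1):
    """Build a search URL that names a sortField alongside sortDirection."""
    if sort_field not in SORT_FIELDS:
        raise ValueError(f"unsupported sortField: {sort_field}")
    if sort_direction not in ("asc", "desc"):
        raise ValueError(f"unsupported sortDirection: {sort_direction}")
    params = {
        "api_key": api_key,
        "generalSearchInput": query,
        "includeDataTypeList": "Branded",
        "requireAllWords": "true",
        "sortField": sort_field,
        "sortDirection": sort_direction,
        "pageNumber": page_number,
    }
    # urlencode percent-encodes values, so multi-word queries are safe.
    return BASE_URL + "?" + urlencode(params)


url = build_search_url("DEMO_KEY", "pepsi", "lowercaseDescription.keyword", "desc")
print(url)
```

Rejecting an unknown sortField locally is just a convenience for catching typos early; whether the API itself rejects or ignores such values is not something this thread establishes.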

SteveCEms commented 4 years ago

Thanks, that seems to work now. I tried the sort fields a good while ago and they didn't seem to work. Now it sorts just fine.

SteveCEms commented 4 years ago

I need to make an observation about sorting alphabetically. The FDC search seems to run across many of the available fields, and unfortunately the ingredients list is one of them. If you search for gum, for example, there are over 61,000 results, mostly because gum acacia and similar ingredients are used in about 60,000 food items. If you sort alphabetically, the first result is !AJUA!, CAFFEINE FREE SODA, MANDARIN ORANGE (fdcId = 412617), because gum acacia is in its ingredients list. The sort is applied after the search is done, so none of the first 50 items in the sorted list contain gum in the description. If you don't sort, however, the first 50 food items all contain gum, because the results are ordered by relevance.

I've decided not to sort at all in the search request. Instead, I sort the returned search results in my program, which gives me a sorted list of the most relevant results. I believe a Lucene score is calculated for each result and the results are then ordered by that score.

I would prefer not to search the ingredients list at all, but I can't seem to find any way of doing that.
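The client-side approach described above might look something like the following Python sketch. It assumes each returned food is a dict with a "description" key; the sample records and their fdcId values are made up for illustration and are not real FDC data.

```python
def sort_by_description(foods):
    """Return the foods sorted alphabetically by description, ignoring case.

    The input is assumed to be one page of relevance-ranked results, so the
    output is an alphabetical list of the most relevant matches.
    """
    return sorted(foods, key=lambda food: food.get("description", "").lower())


# Hypothetical sample page, in the relevance order the server might return.
sample_page = [
    {"fdcId": 3, "description": "Chewing Gum, Spearmint"},
    {"fdcId": 1, "description": "BUBBLE GUM"},
    {"fdcId": 2, "description": "gum drops"},
]

sorted_page = sort_by_description(sample_page)
for food in sorted_page:
    print(food["fdcId"], food["description"])
```

Lowercasing the key keeps mixed-case descriptions in a natural alphabetical order, which is presumably what the lowercaseDescription.keyword sort field does on the server side.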

littlebunch commented 4 years ago

@SteveCEms You can limit your search to a particular field which, in your case, would seem to be "description":

curl -H "Content-Type: application/json" -X POST "https://api.nal.usda.gov/fdc/v1/search?api_key=DEMO_KEY" -d '{"generalSearchInput":"description:gum","includeDataTypeList":["Branded"],"pageNumber":1,"sortField":"lowercaseDescription.keyword"}'

This returns a sorted list of foods containing "gum" in the description.
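For anyone not using curl, the same POST request can be sketched in Python with only the standard library. The helper name `build_search_request` is hypothetical; the URL, JSON keys, and the "description:" prefix are copied from the curl command above. The sketch only constructs the request, so nothing is sent until you pass it to urllib.request.urlopen.

```python
import json
from urllib.request import Request

SEARCH_URL = "https://api.nal.usda.gov/fdc/v1/search"


def build_search_request(api_key, term, field="description"):
    """Build a POST request that restricts the search to a single field."""
    body = {
        # Prefixing the term with "field:" limits the match to that field.
        "generalSearchInput": f"{field}:{term}",
        "includeDataTypeList": ["Branded"],
        "pageNumber": 1,
        "sortField": "lowercaseDescription.keyword",
    }
    return Request(
        SEARCH_URL + "?api_key=" + api_key,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("DEMO_KEY", "gum")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```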

Stepping onto my soapbox: the power of Lucene -- or any vector space engine -- is its relevance ranking capabilities. It's always seemed to me that sorting on anything other than relevance defeats the purpose of the search engine. Stepping off my soapbox.

SteveCEms commented 4 years ago

Thanks littlebunch, I now get a much smaller set of foods by restricting the search to just "description". This is now my URL for searching the FDC database:

url = "https://api.nal.usda.gov/fdc/v1/search?api_key=" + usdaKey +
      "&generalSearchInput=description:" + searchterms +
      "&includeDataTypeList=Branded" +
      "&requireAllWords=true" +
      "&pageNumber=" + pagenumber;

I have to agree with your soapbox that a Lucene search gives better results.