Psychoanalytic-Electronic-Publishing / OpenPubArchive-Content-Server

A document server with an open API for which you can build client apps on top of, that can serve journal, book, or video content, where all this content is in XML and can be served out in XML, HTML, PDF, or EPub per the client.
Apache License 2.0

Search sorting #63

Closed bakerac4 closed 3 years ago

bakerac4 commented 3 years ago

Hi @nrshapiro,

I know you mentioned adding the sort keywords for the search. While testing these, I found that some seem to work but some don't.

For example, https://stage-api.pep-web.rocks/v2/Database/Search/?abstract=true&facetfields=art_year_int%2Cart_views_last12mos%2Cart_cited_5%2Cart_authors%2Cart_lang%2Cart_type%2Cart_sourcetype%2Cart_sourcetitleabbr%2Cglossary_group_terms%2Cart_kwds_str&facetlimit=15&facetmincount=1&highlightlimit=5&limit=20&offset=20&sort=title&synonyms=false&title=test

That should sort by title, but I'm seeing the first result title as International Journal of Psychoanalysis, the second title as The Nagoya Journal of Medical Science. (Nagoya University School of Medicine, Nagoya, Japan.) XVII, 1954. and the third title as Bulletin of the Menninger Clinic. XV, 1951

I was just hoping you could take a look and ensure the sorting is correct for all the params.

I also had a question on how to reverse the order. Normally we would provide sort=year and sort=-year to handle ASC and DESC. However, when I pass in -year, I get a 400 error. How do I reverse the sort for year?

nrshapiro commented 3 years ago

@bakerac4

I see--the reason some / all don't work is that the library I use, solrpy, is screwing with us by separating out sort_direction...which is only supposed to kick in if we don't specify sort direction in the main field (which I do). I'm looking into it now.

nrshapiro commented 3 years ago

@bakerac4

Ok, it wasn't that at all. Last night I switched my data source to the stage database, and apparently my IP is blacklisted, so nothing works remotely from there right now. It looked like sort wasn't working, but actually nothing was working! The solrpy error messages were just misleading.

So after that, here's the simple answer to your questions:

Templates below ({0} is filled in with the sort direction, asc or desc):

SORT_BIBLIOGRAPHIC = "art_authors_mast {0}, art_year {0}, art_title {0}"
SORT_YEAR = "art_year {0}"
SORT_AUTHOR = "art_authors_mast {0}"
SORT_TITLE = "art_title {0}"
SORT_SOURCE = "art_sourcetitlefull {0}"
SORT_CITATIONS = "art_cited_5 {0}"
SORT_VIEWS = "art_views_last6mos {0}"
SORT_TOC = "art_sourcetitleabbr {0}, art_year {0}, art_iss {0}, art_pgrg {0}"
SORT_SCORE = "score {0}"

# Keys for above
PREDEFINED_SORTS = {
    "bibliographic": (SORT_BIBLIOGRAPHIC, "asc"),
    "year": (SORT_YEAR, "desc"),
    "author": (SORT_AUTHOR, "asc"),
    "title": (SORT_TITLE, "asc"),
    "source": (SORT_SOURCE, "asc"),
    "citations": (SORT_CITATIONS, "desc"),
    "views": (SORT_VIEWS, "desc"),
    "toc": (SORT_TOC, "asc"),
    "score": (SORT_SCORE, "desc"),
    # legacy/historical naming for sorts
    "citecount": (SORT_CITATIONS, "desc"),
    "rank": (SORT_SCORE, "desc"),
    }
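
As a rough illustration of how a table like this can be used, here is a minimal sketch that turns a sort keyword into a Solr sort clause. The helper name resolve_sort and its optional direction argument are hypothetical, not the server's actual code, and only a subset of the templates above is repeated so the snippet runs on its own.

```python
# Subset of the templates and keys quoted above, repeated so this runs standalone.
SORT_BIBLIOGRAPHIC = "art_authors_mast {0}, art_year {0}, art_title {0}"
SORT_YEAR = "art_year {0}"
SORT_TITLE = "art_title {0}"

PREDEFINED_SORTS = {
    "bibliographic": (SORT_BIBLIOGRAPHIC, "asc"),
    "year": (SORT_YEAR, "desc"),
    "title": (SORT_TITLE, "asc"),
}


def resolve_sort(key, direction=None):
    """Hypothetical helper: map a sort keyword to a Solr sort clause.

    If no direction is given, the default stored next to the template is used.
    """
    if key not in PREDEFINED_SORTS:
        raise ValueError(f"unknown sort key: {key!r}")
    template, default_direction = PREDEFINED_SORTS[key]
    chosen = (direction or default_direction).lower()
    if chosen not in ("asc", "desc"):
        raise ValueError(f"unknown sort direction: {direction!r}")
    # Every {0} in the template receives the same direction.
    return template.format(chosen)


print(resolve_sort("title"))          # default direction for title
print(resolve_sort("year", "asc"))    # explicit direction overrides the default
```

Because each template repeats {0}, a multi-field sort such as bibliographic flips every field together when the direction changes.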
nrshapiro commented 3 years ago

Some documentation. Test results in the Solr console:

{
  "responseHeader":{
    "status":0,
    "QTime":2,
    "params":{
      "q":"text:phleb*",
      "fl":"title",
      "fq":"art_level:1 && art_year_int:[1980 TO 1989] || art_year_int:[1930 TO 1939]",
      "sort":"title asc",
      "_":"1601043431171"}},
  "response":{"numFound":9,"start":0,"numFoundExact":true,"docs":[
      {
        "title":"1. Irma at the Window: The Fourth Script of Freud's Specimen Dream"},
      {
        "title":"A Reappraisal of the Emma Episode and the Specimen Dream"},
      {
        "title":"A Farewell to Freud's Interpretation of Dreams"},
      {
        "title":"Freud's Self-Analysis: Translated from the French by Peter Graham.  With a Preface by M. Masud R. Khan"},
      {
        "title":"When the Analyst is Chronically Ill or Dying"},
      {
        "title":"Freud's Irma Dream and the Origins of Psychoanalysis"},
      {
        "title":"Proust's Myth of Artistic Creation"},
      {
        "title":"Can Psychoanalytic Theory be Cogently Tested “On the Couch”?: Part II"},
      {
        "title":" Book Notices"}]
  }}

Reversing direction:

{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "q":"text:phleb*",
      "fl":"title",
      "fq":"art_level:1 && art_year_int:[1980 TO 1989] || art_year_int:[1930 TO 1939]",
      "sort":"title desc",
      "_":"1601043431171"}},
  "response":{"numFound":9,"start":0,"numFoundExact":true,"docs":[
      {
        "title":" Book Notices"},
      {
        "title":"Can Psychoanalytic Theory be Cogently Tested “On the Couch”?: Part II"},
      {
        "title":"Proust's Myth of Artistic Creation"},
      {
        "title":"Freud's Irma Dream and the Origins of Psychoanalysis"},
      {
        "title":"When the Analyst is Chronically Ill or Dying"},
      {
        "title":"A Reappraisal of the Emma Episode and the Specimen Dream"},
      {
        "title":"A Farewell to Freud's Interpretation of Dreams"},
      {
        "title":"Freud's Self-Analysis: Translated from the French by Peter Graham.  With a Preface by M. Masud R. Khan"},
      {
        "title":"1. Irma at the Window: The Fourth Script of Freud's Specimen Dream"}]
  }}
nrshapiro commented 3 years ago

@bakerac4 I've found the only workaround for the Solr sorting: for fields that are not already string fields, I'm going to add a copyField of type string, suffixed with _str. So we'll be able to sort by title_str (or art_title_str; I maintain both just for consistency) and it will sort properly, assuming Solr handles case properly. Otherwise I'll deal with that.
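
To make the caveat concrete, here is a small Python sketch (plain Python, not Solr) that sorts a few of the titles from the console output above as whole strings. The normalization step (trimming whitespace and ignoring case) is an assumption about what "properly" might require, not something the schema is known to do.

```python
# A few titles taken from the Solr console output earlier in this thread.
titles = [
    "When the Analyst is Chronically Ill or Dying",
    " Book Notices",
    "A Reappraisal of the Emma Episode and the Specimen Dream",
    "A Farewell to Freud's Interpretation of Dreams",
    "Proust's Myth of Artistic Creation",
]

# Whole-string comparison by code point: the leading space in " Book Notices"
# sorts before any letter, so that title comes first.
by_codepoint = sorted(titles)

# Hypothetical normalization: strip surrounding whitespace and ignore case
# before comparing, so " Book Notices" falls between the A and P titles.
normalized = sorted(titles, key=lambda t: t.strip().casefold())

for title in normalized:
    print(repr(title))
```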

nrshapiro commented 3 years ago

This workaround is now implemented in the code from 2020.1003.1. Tested locally but waiting for stage to finish the schema update. Note the template definition changes below.

2020.1003.1

The sort value can include the direction after the key, e.g., bibliographic asc or bibliographic desc

PREDEFINED_SORTS = {
    "bibliographic": (SORT_BIBLIOGRAPHIC, "asc"),
    "year": (SORT_YEAR, "desc"),
    "author": (SORT_AUTHOR, "asc"),
    "title": (SORT_TITLE, "asc"),
    "source": (SORT_SOURCE, "asc"),
    "citations": (SORT_CITATIONS, "desc"),
    "views": (SORT_VIEWS, "desc"),
    "toc": (SORT_TOC, "asc"),
    "score": (SORT_SCORE, "desc"),
    # legacy/historical naming for sorts
    "citecount": (SORT_CITATIONS, "desc"),
    "rank": (SORT_SCORE, "desc"),
    }
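
For a client that uses the sort=year / sort=-year convention internally, a small translation step could map it onto the "key direction" form. This is only a sketch: to_sort_param is a hypothetical client-side helper, and it assumes the API accepts values like "year desc" as in the example in this comment.

```python
def to_sort_param(client_sort):
    """Hypothetical client helper: '-year' -> 'year desc', 'year' -> 'year asc'."""
    client_sort = client_sort.strip()
    if not client_sort or client_sort == "-":
        raise ValueError("empty sort key")
    if client_sort.startswith("-"):
        # A leading minus means descending in the client's own convention.
        return f"{client_sort[1:]} desc"
    return f"{client_sort} asc"


print(to_sort_param("-year"))
print(to_sort_param("title"))
```

Always sending an explicit direction avoids surprises from the defaults in the table, where a bare year defaults to desc rather than asc.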
nrshapiro commented 3 years ago

@bakerac4

The new schema and database update are online on Stage, and sort worked in a quick test in the /Docs interface. It's hard to really see the ordering that way, but this is the URL it generated:

https://stage-api.pep-web.rocks/v2/Database/Search/?fulltext1=philanthropist&viewperiod=4&formatrequested=HTML&highlightlimit=5&facetmincount=1&facetlimit=15&sort=bibliographic%20asc&limit=15
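
A request like this can also be built programmatically. Below is a minimal sketch using only the Python standard library; the function name build_search_url is hypothetical, and the parameter names (fulltext1, sort, limit) are simply the ones visible in the URL above.

```python
from urllib.parse import quote, urlencode

# Stage endpoint as it appears in the URL above.
BASE_URL = "https://stage-api.pep-web.rocks/v2/Database/Search/"


def build_search_url(fulltext, sort, limit=15):
    """Hypothetical helper: build a search URL with a 'key direction' sort."""
    params = {"fulltext1": fulltext, "sort": sort, "limit": limit}
    # quote_via=quote encodes the space in the sort value as %20 instead of '+'.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


url = build_search_url("philanthropist", "bibliographic asc")
print(url)
```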