gbif / portal16

GBIF.org website
https://www.gbif.org
Apache License 2.0

resource search sorting #570

Closed MortenHofft closed 6 years ago

MortenHofft commented 6 years ago

At times the sorting on resource search is slightly off.

From the website (screenshot, 2017-09-29 at 15:02:05)

Identifying aquatic invertebrates using nex ... from Contentful

created
Yesterday, 3:38 PM
updated
Yesterday, 6:53 PM
published
Yesterday, 6:53 PM

Developing a common structural framework for d... from Contentful

created
4 hours ago
updated
8 minutes ago
published
8 minutes ago

Elasticsearch query (it ranks the item from yesterday above the one from 4 hours ago). NB: Using "origin": "2017-09-29T15:13:00.196Z" instead of now/d returns the expected result.

{
    "size": 2,
    "query": {
        "function_score": {
            "functions": [
                {
                    "gauss": {
                        "createdAt": {
                            "origin": "now/d",
                            "scale": "7d",
                            "decay": 0.75
                        }
                    },
                    "weight": 10
                }
            ],
            "query": {
                "bool": {
                    "must": [
                        {
                            "term": {
                                "searchable": "true"
                            }
                        },
                        {
                            "term": {
                                "contentType": "dataUse"
                            }
                        }
                    ]
                }
            }
        }
    }
}

Elasticsearch response (some keys removed)

{
    "took": 7,
    "timed_out": false,
    "_shards": {
        "total": 49,
        "successful": 39,
        "failed": 10,
        "failures": [
            {
                "shard": 0,
                "index": "country",
                "node": "V-MvNfiRRViOgV_7aPTP5g",
                "reason": {
                    "type": "parsing_exception",
                    "reason": "unknown field [createdAt]",
                    "line": 1,
                    "col": 0
                }
            }
        ]
    },
    "hits": {
        "total": 227,
        "max_score": 0.13189553,
        "hits": [
            {
                "_index": "datause1506689838917",
                "_type": "content",
                "_id": "3gCOvQwphmS6kkKuWQKq2W",
                "_score": 0.13189553,
                "_source": {
                    ...
                    "title": {
                        "en-GB": "Identifying aquatic invertebrates using next generation sequencing"
                    },
                    "searchable": true,
                    "createdAt": "2017-09-28T14:37:39.196Z",
                    "contentType": "dataUse",
                    "homepage": true,
                    "updatedAt": "2017-09-28T16:53:50.631Z"
                }
            },
            {
                "_index": "datause1506689838917",
                "_type": "content",
                "_id": "3zGBIwOLH2QsM6IcUm0ecW",
                "_score": 0.13180104,
                "_source": {
                    "title": {
                        "en-GB": "Developing a common structural framework for data papers"
                    },
                    "searchable": true,
                    "createdAt": "2017-09-29T12:34:33.587Z",
                    "contentType": "dataUse",
                    "homepage": true,
                    "updatedAt": "2017-09-29T12:56:30.688Z"
                }
            }
        ]
    }
}

Elasticsearch full response

{
    "took": 7,
    "timed_out": false,
    "_shards": {
        "total": 49,
        "successful": 39,
        "failed": 10,
        "failures": [
            {
                "shard": 0,
                "index": "country",
                "node": "V-MvNfiRRViOgV_7aPTP5g",
                "reason": {
                    "type": "parsing_exception",
                    "reason": "unknown field [createdAt]",
                    "line": 1,
                    "col": 0
                }
            }
        ]
    },
    "hits": {
        "total": 227,
        "max_score": 0.13189553,
        "hits": [
            {
                "_index": "datause1506689838917",
                "_type": "content",
                "_id": "3gCOvQwphmS6kkKuWQKq2W",
                "_score": 0.13189553,
                "_source": {
                    "summary": {
                        "en-GB": "Using large-scale DNA barcode sequencing to effectively identify species of samples archived for up to 12 years "
                    },
                    "primaryImage": {
                        "file": {
                            "en-GB": {
                                "url": "//images.contentful.com/uo17ejk9rkwj/3mBCu5D8mQ0GEMGsMu2Sm8/262e6612b61efd1d3ef962075b309241/original__2_.jpg",
                                "details": {
                                    "size": 927874,
                                    "image": {
                                        "width": 2048,
                                        "height": 1366
                                    }
                                },
                                "fileName": "original (2).jpg",
                                "contentType": "image/jpeg"
                            }
                        },
                        "description": {
                            "en-GB": "<a href=\"/occurrence/891101886\"><i>Xanthagrion erythroneurum</i></a> by Ry Beaver via iNaturalist. Photo licensed under <a href=\"http://creativecommons.org/licenses/by-nc/4.0/\">CC BY-NC 4.0</a>."
                        },
                        "title": {
                            "en-GB": "Xanthagrion erythroneurum"
                        }
                    },
                    "citation": "Carew ME, Metzeling L, St Clair R and Hoffmann AA (2017) Detecting invertebrate species in archived collections using next-generation sequencing. Molecular Ecology Resources. Wiley-Blackwell. Available at: https://doi.org/10.1111/1755-0998.12644.",
                    "topics": [
                        "ECOLOGY"
                    ],
                    "countriesOfResearcher": [
                        "AU"
                    ],
                    "title": {
                        "en-GB": "Identifying aquatic invertebrates using next generation sequencing"
                    },
                    "body": {
                        "en-GB": "In studies of aquatic invertebrate biodiversity, restraints on time and expenses often mean that communities are only identified to family levels. Archived samples stored in ethanol may, however, be identified more specifically at a later stage.\n\nThis study uses next-generation sequencing (NGS) of DNA barcodes to systematically identify species in archived macroinvertebrate samples from two sites in Australia, some of which had been stored up to 12 years at room temperature. Despite anticipated DNA degradation, the researchers were able to amplify partical DNA barcodes from most samples, and using these, identify the species often by more than one amplified sequence.\n\nNot all families had identifiable species, however, when the researchers compared the number of identified species per family with estimated equivalents in GBIF, they revealed potential gaps in the barcode library that could explain the lack of identified species.\n"
                    },
                    "type": "Entry",
                    "searchable": true,
                    "space": {
                        "sys": {
                            "type": "Link",
                            "linkType": "Space",
                            "id": "uo17ejk9rkwj"
                        }
                    },
                    "gbifRegion": [
                        "OCEANIA"
                    ],
                    "revision": 2,
                    "createdAt": "2017-09-28T14:37:39.196Z",
                    "countriesOfCoverage": [
                        "AU"
                    ],
                    "audiences": [
                        "GBIF_NETWORK",
                        "DATA_USERS"
                    ],
                    "primaryLink": {
                        "id": "19Fbu6S8gAE4s8YEsYGeES",
                        "label": {
                            "en-GB": "Original article (may require subscription)"
                        },
                        "url": {
                            "en-GB": "https://doi.org/10.1111/1755-0998.12644"
                        }
                    },
                    "id": "3gCOvQwphmS6kkKuWQKq2W",
                    "resourceUsed": "222 families",
                    "contentType": "dataUse",
                    "homepage": true,
                    "updatedAt": "2017-09-28T16:53:50.631Z"
                }
            },
            {
                "_index": "datause1506689838917",
                "_type": "content",
                "_id": "3zGBIwOLH2QsM6IcUm0ecW",
                "_score": 0.13180104,
                "_source": {
                    "summary": {
                        "en-GB": "Describing the structure of data papers through analysis of journals, their templates and guidelines"
                    },
                    "primaryImage": {
                        "file": {
                            "en-GB": {
                                "url": "//images.contentful.com/uo17ejk9rkwj/5SithJUWqs8QWeWYu6ci0o/2b02ececdb678e3559af46a535507dfa/gulls.png",
                                "details": {
                                    "size": 1527953,
                                    "image": {
                                        "width": 1500,
                                        "height": 500
                                    }
                                },
                                "fileName": "gulls.png",
                                "contentType": "image/png"
                            }
                        },
                        "description": {
                            "en-GB": "Map excerpt from <a href=\"https://doi.org/10.3897/zookeys.555.6173\">data paper</a> by Stienen et al. in Zookeys describing a <a href=\"/dataset/83e20573-f7dd-4852-9159-21566e1e691e\">bird tracking dataset</a>."
                        },
                        "title": {
                            "en-GB": "Map excerpt from data paper \"GPS tracking data of Lesser Black-backed Gulls and Herring Gulls breeding at the southern North Sea coast\" by Stienen et al published 2016 in Zookeys."
                        }
                    },
                    "citation": "Chen Y-N (2017) An analysis of characteristics and structures embedded in data papers: a preliminary study. Libellarium: journal for the research of writing, books, and cultural heritage institutions. University of Zadar 9(2). Available at: https://doi.org/10.15291/libellarium.v9i2.266.",
                    "topics": [
                        "DATA_PAPER"
                    ],
                    "countriesOfResearcher": [
                        "TW"
                    ],
                    "title": {
                        "en-GB": "Developing a common structural framework for data papers"
                    },
                    "body": {
                        "en-GB": "Many institutions, including GBIF, encourage sharing of research data through [data papers](/data-papers) that extend metadata in a way that mirrors the traditional scientific publication model. Dozens of journals now accept data papers, however, as highlighted by this study, they lack a common standard.\n\nBy reviewing submission templates and guidelines from 26 data journals, the authors describe a unifying framework consisting mainly of three distinct components: 1) basic information (e.g. title, abstract, author, etc.), 2) dataset descriptions (i.e. how was data collected, how it formatted, what does it cover, etc.), and 3) relationships (i.e. links and references). \n\nThe authors use the proposed framework to highlight the successful case of mapping metadata directly derived from the GBIF [Integrated Publishing Toolkit (IPT)](/ipt) onto a data paper template that can be edited and submitted to a journal for consideration.\n\nAs of September 2017, authors have contributed to more than [50 data papers describing datasets published in GBIF](/resource/search?contentType=literature&topics=DATA_PAPER&relevance=GBIF_PUBLISHED)."
                    },
                    "type": "Entry",
                    "searchable": true,
                    "space": {
                        "sys": {
                            "type": "Link",
                            "linkType": "Space",
                            "id": "uo17ejk9rkwj"
                        }
                    },
                    "gbifRegion": [
                        "ASIA"
                    ],
                    "revision": 2,
                    "createdAt": "2017-09-29T12:34:33.587Z",
                    "audiences": [
                        "GBIF_NETWORK",
                        "DATA_USERS",
                        "DATA_HOLDERS"
                    ],
                    "primaryLink": {
                        "id": "62GlLSdFN6YMSsMqIEiyIq",
                        "label": {
                            "en-GB": "Original article"
                        },
                        "url": {
                            "en-GB": "https://doi.org/10.15291/libellarium.v9i2.266"
                        }
                    },
                    "id": "3zGBIwOLH2QsM6IcUm0ecW",
                    "contentType": "dataUse",
                    "homepage": true,
                    "updatedAt": "2017-09-29T12:56:30.688Z"
                }
            }
        ]
    }
}

MortenHofft commented 6 years ago

Using "origin": "2017-09-29T15:13:00.196Z" returns the expected result
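
A minimal sketch of how the query body could be built with a full timestamp as the origin, in Python for illustration. The function name `recency_query` and its parameters are hypothetical, and this is not necessarily what the portal code does; the query shape, `scale`, `decay` and `weight` are copied from the query in this issue.

```python
from datetime import datetime, timezone


def recency_query(content_type, now=None, size=2):
    """Build the function_score query with a full-precision origin.

    `recency_query` is a hypothetical helper; only the query shape
    comes from the issue.
    """
    now = now or datetime.now(timezone.utc)
    # Format as ISO 8601 with millisecond precision, e.g. 2017-09-29T15:13:00.196Z
    origin = now.strftime("%Y-%m-%dT%H:%M:%S.") + f"{now.microsecond // 1000:03d}Z"
    return {
        "size": size,
        "query": {
            "function_score": {
                "functions": [
                    {
                        "gauss": {
                            "createdAt": {
                                "origin": origin,  # full time, not rounded to the day
                                "scale": "7d",
                                "decay": 0.75,
                            }
                        },
                        "weight": 10,
                    }
                ],
                "query": {
                    "bool": {
                        "must": [
                            {"term": {"searchable": "true"}},
                            {"term": {"contentType": content_type}},
                        ]
                    }
                },
            }
        },
    }


example = recency_query(
    "dataUse", now=datetime(2017, 9, 29, 15, 13, 0, 196000, tzinfo=timezone.utc)
)
print(example["query"]["function_score"]["functions"][0]["gauss"]["createdAt"]["origin"])
```

Elasticsearch date math also accepts an unrounded `now` as the origin, which may be simpler than formatting a timestamp on the client.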

The ES timestamp says "createdAt": "2017-09-29T12:34:33.587Z" (it is now 15:20 in DK). The Contentful interface says 4 hours ago ≈ 11:20 DK time.

MortenHofft commented 6 years ago

Looks like the issue is that the query uses now/d as the origin, which is rounded to the day rather than being the full time.
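
A small sketch of why the rounded origin flips the order, assuming `now/d` resolves to the start of the current day in UTC (2017-09-29T00:00:00Z) and the decay offset is left at its default of 0. The gauss decay is symmetric around the origin, so an item created about 9 hours before midnight sits closer to the origin than one created about 12.5 hours after it. The helper names `parse` and `gauss_decay` are illustrative only.

```python
from datetime import datetime, timezone


def parse(ts):
    # Parse an ISO 8601 UTC timestamp such as "2017-09-29T12:34:33.587Z"
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def gauss_decay(value, origin, scale_days=7.0, decay=0.75):
    # Gauss decay with offset 0: exp(-d^2 / (2 * sigma^2)) where
    # sigma^2 = -scale^2 / (2 * ln(decay)), which simplifies to
    # decay ** ((d / scale) ** 2).
    distance = abs((value - origin).total_seconds())
    scale = scale_days * 86400
    return decay ** ((distance / scale) ** 2)


older = parse("2017-09-28T14:37:39.196Z")  # "Identifying aquatic invertebrates ..."
newer = parse("2017-09-29T12:34:33.587Z")  # "Developing a common structural framework ..."

day_start = parse("2017-09-29T00:00:00.000Z")  # assumed value of now/d
full_time = parse("2017-09-29T15:13:00.196Z")  # origin that gave the expected result

for label, origin in (("now/d", day_start), ("full timestamp", full_time)):
    print(label, gauss_decay(older, origin), gauss_decay(newer, origin))
```

With the day-rounded origin the older item gets the higher decay value; with the full timestamp the newer one does. Under these assumptions the ratio of the two decay values also agrees with the ratio of the two `_score` values in the response above, to within rounding.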

MortenHofft commented 6 years ago

Closed by https://github.com/gbif/portal16/blob/fe03cae5dcbb4ecc79f266e5d435a32fb57e4da7/app/controllers/api/resource/search/resourceSearch.js#L204