Sorting by number of records for a given dataset or publisher

gbif / content-crawler

Crawls CMS and articles from Mendeley into ElasticSearch indexes

Apache License 2.0


I'm sure this would be insanely expensive, but each literature entry could carry "derived dataset"-like metadata, with a record count and perhaps a fraction for each dataset, e.g.

from

"gbifDatasetKey": [
    "4fa7b334-ce0d-4e88-aaae-2e0c138d049e",
    "38b4c89f-584c-41bb-bd8f-cd1def33e92f",
    "8a863029-f435-446a-821e-275f4f641165",
etc.

to

{
    "gbifDatasetKey": {
        "4fa7b334-ce0d-4e88-aaae-2e0c138d049e": {
            "count": 67045764,
            "fraction": 0.693
        },
        "3b894fe4-c13c-4a04-b372-4e749ce102e1": {
            "count": 5753111,
            "fraction": 0.0594
        },
        "8a863029-f435-446a-821e-275f4f641165": {
            "count": 3107077,
            "fraction": 0.0321
        }
    }
}
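As a rough sketch of how the "count" and "fraction" values could be derived, assuming the per-dataset record counts for a literature entry are already available as a plain mapping: the function name `derive_dataset_metadata`, the `ndigits` parameter, and the idea that the fraction is taken over the sum of all counts for the entry are assumptions for illustration, not something the crawler does today.

```python
def derive_dataset_metadata(counts_by_dataset, ndigits=4):
    """Turn {datasetKey: record_count} into the nested count/fraction shape.

    Assumption: "fraction" means this dataset's share of all records
    attributed to the literature entry (sum of all counts).
    """
    total = sum(counts_by_dataset.values())
    if total == 0:
        # Nothing to attribute; avoid dividing by zero.
        return {"gbifDatasetKey": {}}

    # Largest contributors first, so the output is already ranked.
    ranked = sorted(counts_by_dataset.items(), key=lambda kv: kv[1], reverse=True)

    return {
        "gbifDatasetKey": {
            dataset_key: {
                "count": count,
                "fraction": round(count / total, ndigits),
            }
            for dataset_key, count in ranked
        }
    }


# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
example = derive_dataset_metadata({"dataset-a": 75, "dataset-b": 20, "dataset-c": 5})
```

The expensive part is presumably obtaining the counts in the first place (one breakdown per cited download), not this final step.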

This would then also have to be done per publisher... 🤯
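A minimal sketch of the publisher side and of the sorting itself, under the assumption that a dataset-to-publisher lookup is available: `roll_up_by_publisher`, `sort_entries_by_count`, the `publisher_by_dataset` mapping, and the `"unknown"` fallback are all hypothetical names introduced here. In a real index the sorting would more likely be done by the search engine than in Python.

```python
from collections import defaultdict


def roll_up_by_publisher(dataset_counts, publisher_by_dataset, ndigits=4):
    """Sum per-dataset record counts into per-publisher counts and fractions."""
    totals = defaultdict(int)
    for dataset_key, count in dataset_counts.items():
        # Datasets missing from the lookup are grouped under "unknown".
        totals[publisher_by_dataset.get(dataset_key, "unknown")] += count

    grand_total = sum(totals.values())
    return {
        publisher: {
            "count": count,
            "fraction": round(count / grand_total, ndigits) if grand_total else 0.0,
        }
        for publisher, count in totals.items()
    }


def sort_entries_by_count(entries, field, key):
    """Order literature entries by record count for one dataset or publisher key.

    Entries that do not mention the key are treated as count 0 and sort last.
    """
    def count_for(entry):
        return entry.get(field, {}).get(key, {}).get("count", 0)

    return sorted(entries, key=count_for, reverse=True)


# Hypothetical data for illustration only.
by_publisher = roll_up_by_publisher(
    {"d1": 60, "d2": 30, "d3": 10},
    {"d1": "p1", "d2": "p1", "d3": "p2"},
)
entries = [
    {"id": "x", "gbifDatasetKey": {"d1": {"count": 5}}},
    {"id": "y", "gbifDatasetKey": {"d1": {"count": 50}}},
    {"id": "z", "gbifDatasetKey": {}},
]
ranked_ids = [e["id"] for e in sort_entries_by_count(entries, "gbifDatasetKey", "d1")]
```

Storing the publisher rollup next to the dataset breakdown would let the same sort function serve both cases by changing only `field` and `key`.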

Sorting by number of records for a given dataset or publisher #58