Open yemoski opened 1 year ago
I'm not sure what, if anything, we can do about this.
There's the 'obsoletes' field, but none of the DOIs that show up in there are in this result set, which means that the obsoletedBy clause is doing its job.
The three results with the title "Historic air temperatures in Alaska for 1901-2015, with spatial subsetting by region" are a good example of this.
They have three separate DOIs, but if you go to their landing pages, two have a pointer to the most recent one with the text "A newer version of this dataset exists. View it now."
The way DataONE solves this is to differentiate between the Persistent Identifier (PID), which maps to a specific content-immutable version of a file or package, and the Series Identifier (SID), which maps to the most recent version in a chain of versions. There are more details in the DataONE API docs.
When they harvest from a schema.org (SO) provider, they use the checksum of the canonicalized JSON-LD as the PID and the provided dc:identifier as the SID. When the repository modifies a record, the checksum changes, which produces a new PID, and they then update the SID to point at that most recent version. This allows them to maintain a version history of all objects from the schema.org harvests, while also directing search results to only the most recent published version.
(via Matt Jones)
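To make the PID/SID idea concrete, here is a rough sketch of how that bookkeeping could look. It is not DataONE's actual code: the `compute_pid` and `VersionIndex` names are hypothetical, SHA-256 is an assumed checksum algorithm, sorted-key JSON serialization stands in for real JSON-LD canonicalization, and the record's `identifier` key is assumed to carry the value used as the SID.

```python
import hashlib
import json


def compute_pid(record: dict) -> str:
    """Derive a content-based PID from a harvested record.

    Assumption: sorted-key, whitespace-free JSON is used here as a simple
    stand-in for proper JSON-LD canonicalization, and SHA-256 is an
    assumed checksum algorithm.
    """
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class VersionIndex:
    """Hypothetical in-memory index of harvested versions."""

    def __init__(self) -> None:
        self.objects: dict[str, dict] = {}      # PID -> immutable record content
        self.chains: dict[str, list[str]] = {}  # SID -> PIDs, oldest first

    def harvest(self, record: dict) -> str:
        # Assumption: the provider's identifier is stored under "identifier".
        sid = record["identifier"]
        pid = compute_pid(record)
        chain = self.chains.setdefault(sid, [])
        # Only extend the chain when the content actually changed.
        if not chain or chain[-1] != pid:
            self.objects[pid] = record
            chain.append(pid)
        return pid

    def resolve(self, sid: str) -> str:
        """Return the PID of the most recent version for a SID."""
        return self.chains[sid][-1]
```

With something like this, re-harvesting an unchanged record is a no-op, a modified record gets a new PID appended to the chain, and search results can be limited to whatever `resolve(sid)` returns.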
That solution from Matt went a long way but didn't fix 100% of the cases.
From Matt Jones: "one easy way to do this is to add +AND+-obsoletedBy:* to your solr query"
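For reference, a small sketch of how that clause could be appended when building the query URL. The helper names and the base URL passed in are hypothetical; `q`, `rows`, `start`, and `wt` are standard Solr query parameters, and only the `-obsoletedBy:*` clause comes from Matt's suggestion.

```python
from urllib.parse import urlencode


def exclude_obsoleted(query: str) -> str:
    """Wrap the user's query and drop any record that has an obsoletedBy value."""
    return f"({query}) AND -obsoletedBy:*"


def build_solr_url(base_url: str, query: str, rows: int = 50, start: int = 0) -> str:
    """Build a Solr select URL; base_url is whatever endpoint the search uses."""
    params = {
        "q": exclude_obsoleted(query),
        "rows": rows,    # page size
        "start": start,  # offset of the first result, for paging
        "wt": "json",
    }
    # urlencode handles escaping of spaces, parentheses, ':' and '*'.
    return f"{base_url}?{urlencode(params)}"
```

Calling `build_solr_url(base, "Andrill", rows=50, start=50)` would then request the second page of non-obsoleted results, assuming the endpoint honors `start`.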
Another search, from someone who wrote in:
When I enter the search term “Andrill” I’m returned 122 results. The first page shows the first 50 results, but I can’t see the option to tab to the next page of results.
I’ve also noticed that there are duplicate results being returned. I expected this search to return 91 results (my understanding is that all records from this project are held at Pangaea, but I could be wrong).
Search for 'temperature' to see what I mean.