Open-EO / openeo-geopyspark-driver

OpenEO driver for GeoPySpark (Geotrellis)
Apache License 2.0

Oscars pagination #679

Closed EmileSonneveld closed 4 months ago

EmileSonneveld commented 4 months ago

Bigger page size, check multiple pages, and deduplicate results (needed when running in automated tests). Only fetch the next page when needed.

EmileSonneveld commented 4 months ago

With sortKeys=start it seems to be sorted on properties > date, not on the published date. Is that stable enough?

https://services.terrascope.be/catalogue//collections?startIndex=1&count=200&sortKeys=start

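// Run in the browser console on the JSON response page above
// (assumes the page exposes the raw JSON in "#jsonFormatterRaw > pre").
const data = JSON.parse(document.querySelector("#jsonFormatterRaw > pre").innerText)
console.table(data["features"].map(x => x["properties"]))  // one row per feature

EmileSonneveld commented 4 months ago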

I added the sort key anyway. There are multiple products with the same date though. I guess that within the same date, it is sorted based on when the product was added to the database.

bossie commented 4 months ago

> I added the sort key anyway. There are multiple products with the same date though. I guess that within the same date, it is sorted based on when the product was added to the database.

It's probably me but I don't see how this list is sorted on date now.

You could go check with Stijn C. to see if there is a particular sortKeys value that allows reliable paging, or just take the pragmatic approach: remove the paging altogether and increase the page size to e.g. 500.

EmileSonneveld commented 4 months ago

Stijn C. says that by default, features are sorted by ID: https://git.vito.be/projects/BIGGEO/repos/oscars/browse/src/main/java/be/vito/opensearch/elasticsearch/SearchOperation.java#85 To be sure, I added a warning for when it misbehaves (still running tests). If a feature somehow gets returned multiple times, the one on the last page is preserved. I added a log for this, and it did not happen on CI at least.
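
The de-duplication behaviour described here could look roughly like the following. This is a sketch under assumptions, not the code from the PR: the helper name `merge_pages` and the logger name are hypothetical, and features are assumed to carry a unique `id` field. A later occurrence overwrites an earlier one, and a warning is logged when that happens.

```python
import logging
from typing import Dict, Iterable, List

logger = logging.getLogger("oscars_paging_sketch")  # hypothetical logger name


def merge_pages(pages: Iterable[List[dict]]) -> List[dict]:
    """Merge pages of features, de-duplicating on the (assumed) "id" field."""
    by_id: Dict[str, dict] = {}
    for page_number, features in enumerate(pages, start=1):
        for feature in features:
            feature_id = feature["id"]
            if feature_id in by_id:
                # Paging misbehaved: the same feature showed up more than once.
                logger.warning(
                    "Feature %s returned again on page %d; keeping the later one",
                    feature_id,
                    page_number,
                )
            by_id[feature_id] = feature  # last occurrence wins
    return list(by_id.values())


merged = merge_pages([
    [{"id": "a", "page": 1}, {"id": "b", "page": 1}],
    [{"id": "b", "page": 2}, {"id": "c", "page": 2}],
])
```

In this example the duplicate `"b"` ends up with the content from page 2, while keeping its original position in the result because a Python `dict` preserves first-insertion order.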