geonetwork / core-geonetwork

GeoNetwork is a catalog application to manage spatially referenced resources. It provides powerful metadata editing and search functions as well as an interactive web map viewer. It is currently used in numerous Spatial Data Infrastructure initiatives across the world.
http://geonetwork-opensource.org/
GNU General Public License v2.0

CSW GetRecords not pageable #577

Closed by davidread 10 years ago

davidread commented 10 years ago

A client cannot easily use the GetRecords paging system, because the order of the results in those pages differs between identical requests.

See the question on the list and a response: http://sourceforge.net/p/geonetwork/mailman/message/32520959/

The suggested solution is to return results in the order in which they are stored when no sort order or search terms are supplied, because 'relevance' is meaningless in that case.
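To illustrate why an unstable order defeats paging, here is a minimal toy sketch. It does not talk to GeoNetwork; the record names and the `fetch_page` helper are hypothetical, and the two orderings simply stand in for two identical requests that came back in a different order.

```python
def fetch_page(ordering, start_position, max_records):
    """Return one page of records; start_position is 1-based, as in CSW GetRecords."""
    begin = start_position - 1
    return ordering[begin:begin + max_records]


# The same six hypothetical records, as returned by two identical requests
# whose result order differs.
order_first_request = ["a", "b", "c", "d", "e", "f"]
order_second_request = ["d", "a", "f", "b", "c", "e"]

# The client reads page 1 from the first response and page 2 from the second.
page_1 = fetch_page(order_first_request, 1, 3)
page_2 = fetch_page(order_second_request, 4, 3)

harvested = page_1 + page_2
duplicates = sorted({r for r in harvested if harvested.count(r) > 1})
missing = sorted(set(order_first_request) - set(harvested))

print("duplicates:", duplicates)
print("missing:", missing)
```

In this toy case the client receives two records twice and never sees two others, even though every individual page request succeeded.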

fxprunayre commented 10 years ago

Default sort order in GeoNetwork is based on Lucene relevance. If you don't specify any search criteria (as in your example), then the order in which documents were added to the Lucene index is used (AFAIK); if you do define search criteria, relevance is computed. BTW, relevance should not change as long as the catalog content does not change, so I think the order should remain the same.

I tested this on my side on branch stable-develop and could not reproduce the issue.

curl -s 'http://localhost:8080/geonetwork/srv/eng/csw?request=GetRecords&service=CSW&version=2.0.2&constraintLanguage=CQL_TEXT&typeNames=csw%3ARecord&resultType=results&startposition=1' | grep dc:identifier
      <dc:identifier>d49345d5-f068-43fb-8713-b626584e5572</dc:identifier>
      <dc:identifier>53329f0b-d4cf-4804-9cb3-92d815f6b2ca</dc:identifier>
      <dc:identifier>a61eb807-ed95-4f6e-95c1-c9d0ca001278</dc:identifier>
      <dc:identifier>c74c4ae9-0581-4557-abba-9c24b51a174d</dc:identifier>
      <dc:identifier>60bce4db-336c-4c4f-ba7b-5b143853ca0c</dc:identifier>
      <dc:identifier>8642eac4-c71e-4a6b-b05a-b2d5737e1e51</dc:identifier>
      <dc:identifier>c58462c8-21b6-45d2-81cc-f31810fd2272</dc:identifier>
      <dc:identifier>aeb02d8a-a91a-46de-8fdc-2a48df7b506f</dc:identifier>
      <dc:identifier>45ac917f-a50c-4c2e-959f-2f0f39c00e40</dc:identifier>
      <dc:identifier>1f537e1c-24f9-4740-8227-73af94759572</dc:identifier>
curl -s 'http://localhost:8080/geonetwork/srv/eng/csw?request=GetRecords&service=CSW&version=2.0.2&constraintLanguage=CQL_TEXT&typeNames=csw%3ARecord&resultType=results&startposition=1' | grep dc:identifier
      <dc:identifier>d49345d5-f068-43fb-8713-b626584e5572</dc:identifier>
      <dc:identifier>53329f0b-d4cf-4804-9cb3-92d815f6b2ca</dc:identifier>
      <dc:identifier>a61eb807-ed95-4f6e-95c1-c9d0ca001278</dc:identifier>
      <dc:identifier>c74c4ae9-0581-4557-abba-9c24b51a174d</dc:identifier>
      <dc:identifier>60bce4db-336c-4c4f-ba7b-5b143853ca0c</dc:identifier>
      <dc:identifier>8642eac4-c71e-4a6b-b05a-b2d5737e1e51</dc:identifier>
      <dc:identifier>c58462c8-21b6-45d2-81cc-f31810fd2272</dc:identifier>
      <dc:identifier>aeb02d8a-a91a-46de-8fdc-2a48df7b506f</dc:identifier>
      <dc:identifier>45ac917f-a50c-4c2e-959f-2f0f39c00e40</dc:identifier>
      <dc:identifier>1f537e1c-24f9-4740-8227-73af94759572</dc:identifier>

And then at start position 100:

curl -s 'http://localhost:8080/geonetwork/srv/eng/csw?request=GetRecords&service=CSW&version=2.0.2&constraintLanguage=CQL_TEXT&typeNames=csw%3ARecord&resultType=results&startposition=100' | grep dc:identifier
      <dc:identifier>/96651</dc:identifier>
      <dc:identifier>ec59ce02-9f55-43c4-81eb-aa018664ce33</dc:identifier>
      <dc:identifier>42c46494-a53a-4ce6-bf49-d282b8efeffb</dc:identifier>
      <dc:identifier>8dc20420-037e-11e0-abec-005056987263</dc:identifier>
      <dc:identifier>9d50c041-64b3-435c-a171-4b136ba0fbff</dc:identifier>
      <dc:identifier>SDN:CPRD:353:291003</dc:identifier>
      <dc:identifier>0dba300d-1a7d-42b5-8ba6-f655bbf37998</dc:identifier>
      <dc:identifier>SDN:CPRD:145:DTM_CNR-ISMAR-22</dc:identifier>
      <dc:identifier>73dcbed3-cb76-456d-bd92-d4bd6167a510</dc:identifier>
      <dc:identifier>8d0ba420-3343-11df-bae0-005056981ded</dc:identifier>
curl -s 'http://localhost:8080/geonetwork/srv/eng/csw?request=GetRecords&service=CSW&version=2.0.2&constraintLanguage=CQL_TEXT&typeNames=csw%3ARecord&resultType=results&startposition=100' | grep dc:identifier
      <dc:identifier>/96651</dc:identifier>
      <dc:identifier>ec59ce02-9f55-43c4-81eb-aa018664ce33</dc:identifier>
      <dc:identifier>42c46494-a53a-4ce6-bf49-d282b8efeffb</dc:identifier>
      <dc:identifier>8dc20420-037e-11e0-abec-005056987263</dc:identifier>
      <dc:identifier>9d50c041-64b3-435c-a171-4b136ba0fbff</dc:identifier>
      <dc:identifier>SDN:CPRD:353:291003</dc:identifier>
      <dc:identifier>0dba300d-1a7d-42b5-8ba6-f655bbf37998</dc:identifier>
      <dc:identifier>SDN:CPRD:145:DTM_CNR-ISMAR-22</dc:identifier>
      <dc:identifier>73dcbed3-cb76-456d-bd92-d4bd6167a510</dc:identifier>
      <dc:identifier>8d0ba420-3343-11df-bae0-005056981ded</dc:identifier>

One thing that could explain a change in the order is new records being added at the same time.
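The manual check above (run the same request twice and compare the `dc:identifier` lines) can be sketched as a small script. This is only a rough equivalent of the `curl | grep` pipeline: the helper names are hypothetical, and the regular expression is a crude stand-in for real XML parsing.

```python
import re
import urllib.request

# Crude extraction of dc:identifier values, mirroring the grep used above.
IDENTIFIER_RE = re.compile(r"<dc:identifier>(.*?)</dc:identifier>", re.DOTALL)


def extract_identifiers(xml_text):
    """Return the dc:identifier values in the order they appear in the response."""
    return [value.strip() for value in IDENTIFIER_RE.findall(xml_text)]


def first_difference(ids_a, ids_b):
    """Return the index of the first position where the lists differ, or None."""
    for index, (a, b) in enumerate(zip(ids_a, ids_b)):
        if a != b:
            return index
    if len(ids_a) != len(ids_b):
        return min(len(ids_a), len(ids_b))
    return None


def page_is_stable(url, attempts=5):
    """Fetch the same GetRecords URL several times and compare identifier order."""
    baseline = None
    for _ in range(attempts):
        with urllib.request.urlopen(url) as response:
            ids = extract_identifiers(response.read().decode("utf-8"))
        if baseline is None:
            baseline = ids
        elif first_difference(baseline, ids) is not None:
            return False
    return True
```

Calling `page_is_stable(...)` with one of the GetRecords URLs above should return `True` on a server that behaves like the one tested here, provided no records are added between requests.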

davidread commented 10 years ago

I found a couple of comments on Lucene's behaviour:

They do suggest that Lucene should return repeatable results. In my example request the relevancy scores will all be equal, so it will fall back to ordering by docId "to be repeatable". So, as you allude, this shouldn't happen. I'm taking this back to the site's administrator, as it seems more likely that they have some problematic load-balanced servers or something similar.
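A toy model of that reasoning, for illustration only: this is not Lucene code, and the load-balancing explanation is still just a hypothesis. It assumes each server assigns doc ids in the order records were indexed and breaks score ties by doc id; `build_index` and `rank` are hypothetical helpers.

```python
def build_index(record_ids, score=1.0):
    """Assign doc ids in insertion order; every record gets the same score,
    as when no search terms are supplied."""
    return [(doc_id, record, score) for doc_id, record in enumerate(record_ids)]


def rank(index):
    """Order records by score (descending), breaking ties by doc id (ascending)."""
    ordered = sorted(index, key=lambda entry: (-entry[2], entry[0]))
    return [record for _, record, _ in ordered]


# Two hypothetical servers holding the same records, indexed in a different order.
server_a = build_index(["r1", "r2", "r3", "r4"])
server_b = build_index(["r3", "r1", "r4", "r2"])

print(rank(server_a))  # repeatable for every request served by server A
print(rank(server_b))  # also repeatable, but in a different order
```

Each server on its own gives a stable order, yet a client whose requests alternate between the two would see the order change from one request to the next.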

Thanks for your help.