NASA-PDS / deep-archive

PDS Open Archival Information System (OAIS) utilities, including Submission Information Package (SIP) and Archive Information Package (AIP) generators
https://nasa-pds.github.io/deep-archive/
Other
7 stars 4 forks source link

Pagination not incrementing limit value for pds-deep-registry-archive #140

Closed jordanpadams closed 1 year ago

jordanpadams commented 1 year ago

๐Ÿ› Describe the bug

When executing for >1 page, the limit value does not appear to be increasing.

๐Ÿ“œ To Reproduce

$ pds-deep-registry-archive --url https://pds.nasa.gov/api/search/1.0 --debug --site PDS_ATM urn:nasa:pds:mer1_navcam_sci_calibrated::1.0
INFO ๐Ÿ‘Ÿ PDS Deep Registry-based Archive, version 1.1.2
DEBUG ๐Ÿ’ข command line args = Namespace(loglevel=10, include_latest_collection_only=False, url='https://pds.nasa.gov/api/search/1.0', site='PDS_ATM', bundle='urn:nasa:pds:mer1_navcam_sci_calibrated::1.0')
DEBUG ๐Ÿค” Comprehending the registry at https://pds.nasa.gov/api/search/1.0 for urn:nasa:pds:mer1_navcam_sci_calibrated::1.0
DEBUG โš™๏ธ Asking ``bundle_by_lidvid`` for urn:nasa:pds:mer1_navcam_sci_calibrated::1.0
DEBUG โš™๏ธ Asking ``collections_of_a_bundle`` for urn:nasa:pds:mer1_navcam_sci_calibrated::1.0 at 0 limit 50
DEBUG โš™๏ธ Asking ``products_of_a_collection`` for urn:nasa:pds:mer1_navcam_sci_calibrated:data::1.0 at 0 limit 50
DEBUG โš™๏ธ Asking ``products_of_a_collection`` for urn:nasa:pds:mer1_navcam_sci_calibrated:data::1.0 at 50 limit 50
DEBUG โš™๏ธ Asking ``products_of_a_collection`` for urn:nasa:pds:mer1_navcam_sci_calibrated:data::1.0 at 100 limit 50
CRITICAL ๐Ÿšจ The server at https://pds.nasa.gov/api/search/1.0 gave an INTERNAL SERVER ERROR; you should contact its administrator if you can figure out who that is. The following information may be helpful to them in figuring out the issue: ยซ'{"timestamp":1674229416025,"status":500,"error":"Internal Server Error","path":"/collections/urn%3Anasa%3Apds%3Amer1_navcam_sci_calibrated%3Adata%3A%3A1.0/products"}'ยป
INFO ๐Ÿ‘‹ Thanks for using this program! Bye!

๐Ÿ•ต๏ธ Expected behavior

Limit value increases and we are able to paginate through all the data in the collections

๐Ÿ“š Version of Software Used

1.2.0-dev

๐Ÿฉบ Test Data / Additional context

See command-line execution above

nutjob4life commented 1 year ago

Wait a minute @jordanpadams, limit is the end index, not a count?

I thought it was to limit the number of items being returned, you know, like LIMIT in SQL.

Even the documentation says limit is "maximum number of matching results returned, for pagination".

Help!

jordanpadams commented 1 year ago

@nutjob4life I thought this was maybe a bug in pds-deep-archive but looks like this is a bug in the API

nutjob4life commented 1 year ago

@jordanpadams could be both ๐Ÿ˜ฑ

jordanpadams commented 1 year ago

๐Ÿ˜จ

nutjob4life commented 1 year ago

Okay @jordanpadams I'm digging deeper (appropriate for the deep archive) into this.

First off, the documentation does clearly say that the limit parameter tells you the maximum number of matching results to be returned, and you can use that along with start to paginate through the results.

And that's how the Registry version of the Deep Archive historically worked.

Now suppose limit has somehow changed from being a count into an end index. If I say "give me start at 200 and end (limit) at 50", then the server should give back 400 Bad Request, not 500 Internal Server Error like it's doing now.

So I started playing with this URL:

https://pds.nasa.gov/api/search/1.0/collections/urn:nasa:pds:mer1_navcam_sci_calibrated:data::1.0/products?start=START&limit=LIMIT&fields=ops:Data_File_Info.ops:file_ref,ops:Data_File_Info.ops:md5_checksum,ops:Label_File_Info.ops:file_ref,ops:Label_File_Info.ops:md5_checksum

and trying different values for START and LIMIT.

Limit as an End Index

If limit contradicts the documentation and is indeed the ending index of the data you want to retrieve, then here's what I expect and get:

Start Limit Expected Actual
0 50 50 50
50 100 50 100
100 150 50 150
150 200 50 200
200 250 50 250
250 300 50 300

The length of the data key in the returned JSON is only correct the first time. After that, it's certainly is behaving like it's a count and not an end index.

Well, that jives with the documentation at least! So, let's try again, treating limit like a count.

Limit as a Count

Treating limit as a count and substituting values in the URL above, I get:

Start Limit Expected Actual
0 50 50 50
50 50 50 50
100 50 50 โŒ

The โŒ represents 500 Internal Server Error.

Let's try a smaller limit of 25:

Start Limit Expected Actual
0 25 25 25
25 25 25 25
50 25 25 โŒ

Does limit = 1 work?

Start Limit Expected Actual
0 1 1 1
1 1 1 1
2 1 1 โŒ

Of course, if I do start = 2 and limit = 2, then I get back not 1 but 2 items, which is still wrong.

So for at least collections/{identifier}/products, limit is neither an and index nor a count. It's just broken!

jordanpadams commented 1 year ago

thanks @nutjob4life this is great info for an API bug

nutjob4life commented 1 year ago

@jordanpadams I'll file it over on registry-api!

Nevermind; it looks like you've mentioned this in relation to another issue.

I'll hold off for now.

nutjob4life commented 1 year ago

Okay, filed in registry-api.

I recommend closing this issue for now. If registry-api changes behavior, then we'll have to revisit pagination here in deep-archive.

jordanpadams commented 1 year ago

๐Ÿ‘

jordanpadams commented 1 year ago

closing for now in lieu of https://github.com/NASA-PDS/registry-api/issues/240

alexdunnjpl commented 1 year ago

API hasn't changed stated behaviour, but is subject to a bugfix (and another one once I sort it out next week) so an update may be required for deep-archive (dunno - just wanted to flag it)

@nutjob4life

nutjob4life commented 1 year ago

Thanks @alexdunnjpl. Don't worry, I'm accustomed to Registry's API being about as stable as a stool with two legs ๐Ÿ˜