Description: I've encountered an issue with the Marquez dataset versions API where not all dataset versions are returned, even when the limit parameter is set higher than the total number of versions.
Steps to Reproduce:
Prepare Data: Download the database dump (130 KB) and load it into a fresh database: https://drive.google.com/file/d/1T8LI-NRHg7Qxj_pi7CN0sRm0ZcssooxU/view
API Request: Call the endpoint /api/v1/namespaces/s3a%3A%2F%2Fproduct-data/datasets/%2F4f5e4a74-d608-48b9-968b-b638ff80654f/versions
Set Limit: Set the limit parameter to 25, 100, and 1000. The returned lists contain 1, 3, and 6 versions respectively, while the totalCount property is 6 every time.
Notice that for limits of 25 and 100 the API returns fewer versions than totalCount, even though both limits exceed it.
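The request in the steps above can be scripted roughly as follows. This is a sketch under assumptions: the Marquez API is assumed to run locally on port 5000, the endpoint is assumed to accept an offset query parameter alongside limit, and the list of versions is assumed to sit under a "versions" key next to totalCount. The helper names versions_url and page_sizes are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local Marquez API on port 5000.
BASE_URL = "http://localhost:5000"


def versions_url(namespace, dataset, limit, offset=0, base_url=BASE_URL):
    # Namespace and dataset names contain ':' and '/', so percent-encode
    # every reserved character (safe="") before putting them in the path.
    ns = urllib.parse.quote(namespace, safe="")
    ds = urllib.parse.quote(dataset, safe="")
    query = urllib.parse.urlencode({"limit": limit, "offset": offset})
    return f"{base_url}/api/v1/namespaces/{ns}/datasets/{ds}/versions?{query}"


def page_sizes(namespace, dataset, limits=(25, 100, 1000)):
    """Return (returned list size, totalCount) per limit. Needs a running server."""
    results = []
    for limit in limits:
        with urllib.request.urlopen(versions_url(namespace, dataset, limit)) as resp:
            body = json.load(resp)
        # Assumption: the versions list is under the "versions" key.
        results.append((len(body["versions"]), body["totalCount"]))
    return results
```

Against the dump, page_sizes("s3a://product-data", "/4f5e4a74-d608-48b9-968b-b638ff80654f") would be expected to show the 1, 3, 6 pattern described above.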
Expected Behavior:
The API should return all dataset versions up to the specified limit. If the limit exceeds the total number of versions, all versions should be returned.
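The expected behavior can be stated as a simple invariant that a regression test could check: a page should contain min(limit, totalCount - offset) versions, and never a negative number. The function names below are hypothetical and only illustrate that rule.

```python
def expected_page_size(limit, offset, total_count):
    # A correct page holds the smaller of the limit and the versions remaining
    # after the offset; an offset past the end yields an empty page.
    return max(0, min(limit, total_count - offset))


def is_page_consistent(returned, limit, offset, total_count):
    return returned == expected_page_size(limit, offset, total_count)
```

Applied to the reported numbers (totalCount 6, offset 0), only the limit=1000 response passes the check.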
Actual Behavior:
The API returns fewer versions than both the limit and totalCount, even when the limit is well above totalCount (for example, limit=25 returns 1 of 6 versions).
Cause:
The issue is caused by the placement of the LIMIT and OFFSET clauses in the SQL query used by the DatasetVersionDao.findAll method. LIMIT and OFFSET are applied inside a Common Table Expression (CTE), before grouping and filtering, so they cap the intermediate rows rather than the number of versions in the response, which leads to inconsistent results.
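The mechanism can be reproduced in isolation with Python's sqlite3. This is a minimal illustration, not Marquez's schema or the actual DatasetVersionDao.findAll query: it assumes a hypothetical versions table plus a version_fields table with many rows per version, and SQLite stands in for PostgreSQL. The BUGGY query limits joined rows inside the CTE and groups afterwards; the FIXED query paginates over versions first and joins and groups afterwards.

```python
import sqlite3


def build_db(num_versions, fields_per_version):
    # Hypothetical schema: one row per version, many field rows per version.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE versions (uuid TEXT PRIMARY KEY, created_at INTEGER);
        CREATE TABLE version_fields (version_uuid TEXT, field_name TEXT);
        """
    )
    for v in range(num_versions):
        conn.execute("INSERT INTO versions VALUES (?, ?)", (f"v{v}", v))
        conn.executemany(
            "INSERT INTO version_fields VALUES (?, ?)",
            [(f"v{v}", f"col{f}") for f in range(fields_per_version)],
        )
    return conn


# LIMIT/OFFSET inside the CTE count (version, field) rows, not versions.
BUGGY = """
WITH joined AS (
    SELECT v.uuid, v.created_at, f.field_name
    FROM versions v JOIN version_fields f ON f.version_uuid = v.uuid
    ORDER BY v.created_at DESC
    LIMIT :lim OFFSET :off
)
SELECT uuid, COUNT(field_name) FROM joined
GROUP BY uuid, created_at ORDER BY created_at DESC
"""

# LIMIT/OFFSET select a page of versions first; the join and grouping follow.
FIXED = """
WITH page AS (
    SELECT uuid, created_at FROM versions
    ORDER BY created_at DESC
    LIMIT :lim OFFSET :off
)
SELECT p.uuid, COUNT(f.field_name) FROM page p
LEFT JOIN version_fields f ON f.version_uuid = p.uuid
GROUP BY p.uuid, p.created_at ORDER BY p.created_at DESC
"""


def fetch(conn, sql, limit, offset=0):
    return conn.execute(sql, {"lim": limit, "off": offset}).fetchall()


conn = build_db(num_versions=6, fields_per_version=40)
for n in (25, 100, 1000):
    print(n, len(fetch(conn, BUGGY, n)), len(fetch(conn, FIXED, n)))
```

The sizes (6 versions, 40 fields each) are made up, chosen so the buggy query mirrors the reported 1, 3, 6 pattern; the fixed query returns all 6 versions for every limit. The buggy query also truncates the last version it reaches (v3 comes back with 20 of its 40 fields at limit 100).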
I will open a PR that moves the LIMIT and OFFSET clauses, following the project's contribution guidelines.