PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0

Client functions have limit even when set to None #15963

Open mitchell-lawson opened 3 weeks ago

mitchell-lawson commented 3 weeks ago

Bug summary

The documentation for the Prefect client does not make clear that results must be paginated with the offset parameter even when limit is set to None. In practice, limit=None still returns at most 200 results. Our environment has more than 2,000 deployments, and I ran the code below to find this limit. The same behavior appears in every client function that takes a limit parameter.

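from asyncio import run

from prefect import get_client


async def main():
    # Use the client as an async context manager so it is closed cleanly.
    async with get_client() as client:
        deployments = await client.read_deployments(limit=None)

    assert len(deployments) == 200  # passes: the result is capped at 200
    assert len(deployments) > 200  # fails, although >2000 deployments exist


if __name__ == "__main__":
    run(main())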

Version info

Version:             2.19.3
API version:         0.8.4
Python version:      3.11.9
Git commit:          ce378efe
Built:               Thu, May 30, 2024 11:59 AM
OS/Arch:             linux/x86_64
Profile:             default
Server type:         server

The Prefect 3.x documentation does not mention this limit either.

Additional context

From the Python docs:

    async def read_deployments(
        self,
        *,
        flow_filter: FlowFilter = None,
        flow_run_filter: FlowRunFilter = None,
        task_run_filter: TaskRunFilter = None,
        deployment_filter: DeploymentFilter = None,
        work_pool_filter: WorkPoolFilter = None,
        work_queue_filter: WorkQueueFilter = None,
        limit: int = None,
        sort: DeploymentSort = None,
        offset: int = 0,
    ) -> List[DeploymentResponse]:
        """
        Query the Prefect API for deployments. Only deployments matching all
        the provided criteria will be returned.

        Args:
            flow_filter: filter criteria for flows
            flow_run_filter: filter criteria for flow runs
            task_run_filter: filter criteria for task runs
            deployment_filter: filter criteria for deployments
            work_pool_filter: filter criteria for work pools
            work_queue_filter: filter criteria for work pool queues
            limit: a limit for the deployment query
            offset: an offset for the deployment query

        Returns:
            a list of Deployment model representations
                of the deployments
        """

There is no additional info in the online docs either.

cicdw commented 3 weeks ago

Hi @mitchell-lawson, thank you for the issue! This is really confusing; we will update.

In the meantime, for your understanding: the default limit is a Prefect setting on your server / API, configurable through PREFECT_API_DEFAULT_LIMIT (which defaults to 200). Note that it must be set on the server process, not the client process, for it to take effect.