boto / botocore

The low-level, core functionality of boto3 and the AWS CLI.
Apache License 2.0

Operations/methods missing paginators #1462

Closed kapilt closed 1 year ago

kapilt commented 6 years ago

I brought this up at the open space at PyCon: there are lots of client methods missing paginator metadata. I went ahead and coded up a simple script to identify all the missing paginators in the botocore JSON SDK metadata.

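import boto3
import yaml

def main():
    # Uses the default region from your AWS configuration; regional services
    # need one configured (or an explicit region_name) to create a client.
    session = boto3.Session()
    services = session.get_available_services()
    op_missing_paginator = {}
    for s in services:
        c = session.client(s)
        # Map API operation names (e.g. ListShards) back to Python method
        # names (e.g. list_shards). _PY_TO_OP_NAME is a private attribute.
        op_to_py = {v: k for k, v in c._PY_TO_OP_NAME.items()}
        for op in c.meta.service_model.operation_names:
            opm = c.meta.service_model.operation_model(op)
            # Only operations whose output has a NextToken member are candidates.
            if opm.output_shape is None:
                continue
            if 'NextToken' not in opm.output_shape.members:
                continue
            py_name = op_to_py[opm.name]
            if not c.can_paginate(py_name):
                op_missing_paginator.setdefault(s, []).append(py_name)
    print(yaml.safe_dump(op_missing_paginator, default_flow_style=False))

if __name__ == '__main__':
    main()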

This results in the following output:

acm-pca:
- list_certificate_authorities
- list_tags
alexaforbusiness:
- list_device_events
- search_address_books
- search_contacts
application-autoscaling:
- describe_scheduled_actions
appstream:
- describe_directory_configs
- describe_fleets
- describe_image_builders
- describe_sessions
- describe_stacks
- list_associated_fleets
- list_associated_stacks
autoscaling:
- describe_load_balancer_target_groups
- describe_load_balancers
autoscaling-plans:
- describe_scaling_plan_resources
- describe_scaling_plans
budgets:
- describe_budgets
- describe_notifications_for_budget
- describe_subscribers_for_notification
clouddirectory:
- list_incoming_typed_links
- list_object_children
- list_object_parents
- list_outgoing_typed_links
cloudformation:
- describe_account_limits
- describe_change_set
- list_change_sets
- list_stack_instances
- list_stack_set_operation_results
- list_stack_set_operations
- list_stack_sets
cloudhsm:
- list_hapgs
- list_hsms
- list_luna_clients
cloudtrail:
- list_public_keys
- list_tags
cloudwatch:
- get_metric_data
codepipeline:
- list_webhooks
cognito-identity:
- list_identities
- list_identity_pools
- lookup_developer_identity
cognito-idp:
- admin_list_groups_for_user
- admin_list_user_auth_events
- list_groups
- list_identity_providers
- list_resource_servers
- list_user_pool_clients
- list_user_pools
- list_users_in_group
cognito-sync:
- list_datasets
- list_identity_pool_usage
- list_records
config:
- describe_aggregate_compliance_by_config_rules
- describe_aggregation_authorizations
- describe_config_rule_evaluation_status
- describe_configuration_aggregator_sources_status
- describe_configuration_aggregators
- describe_pending_aggregation_requests
- get_aggregate_compliance_details_by_config_rule
- get_aggregate_config_rule_compliance_summary
dax:
- describe_clusters
- describe_default_parameters
- describe_events
- describe_parameter_groups
- describe_parameters
- describe_subnet_groups
- list_tags
ds:
- describe_directories
- describe_snapshots
- describe_trusts
- list_ip_routes
- list_schema_extensions
- list_tags_for_resource
dynamodb:
- list_tags_of_resource
ec2:
- describe_classic_link_instances
- describe_egress_only_internet_gateways
- describe_elastic_gpus
- describe_fleet_history
- describe_fleet_instances
- describe_fleets
- describe_flow_logs
- describe_fpga_images
- describe_host_reservation_offerings
- describe_host_reservations
- describe_hosts
- describe_import_image_tasks
- describe_import_snapshot_tasks
- describe_instance_credit_specifications
- describe_launch_template_versions
- describe_launch_templates
- describe_moving_addresses
- describe_network_interface_permissions
- describe_prefix_lists
- describe_principal_id_format
- describe_scheduled_instance_availability
- describe_scheduled_instances
- describe_spot_fleet_request_history
- describe_stale_security_groups
- describe_volumes_modifications
- describe_vpc_classic_link_dns_support
- describe_vpc_endpoint_connection_notifications
- describe_vpc_endpoint_connections
- describe_vpc_endpoint_service_configurations
- describe_vpc_endpoint_service_permissions
- describe_vpc_endpoint_services
- describe_vpc_endpoints
elasticbeanstalk:
- compose_environments
- describe_application_versions
- describe_environment_managed_action_history
- describe_environments
- describe_instances_health
- list_platform_versions
events:
- list_rule_names_by_target
- list_rules
- list_targets_by_rule
fms:
- list_compliance_status
- list_policies
gamelift:
- describe_fleet_attributes
- describe_fleet_capacity
- describe_fleet_events
- describe_fleet_utilization
- describe_game_session_details
- describe_game_session_queues
- describe_game_sessions
- describe_instances
- describe_matchmaking_configurations
- describe_matchmaking_rule_sets
- describe_player_sessions
- describe_scaling_policies
- list_aliases
- list_builds
- list_fleets
- search_game_sessions
greengrass:
- list_core_definition_versions
- list_core_definitions
- list_deployments
- list_device_definition_versions
- list_device_definitions
- list_function_definition_versions
- list_function_definitions
- list_group_versions
- list_groups
- list_logger_definition_versions
- list_logger_definitions
- list_resource_definition_versions
- list_resource_definitions
- list_subscription_definition_versions
- list_subscription_definitions
kinesis:
- list_shards
kinesis-video-archived-media:
- list_fragments
kinesisvideo:
- list_streams
- list_tags_for_stream
marketplace-entitlement:
- get_entitlements
mediaconvert:
- describe_endpoints
- list_job_templates
- list_jobs
- list_presets
- list_queues
mediastore:
- list_containers
mediastore-data:
- list_items
mgh:
- list_created_artifacts
- list_discovered_resources
- list_migration_tasks
- list_progress_update_streams
mq:
- list_brokers
- list_configuration_revisions
- list_configurations
- list_users
mturk:
- list_review_policy_results_for_hit
opsworks:
- list_tags
opsworkscm:
- describe_backups
- describe_events
- describe_servers
polly:
- list_lexicons
rekognition:
- get_celebrity_recognition
- get_content_moderation
- get_face_detection
- get_face_search
- get_label_detection
- get_person_tracking
route53:
- list_query_logging_configs
- list_vpc_association_authorizations
sagemaker:
- list_notebook_instance_lifecycle_configs
secretsmanager:
- list_secret_version_ids
- list_secrets
serverlessrepo:
- list_application_versions
- list_applications
servicediscovery:
- get_instances_health_status
ses:
- list_configuration_sets
- list_receipt_rule_sets
- list_templates
shield:
- list_attacks
snowball:
- list_cluster_jobs
- list_clusters
ssm:
- describe_automation_executions
- describe_automation_step_executions
- describe_available_patches
- describe_effective_instance_associations
- describe_effective_patches_for_patch_baseline
- describe_instance_associations_status
- describe_instance_patch_states
- describe_instance_patch_states_for_patch_group
- describe_instance_patches
- describe_inventory_deletions
- describe_maintenance_window_execution_task_invocations
- describe_maintenance_window_execution_tasks
- describe_maintenance_window_executions
- describe_maintenance_window_targets
- describe_maintenance_window_tasks
- describe_maintenance_windows
- describe_patch_baselines
- describe_patch_groups
- get_inventory
- get_inventory_schema
- list_association_versions
- list_compliance_items
- list_compliance_summaries
- list_document_versions
- list_inventory_entries
- list_resource_compliance_summaries
- list_resource_data_sync
transcribe:
- list_transcription_jobs
- list_vocabularies
workmail:
- list_mailbox_permissions
- list_resource_delegates
workspaces:
- describe_ip_groups
- describe_workspaces_connection_status

joguSD commented 6 years ago

Thanks for generating this list. We need to do a little more auditing on these. Another important requirement for operations we can support pagination for is that the output contains a list (preferably exactly one). Some APIs paginate over maps or return multiple lists, which is problematic for us because we aggregate the output on behalf of the user (this is unique to boto3 and the CLI compared with other SDKs). If anybody wants to take a crack at these, please go ahead, ideally with a separate PR for each service.
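The list-in-output constraint can be checked directly against a service's JSON model. A minimal sketch, assuming the `operations`/`shapes` layout of botocore's `service-2.json` files; `list_members`, `is_safe_to_paginate` and the `ListWidgets`/`DescribeThings` sample model are hypothetical names for illustration, not botocore APIs:

```python
from typing import Any, Dict, List

# Hypothetical model fragment in the service-2.json layout (operations + shapes).
SAMPLE_MODEL: Dict[str, Any] = {
    "operations": {
        "ListWidgets": {"output": {"shape": "ListWidgetsResponse"}},
        "DescribeThings": {"output": {"shape": "DescribeThingsResponse"}},
    },
    "shapes": {
        "ListWidgetsResponse": {"type": "structure", "members": {
            "Widgets": {"shape": "WidgetList"},
            "NextToken": {"shape": "String"}}},
        "DescribeThingsResponse": {"type": "structure", "members": {
            "Things": {"shape": "ThingMap"},
            "NextToken": {"shape": "String"}}},
        "WidgetList": {"type": "list", "member": {"shape": "String"}},
        "ThingMap": {"type": "map", "key": {"shape": "String"},
                     "value": {"shape": "String"}},
        "String": {"type": "string"},
    },
}

def list_members(shapes: Dict[str, Any], structure_name: str) -> List[str]:
    """Names of members in a structure shape whose target shape is a list."""
    members = shapes[structure_name].get("members", {})
    return [name for name, ref in members.items()
            if shapes[ref["shape"]]["type"] == "list"]

def is_safe_to_paginate(model: Dict[str, Any], operation_name: str,
                        token: str = "NextToken") -> bool:
    """Heuristic: the output carries `token` and exactly one list (maps don't count)."""
    output = model["operations"][operation_name].get("output")
    if output is None:
        return False
    shapes = model["shapes"]
    if token not in shapes[output["shape"]].get("members", {}):
        return False
    return len(list_members(shapes, output["shape"])) == 1

if __name__ == "__main__":
    for op in SAMPLE_MODEL["operations"]:
        print(op, is_safe_to_paginate(SAMPLE_MODEL, op))
```

Here `DescribeThings` is rejected because its only collection is a map, which is exactly the case that makes merging pages on the user's behalf ambiguous.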

kalyanaramansanthanam commented 5 years ago

Any ETA on when these paginators will be available?

slai commented 5 years ago

So that there's one place to look for all of these, here's a list of other PRs I've found that add pagination definitions -

kapilt commented 5 years ago

I've done one-off contributions on paginators before. I suggest a holistic approach geared towards coverage, i.e., generating default paginators based on this heuristic, followed by API-method-specific customization as needed.
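A default generator along those lines might look like the sketch below. It emits entries in the `input_token`/`output_token`/`limit_key`/`result_key` form used by botocore's `paginators-1.json` files, but only when the response has exactly one list and both request and response carry `NextToken`. The limit-key candidates, the function names and the sample model are assumptions for illustration, not the generator anyone in this thread actually used:

```python
import json
from typing import Any, Dict, Optional

LIMIT_CANDIDATES = ("MaxResults", "Limit", "MaxItems")  # assumed common names

# Hypothetical model fragment in the service-2.json layout.
SAMPLE_MODEL: Dict[str, Any] = {
    "operations": {
        "ListWidgets": {"input": {"shape": "ListWidgetsRequest"},
                        "output": {"shape": "ListWidgetsResponse"}},
        "DeleteWidget": {"input": {"shape": "DeleteWidgetRequest"}},
    },
    "shapes": {
        "ListWidgetsRequest": {"type": "structure", "members": {
            "MaxResults": {"shape": "Integer"}, "NextToken": {"shape": "String"}}},
        "ListWidgetsResponse": {"type": "structure", "members": {
            "Widgets": {"shape": "WidgetList"}, "NextToken": {"shape": "String"}}},
        "DeleteWidgetRequest": {"type": "structure", "members": {
            "WidgetId": {"shape": "String"}}},
        "WidgetList": {"type": "list", "member": {"shape": "String"}},
        "Integer": {"type": "integer"},
        "String": {"type": "string"},
    },
}

def default_entry(model: Dict[str, Any], op_name: str,
                  token: str = "NextToken") -> Optional[Dict[str, str]]:
    """Return a conservative paginator entry for one operation, or None."""
    op = model["operations"][op_name]
    if "input" not in op or "output" not in op:
        return None
    shapes = model["shapes"]
    in_members = shapes[op["input"]["shape"]].get("members", {})
    out_members = shapes[op["output"]["shape"]].get("members", {})
    lists = [name for name, ref in out_members.items()
             if shapes[ref["shape"]]["type"] == "list"]
    # Skip anything ambiguous: token missing on either side, or zero/multiple lists.
    if token not in in_members or token not in out_members or len(lists) != 1:
        return None
    entry = {"input_token": token, "output_token": token, "result_key": lists[0]}
    limit = next((n for n in LIMIT_CANDIDATES if n in in_members), None)
    if limit is not None:
        entry["limit_key"] = limit
    return entry

def generate_paginators(model: Dict[str, Any]) -> Dict[str, Any]:
    """Build a paginators-1.json style document for every qualifying operation."""
    pagination = {}
    for op_name in sorted(model["operations"]):
        entry = default_entry(model, op_name)
        if entry is not None:
            pagination[op_name] = entry
    return {"pagination": pagination}

if __name__ == "__main__":
    print(json.dumps(generate_paginators(SAMPLE_MODEL), indent=2))
```

Anything the generator skips is where the API-method-specific customization would come in.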

kapilt commented 5 years ago

I went ahead and extended this snippet to generate paginators, taking into account additional constraints with regard to merged output fields. If I send a PR per service, it's on the order of 60+. Is that preferable to a single PR?

brandond commented 5 years ago

I just updated #1548 to fix a test failure. I agree that handling this at an individual service level seems prone to failure. It'd be great to have the paginator config auto-generated from the service JSON instead of being manually updated (or not, as the case may be).

JordonPhillips commented 5 years ago

@kapilt I have a script that generates paginators, constrained heavily to only generate those that I can be absolutely certain about. I just merged a PR stemming from that that added a ton. What we really need to do is to have that be part of our release automation. Right now we're relying pretty much entirely on getting them from upstream.

kapilt commented 5 years ago

@JordonPhillips Thanks. One thing missing from https://github.com/boto/botocore/pull/1633, as far as I can see, is the script used to generate it; ideally that would be part of the source tree as well (perhaps in a scripts directory).

kristianperkins commented 5 years ago

The script doesn't identify methods where the token is not named NextToken, which can happen. For example, cognito-idp uses the parameter NextToken for list_users_in_group but PaginationToken for list_users. I'm not sure how many other services use different names for the pagination token.
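One way to widen the check is to look for any of several token names instead of only NextToken. A sketch; the names in `TOKEN_CANDIDATES` and the helper `find_output_token` are assumptions, and a broader match will also surface more false positives that still need joguSD's list-in-output check:

```python
from typing import Iterable, Mapping, Optional

# Assumed, non-exhaustive list of pagination-token member names; verify
# against the actual service model before relying on any of them.
TOKEN_CANDIDATES = ("NextToken", "nextToken", "PaginationToken",
                    "NextPageToken", "NextMarker", "Marker")

def find_output_token(output_members: Mapping[str, object],
                      candidates: Iterable[str] = TOKEN_CANDIDATES) -> Optional[str]:
    """Return the first candidate token name present in an output shape's members."""
    for name in candidates:
        if name in output_members:
            return name
    return None

if __name__ == "__main__":
    # Member names mirroring the cognito-idp example (values elided).
    list_users_output = {"Users": None, "PaginationToken": None}
    list_users_in_group_output = {"Users": None, "NextToken": None}
    print(find_output_token(list_users_output))
    print(find_output_token(list_users_in_group_output))
```

In the original script, `find_output_token(opm.output_shape.members) is None` could replace the hard-coded `'NextToken' not in ...` test.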

JonathanDCohen commented 5 years ago

Looks like this is still an issue:

import boto3

client = boto3.client('cloudwatch', region_name='us-east-1')
paginator = client.get_paginator('get_metric_data')
Traceback (most recent call last):
...
    paginator = client.get_paginator('get_metric_data')
  File ".../env/lib/python3.6/site-packages/botocore/client.py", line 387, in get_paginator
    if not self.can_paginate(operation_name):
  File "/usr/local/google/home/cohenjon/Source/outline-electron-metrics/env/lib/python3.6/site-packages/botocore/client.py", line 420, in can_paginate
    actual_operation_name = self._PY_TO_OP_NAME[operation_name]
KeyError: 'get_metric_data'

brandond commented 5 years ago

@JonathanDCohen I suspect you're on an out-of-date version of boto3. Have you tried updating? I'm on a version from March and it works fine.

>>> import boto3
>>> client = boto3.client('cloudwatch')
>>> client.get_paginator('get_metric_data')
<botocore.client.CloudWatch.Paginator.GetMetricData object at 0x7f7e9e3bdfd0>
>>> boto3.__version__
'1.9.122'

JonathanDCohen commented 5 years ago

@brandond yeah that was it. All figured out :)

dacut commented 4 years ago

The kinesis.list_shards paginator has issues: the API requires StreamName to be set on the first call but not set when NextToken is passed. (ugh.)

Python 3.7.5 (default, Nov  1 2019, 02:16:23)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.10.2 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import boto3.session

In [2]: boto3.__version__
Out[2]: '1.10.43'

In [3]: kinesis = boto3.session.Session(profile_name="xxxxxxxx").client("kinesis")

In [4]: paginator = kinesis.get_paginator("list_shards").paginate(StreamName="my-stream-name", PaginationConfig={"PageSize": 2})

In [5]: for page in paginator:
   ...:     print(page)
   ...:
{'Shards': [...], 'NextToken': '...', 'ResponseMetadata': {...}}
---------------------------------------------------------------------------
InvalidArgumentException                  Traceback (most recent call last)
<ipython-input-5-d3d91b85e827> in <module>
----> 1 for page in paginator:
      2     print(page)
      3

~/projects/mars-ntr-load-test/venv/lib/python3.7/site-packages/botocore/paginate.py in __iter__(self)
    253         self._inject_starting_params(current_kwargs)
    254         while True:
--> 255             response = self._make_request(current_kwargs)
    256             parsed = self._extract_parsed_response(response)
    257             if first_request:

~/projects/mars-ntr-load-test/venv/lib/python3.7/site-packages/botocore/paginate.py in _make_request(self, current_kwargs)
    330
    331     def _make_request(self, current_kwargs):
--> 332         return self._method(**current_kwargs)
    333
    334     def _extract_parsed_response(self, response):

~/projects/mars-ntr-load-test/venv/lib/python3.7/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    274                     "%s() only accepts keyword arguments." % py_operation_name)
    275             # The "self" in this scope is referring to the BaseClient.
--> 276             return self._make_api_call(operation_name, kwargs)
    277
    278         _api_call.__name__ = str(py_operation_name)

~/projects/mars-ntr-load-test/venv/lib/python3.7/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    584             error_code = parsed_response.get("Error", {}).get("Code")
    585             error_class = self.exceptions.from_code(error_code)
--> 586             raise error_class(parsed_response, operation_name)
    587         else:
    588             return parsed_response
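Until the paginator handles this, the workaround is to drive ListShards by hand, sending StreamName only on the first request. A sketch; `list_all_shards` is a hypothetical helper, and it assumes (per the Kinesis ListShards API reference) that MaxResults may accompany NextToken:

```python
from typing import Any, Dict, List

def list_all_shards(kinesis_client: Any, stream_name: str,
                    page_size: int = 100) -> List[Dict[str, Any]]:
    """Collect every shard of a stream via ListShards.

    ListShards rejects StreamName once NextToken is supplied, so only the
    first request names the stream; follow-up requests send the token alone.
    """
    shards: List[Dict[str, Any]] = []
    kwargs: Dict[str, Any] = {"StreamName": stream_name, "MaxResults": page_size}
    while True:
        response = kinesis_client.list_shards(**kwargs)
        shards.extend(response.get("Shards", []))
        token = response.get("NextToken")
        if not token:
            return shards
        # Drop StreamName; keep the page size alongside the token.
        kwargs = {"NextToken": token, "MaxResults": page_size}
```

With the session from the transcript this would be called as `list_all_shards(boto3.session.Session(profile_name="xxxxxxxx").client("kinesis"), "my-stream-name", page_size=2)`.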

PatMyron commented 3 years ago

> The script doesn't identify methods where the token is not named NextToken as can happen. For example cognito-idp uses the parameter NextToken for list_users_in_group, but uses PaginationToken for list_users. I'm not sure how many other services use different naming conventions for the name of the token used for pagination.

@iann0036 investigated: https://github.com/iann0036/aws-pagination-rules

Chart: Distribution of AWS service count by pagination method

I also investigated: https://github.com/aws-cloudformation/cloudformation-cli/pull/663


Unmerged PRs: https://github.com/boto/botocore/pull/1470, https://github.com/boto/botocore/pull/1847, https://github.com/boto/botocore/pull/2004, https://github.com/boto/botocore/pull/2018, https://github.com/boto/botocore/pull/2104, https://github.com/boto/botocore/pull/2177

khneal commented 3 years ago

I ran into a missing paginator for an operation that uses "nextToken" instead of "NextToken". It seems like this should have been fixed by now.

> I have a script that generates paginators, constrained heavily to only generate those that I can be absolutely certain about. I just merged a PR stemming from that that added a ton. What we really need to do is to have that be part of our release automation. Right now we're relying pretty much entirely on getting them from upstream.

@JordonPhillips - would you be willing to share your script so others can use it?

CONTRIBUTING.rst has addressed this topic since it was first added here:

> We may choose not to accept pull requests that change the JSON service descriptions... We generate these files upstream based on our internal knowledge of the AWS services. If there is something incorrect with or missing from these files, it may be more appropriate to submit an issue so we can get the issue fixed upstream.

I see one-offs being added to the code, but there are also several PRs sitting open (see above). Are one-offs now allowed?

@jamesls - since you updated CONTRIBUTING.rst, where is the upstream location that we can modify to get this fixed across SDKs for all languages?

tim-finnigan commented 1 year ago

Service teams are now the owners of their paginator models, as those model definitions are shared across AWS SDKs. We are currently tracking paginator requests in our cross-SDK repository: https://github.com/aws/aws-sdk/issues?q=is%3Aissue+is%3Aopen+label%3Apaginator. If you'd like to see a paginator for a specific service/API, please create an issue in that repository with your use case and request.

You can also consider reaching out through AWS Support for further escalation on these types of requests. But since the paginator additions would need to happen upstream rather than in botocore directly, I'm going to close this issue. Please let us know if you had any questions or feedback regarding this.
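While a paginator request is pending upstream, an operation that returns a token but has no paginator model can be paged by hand. A minimal sketch; `iterate_pages` is a hypothetical helper, and it assumes the request and response use the same token name, which is not true of every API:

```python
from typing import Any, Callable, Dict, Iterator

def iterate_pages(call: Callable[..., Dict[str, Any]], token_key: str = "NextToken",
                  **params: Any) -> Iterator[Dict[str, Any]]:
    """Yield each response page, re-calling `call` with the returned token until none comes back."""
    while True:
        page = call(**params)
        yield page
        token = page.get(token_key)
        if not token:
            return
        # Assumes the input parameter has the same name as the output token.
        params = {**params, token_key: token}
```

With boto3 you would pass a bound client method, for example `iterate_pages(client.list_users, token_key="PaginationToken", UserPoolId=...)` for a cognito-idp client. Unlike a botocore paginator, this yields raw pages and does not merge result keys for you.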