elastic / connectors

Official Elastic connectors for third-party data sources
https://www.elastic.co/guide/en/elasticsearch/reference/master/es-connectors.html

GitHub connector makes too many requests even when no sync is running #2880

Open · artem-shelkovnikov opened this issue 1 month ago

artem-shelkovnikov commented 1 month ago

Bug Description

Periodically (every 30 seconds by default), the framework asks the connector whether its configuration is valid. For the GitHub connector, this triggers validation of the configured repository names, which for the elastic organisation takes 23 API calls in my setup. At 120 checks per hour, that comes to 2,760 requests per hour (23 × 120).

GitHub's GraphQL rate limit is 5,000 points per hour, and every query costs at least one point, so an idle connector consumes more than half of the hourly rate limit. We need to make the connector validate far less aggressively, because there appears to be no way to validate repository names without counting against the rate limit.
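The budget above can be checked with a quick calculation. This is a minimal sketch that uses only the figures from this issue (a 30-second check interval, 23 calls per validation, a 5,000-per-hour limit); the function names are illustrative and are not part of the connectors codebase. Note that 23 calls × 120 checks per hour gives 2,760 requests.

```python
# Back-of-the-envelope request budget for an idle GitHub connector.
# Inputs are the figures from this issue; function names are illustrative only.

SECONDS_PER_HOUR = 3600


def requests_per_hour(calls_per_check: int, interval_seconds: int = 30) -> int:
    """Number of API calls made in one hour by periodic validation alone."""
    checks_per_hour = SECONDS_PER_HOUR // interval_seconds  # 120 at 30 s
    return calls_per_check * checks_per_hour


def share_of_limit(calls_per_check: int, hourly_limit: int = 5000,
                   interval_seconds: int = 30) -> float:
    """Fraction of the hourly rate limit consumed by validation alone."""
    return requests_per_hour(calls_per_check, interval_seconds) / hourly_limit


if __name__ == "__main__":
    current = requests_per_hour(23)  # 23 calls per validation (current behaviour)
    ideal = requests_per_hour(1)     # one call per heartbeat (expected behaviour)
    print(current, f"{share_of_limit(23):.1%}")  # 2760 55.2%
    print(ideal, f"{share_of_limit(1):.1%}")     # 120 2.4%
```

Even with one call per heartbeat, validation still costs 120 requests per hour; the saving comes from no longer multiplying each check by the number of repositories.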

To Reproduce

Steps to reproduce the behavior:

  1. Create a GitHub connector and point it at a large enough organisation (elastic works for internal users)
  2. Leave the connector running with service.log_level: DEBUG
  3. Count the number of API calls made per hour
  4. Observe that the count runs into the thousands

Expected behavior

Only a small number of requests is made per hour (ideally one per heartbeat, i.e. 120 requests per hour at the default 30-second interval).

sorenlouv commented 18 hours ago

What are the reasons for validating periodically, instead of only validating when the configuration changes?

Validating on change would drastically reduce the number of requests to GitHub, and could also improve the UX, since users would get near-immediate feedback on their configuration changes.
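One way to validate only on change, sketched under stated assumptions: the connector's configuration can be serialised to JSON, and validation is a callable that raises on an invalid configuration. `ValidateOnChange`, `fake_validate`, and the `org`/`repositories` keys are hypothetical and are not part of the Elastic connectors framework. The sketch fingerprints the configuration with SHA-256 and skips validation while the fingerprint is unchanged.

```python
import hashlib
import json
from typing import Any, Callable, Mapping, Optional


class ValidateOnChange:
    """Hypothetical guard: run an expensive validation only when the config changes."""

    def __init__(self, validate: Callable[[Mapping[str, Any]], None]) -> None:
        self._validate = validate
        self._last_fingerprint: Optional[str] = None

    @staticmethod
    def _fingerprint(config: Mapping[str, Any]) -> str:
        # Canonical JSON (sorted keys, including nested ones) so equal configs hash identically.
        canonical = json.dumps(config, sort_keys=True, default=str)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def check(self, config: Mapping[str, Any]) -> bool:
        """Return True if validation ran on this call, False if it was skipped."""
        fingerprint = self._fingerprint(config)
        if fingerprint == self._last_fingerprint:
            return False  # unchanged config: no API calls at all
        self._validate(config)  # may raise; fingerprint is stored only on success
        self._last_fingerprint = fingerprint
        return True


if __name__ == "__main__":
    calls = []

    def fake_validate(config: Mapping[str, Any]) -> None:
        # Stand-in for the real check (23 GitHub API calls per run in this issue).
        calls.append(config)

    guard = ValidateOnChange(fake_validate)
    config = {"org": "elastic", "repositories": ["*"]}
    for _ in range(120):  # one hour of 30-second heartbeats
        guard.check(config)
    guard.check({"org": "elastic", "repositories": ["connectors"]})  # user edits config
    print(len(calls))  # 2
```

Because the fingerprint is stored only after a successful validation, an invalid configuration is re-checked on every heartbeat, so a real implementation might also back off in that case. One trade-off of this approach is that problems arising after validation, such as a repository being renamed or deleted, would presumably surface only when the configuration changes or a sync runs.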