elastic / connectors

Source code for all Elastic connectors, developed by the Search team at Elastic, and home of our Python connector development framework
https://www.elastic.co/guide/en/enterprise-search/master/index.html

Allow proxy between Ent Search Connectors and Elasticsearch #2017

Open gbocchini opened 9 months ago

gbocchini commented 9 months ago

Problem Description

Currently, the Elastic connectors (https://github.com/elastic/connectors/blob/8.11/config.yml and https://www.elastic.co/guide/en/enterprise-search/current/connectors.html) have no option to declare a proxy for the connection to Elasticsearch.

Proposed Solution

The change would allow a connector to communicate with Elasticsearch via a proxy connection.
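As a rough illustration of what "via a proxy connection" means at the HTTP client level, here is a minimal sketch using only the Python standard library. It is not the connectors framework's API or the Elasticsearch client's configuration; the function name `make_proxied_opener` and the proxy address are hypothetical.

```python
import urllib.request


def make_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests through proxy_url."""
    # ProxyHandler maps a URL scheme to the proxy that should carry it.
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Hypothetical proxy address, for illustration only.
opener = make_proxied_opener("http://proxy.internal.example:3128")
# opener.open("https://<elasticsearch-host>/") would now be routed through the proxy.
```

A real implementation would need the equivalent setting on whatever transport the connector uses to talk to Elasticsearch.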

Additional Context

In many use cases, whether for security reasons or by architectural design, all connections must go through a proxy. At present this is impossible for an Ent Search connector, which limits its use cases and usability.

MatheusGelinskiPires commented 9 months ago

@gbocchini, hello!

Maybe we will need to have a configuration for the extraction as well.

For example: after connecting a Connector to Elasticsearch via proxy, we need to configure an extractor (Salesforce in my case), and that connection from the Connector to Salesforce also needs to go through the proxy.

Thx!

seanstory commented 9 months ago

@MatheusGelinskiPires please go ahead and file an enhancement issue for your use case for a Salesforce proxy. Unfortunately, each connector typically uses its own transport client implementation, so we'd need to build that type of proxy configuration on a case-by-case basis for each connector. So I'd prefer to not lump that in with this issue.

seanstory commented 9 months ago

@gbocchini I'd like to better understand why this feature is necessary. The three main ways we envision connectors being used are:

  1. Native Connectors with an Elastic Cloud Elasticsearch (no proxy needed)
  2. Connector Clients with an Elastic Cloud Elasticsearch (no proxy needed)
  3. Connector Clients with Self Managed Elasticsearch (typically located in the same VPC, so typically no proxy needed)

What was the situation where this need came up?

gbocchini commented 9 months ago

Hello @seanstory! Nice to e-meet! This came from a customer using the Salesforce connector. The proxy option would sit between the connector and Elasticsearch. In their scenario (they are a telecom) they are using the Salesforce connector and everything must go via a proxy (SF is not on their infra).

Without a proxy option between the connector and Elasticsearch, the connection is made outside their proxy, which puts it out of compliance.

Support case 01533799 in case it interests you :)

Thanks!

seanstory commented 9 months ago

Thanks for the case number, @gbocchini.

@MatheusGelinskiPires, I understand you're the impacted customer! Can you share any more about your use case, and why it doesn't match one of the 3 situations I described above? Is your environment air-gapped or something such that it cannot make requests to the outside internet unless through a proxy?

MatheusGelinskiPires commented 9 months ago

Hello @seanstory, I need to extract some information from Salesforce which I will use to set up a few Alert Rules. In this case, if I understood the documentation correctly, I need to use a Client Connector (self managed) for Salesforce. This Connector will be deployed in our on-premises infrastructure. To comply with our security rules, everything that is deployed on our on-premises infrastructure and needs to reach the internet must do so via a proxy.

MatheusGelinskiPires commented 9 months ago

Hello @seanstory, to add to the information above: since our Elastic environment is a Cloud environment, the Elasticsearch connection must go through the proxy as well.

That could also lead to this issue: https://github.com/elastic/elasticsearch-py/issues/2217

artem-shelkovnikov commented 5 months ago

More info about another case that I've heard about:

There's a setup in a private virtual network that uses a proxy to reach the outside world or other private networks. The connector is therefore unable to reach the 3rd party directly and needs to interact with it over an HTTP proxy, with an SSL certificate used for authentication:

[                     Private VNetwork                       ]         [   Internet   ]
[Elasticsearch] <<< >>> [SPO Connector] <<< >>> [ HTTP Proxy ] <<< >>> [ 3rd-party ]

Requirements are to be able to connect to the proxy anonymously (for testing) OR with basic auth (for testing too) OR with a certificate.
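The three modes above could be modeled roughly as follows. This is a sketch under stated assumptions, not the approach taken in the POC PR: the class `ProxySettings` and all of its field names are hypothetical, and treating the certificate mode as an `https://` proxy URL is an assumption made for illustration.

```python
import base64
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ProxySettings:
    """Hypothetical proxy settings; no name here comes from the connectors framework."""

    host: str
    port: int
    username: Optional[str] = None
    password: Optional[str] = None
    client_cert: Optional[str] = None  # path to a PEM client certificate (assumed)
    client_key: Optional[str] = None   # path to the matching private key (assumed)

    def mode(self) -> str:
        """Pick the authentication mode implied by the fields that are set."""
        if self.client_cert:
            return "certificate"
        if self.username is not None:
            return "basic"
        return "anonymous"

    def url(self) -> str:
        """Proxy URL; the certificate mode is assumed to use a TLS connection to the proxy."""
        scheme = "https" if self.mode() == "certificate" else "http"
        return f"{scheme}://{self.host}:{self.port}"

    def auth_headers(self) -> Dict[str, str]:
        """Proxy-Authorization header for basic auth, empty for the other modes."""
        if self.mode() != "basic":
            return {}
        raw = f"{self.username}:{self.password or ''}".encode("utf-8")
        return {"Proxy-Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


# Hypothetical values, for illustration only.
anonymous = ProxySettings(host="proxy.internal.example", port=3128)
basic = ProxySettings(host="proxy.internal.example", port=3128,
                      username="user", password="pass")
with_cert = ProxySettings(host="proxy.internal.example", port=3128,
                          client_cert="/path/to/client.pem",
                          client_key="/path/to/client.key")
```

Each connector's transport client would still have to consume these values in its own way, and the certificate and key paths would need to be loaded into that client's TLS configuration.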

Here's a POC PR that shows how much effort is needed to implement such logic for the Sharepoint Online connector: https://github.com/elastic/connectors/pull/2266/files. This PR, however, does not have SSL Certificate support as I had no time to set up a proxy with an SSL Certificate. It would take around an hour to add and test SSL Certificate support once a proxy is set up and available.

MatheusGelinskiPires commented 5 months ago

There is another case where Elasticsearch could be running on an Elastic Cloud deployment.

So the proxy should be used for that connection as well.