airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Source Gitlab: enable using self-signed SSL Certificates #14491

Open turalmirza opened 2 years ago

turalmirza commented 2 years ago

'Unable to connect to Gitlab API with the provided credentials - SSLError(MaxRetryError("HTTPSConnectionPool(host=\'gitlab.blabla.lan\', port=443): Max retries exceeded with url: /api/v4/groups/demo?per_page=50 (Caused by SSLError(SSLCertVerificationError(1, \'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)\')))"))'

This error occurs when I try to connect Faros AI (which uses Airbyte sources) to GitLab. Our GitLab has a self-signed certificate, so Airbyte does not trust it, and I can't find how to skip SSL verification or add our root CA cert to the GitLab source container.

sOfekS commented 2 years ago

I'm also experiencing this issue, was there any progress made on this?

esmith02 commented 1 year ago

UPDATE: I was trying a different source and getting the same error... until I realized: I bet my corporate VPN is blocking something. I turned it off and no longer got this error.


So for giggles I connected the PokeAPI to Local CSV just to see a use of the system. I got the same error. So this is easy to reproduce without needing to use gitlab in any way. Specifically: urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='pokeapi.co', port=443): Max retries exceeded with url: /api/v2/pokemon/ditto?pokemon_name=ditto (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)'))) from the error logging.

I'm going to try something more "real" but this seemed like a simple test to "play" with Airbyte and... I guess it wasn't? Not a good look.

sherifnada commented 1 year ago

A potential cause here is that the certs are self-signed or signed by a CA not known to Python's requests library. In that case we might need to append a CA bundle, as described here.
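A minimal sketch of the "append a CA bundle" idea, assuming the private root CA is available as a PEM file. The function name `build_ca_bundle` and all paths are hypothetical, and only the Python standard library is used:

```python
from pathlib import Path
from typing import Iterable

PEM_HEADER = "-----BEGIN CERTIFICATE-----"


def build_ca_bundle(base_bundle: str, extra_certs: Iterable[str], output: str) -> str:
    """Write a new PEM bundle: the base bundle followed by each extra certificate."""
    parts = [Path(base_bundle).read_text(encoding="utf-8")]
    for cert_path in extra_certs:
        pem = Path(cert_path).read_text(encoding="utf-8")
        if PEM_HEADER not in pem:
            raise ValueError(f"{cert_path} does not look like a PEM certificate")
        parts.append(pem)
    # End every part with a newline so PEM blocks never run together.
    combined = "".join(p if p.endswith("\n") else p + "\n" for p in parts)
    Path(output).write_text(combined, encoding="utf-8")
    return output
```

With requests, the base bundle would typically be the file returned by `certifi.where()`, and the resulting path can be passed as `verify=` or exported as `REQUESTS_CA_BUNDLE`. Writing to a new file, rather than editing the installed bundle in place, means a package upgrade does not silently discard the extra roots.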

We'll need to allow users to upload their own root CAs or PEM bundles. Please upvote this issue with a :+1: emoji reaction to help us understand priority.

lephuongbac commented 1 year ago

Same issue when I connect to an on-premise GitLab via an HTTP request. How can I switch to connecting via HTTP instead of HTTPS?

davydov-d commented 1 year ago

hey @YowanR I wonder if we should treat this issue as an enhancement or a bug? cc @misteryeo

misteryeo commented 1 year ago

@davydov-d This would be an enhancement but it looks like we have a good amount of upvotes on this issue so let's plan to implement this as part of our existing work.

davydov-d commented 1 year ago

@misteryeo thanks. I believe this can only be done for OSS, not for cloud (this may be a security issue). But I'm not sure there is a way to distinguish whether the connector container is run on cloud or not. Could someone assist me with that?

misteryeo commented 1 year ago

@lazebnyi @YowanR ☝️

sherifnada commented 1 year ago

Airbyte currently doesn't have a way to upload files (e.g. SSL certs) to be used as input to connectors. The only available option is "long" string fields, and I'm not sure whether they can be used to upload SSL certs.

If it can't be, then the options I can think of are either:

  1. enable file uploads in the Airbyte product. This is not currently scoped and will probably be a 2-4 week effort.
  2. use a custom connector with the certs pre-loaded

Maybe there is something else we can do in the Python code to bypass cert validation; I'm not sure.

Given these options, I would recommend that anyone in immediate need of this feature use a custom connector.
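For the "long string field" route, a connector could in principle accept the PEM text itself and write it to a temporary file before making requests. The sketch below assumes a hypothetical config key `ca_certificate` and a hypothetical helper name `pem_field_to_file`; neither is an existing Airbyte option:

```python
import os
import tempfile

PEM_HEADER = "-----BEGIN CERTIFICATE-----"
PEM_FOOTER = "-----END CERTIFICATE-----"


def pem_field_to_file(config: dict, field: str = "ca_certificate") -> str:
    """Write the PEM text stored in config[field] to a temp file and return its path."""
    pem = str(config.get(field, "")).strip()
    # Cheap sanity check only; this does not parse or validate the certificate.
    if not (pem.startswith(PEM_HEADER) and pem.endswith(PEM_FOOTER)):
        raise ValueError(f"config[{field!r}] is not a PEM-encoded certificate")
    fd, path = tempfile.mkstemp(suffix=".pem")
    with os.fdopen(fd, "w", encoding="ascii") as handle:
        handle.write(pem + "\n")
    return path
```

The returned path could then be handed to requests as `verify=path`. Whether the Airbyte UI preserves the line breaks of a pasted PEM block in such a field is not established in this thread and would need checking.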

davydov-d commented 1 year ago

@YowanR regarding Sherif's comment, I believe we are not going to implement this ticket as a part of certification. Am I right?

YowanR commented 1 year ago

@davydov-d yes, that's correct. If anyone wants us to support this issue, please keep upvoting it and we'll re-evaluate in the upcoming quarters.

scottboston commented 1 year ago

Is there a way to use verify = False for https sources?

sherifnada commented 1 year ago

Do you have a suggestion for how it should be implemented? Happy to review a proposal or PR.

scottboston commented 1 year ago

I think we can have Airbyte use the environment variable REQUESTS_CA_BUNDLE. We'll need to call "merge_environment_settings".
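A sketch of what this could look like in plain requests, assuming the connector sends prepared requests through `Session.send()`. In that flow requests does not consult `REQUESTS_CA_BUNDLE` on its own, so the environment settings have to be merged in explicitly. The helper name `send_kwargs_from_env`, the bundle path, and the host are hypothetical:

```python
import os

import requests


def send_kwargs_from_env(session: requests.Session, url: str) -> dict:
    """Return proxies/stream/verify/cert for `url`, merged with the environment.

    Session.request() performs this merge internally; Session.send() does not,
    so a CA bundle named by REQUESTS_CA_BUNDLE is ignored unless merged here.
    """
    return session.merge_environment_settings(url, {}, None, None, None)


if __name__ == "__main__":
    # Hypothetical bundle path and host, for illustration only.
    os.environ["REQUESTS_CA_BUNDLE"] = "/etc/ssl/certs/corp-bundle.pem"
    session = requests.Session()
    request = requests.Request("GET", "https://gitlab.example.lan/api/v4/projects")
    prepared = session.prepare_request(request)
    settings = send_kwargs_from_env(session, prepared.url)
    print(settings["verify"])  # the path taken from REQUESTS_CA_BUNDLE
    # response = session.send(prepared, **settings)
```

This keeps certificate verification switched on and only changes which roots are trusted, which is a narrower change than `verify=False`.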

avnav0 commented 1 year ago

Having the same issue with Salesforce. I have Airbyte running in Docker on an EC2 instance behind a VPN. When I make my own connection it works. When I try the Salesforce connector I get:

airbyte-server                    | requests.exceptions.SSLError: HTTPSConnectionPool(host='login.salesforce.com', port=443): Max retries exceeded with url: /services/oauth2/token (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)')))
airbyte-server                    | ,retryable=<null>,timestamp=1685468762963,additionalProperties={}], metadata: {workspace_id=5b3a0fd8-24b6-4a6b-bf57-d74a2459933e, airbyte_version=0.44.5, connector_definition_id=b117307c-14b6-41aa-9422-947e34922962, failure_origin=source, connector_release_stage=generally_available, connector_repository=airbyte/source-salesforce, job_id=5bbba7e4-d7f9-41f2-8a4b-242ee655bac5, workspace_url=http://localhost:8000/workspaces/5b3a0fd8-24b6-4a6b-bf57-d74a2459933e, failure_type=system_error, connector_command=check, connector_name=Salesforce, deployment_mode=OSS}

jeffsdata commented 5 months ago

I'm running Airbyte inside Kubernetes inside a corporate network, which has its own self-signed cert for decrypting traffic going through the firewall. This causes an error when I try to use the Marketo connector. The test fails with the error: Exception: Error while refreshing access token: HTTPSConnectionPool(host='xxxxxxxx.mktorest.com', port=443): Max retries exceeded with url: /identity/oauth/token?grant_type=client_credentials&client_id=****&client_secret=**** (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1129)'))

When I run anything in Python, I need to manually append the self-signed cert(s) to the requests package's cacert.pem file. Note that the Airbyte Google Analytics 4 and Google Search Console connectors work fine (at least... the test works fine). I usually point other apps (curl, Node.js, Postman, etc.) at that "patched" cacert.pem using environment variables.
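The environment-variable approach described above can be sketched as a small helper that builds an environment for child processes. The helper name `ca_bundle_env` and the example path are hypothetical; the variable names are the ones read by requests, curl, OpenSSL-based tools, and Node.js respectively:

```python
import os
from typing import Dict, Optional


def ca_bundle_env(bundle_path: str, base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment that points common HTTP clients at bundle_path."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "REQUESTS_CA_BUNDLE": bundle_path,   # Python requests
        "CURL_CA_BUNDLE": bundle_path,       # curl
        "SSL_CERT_FILE": bundle_path,        # OpenSSL-based tools and Python's ssl defaults
        "NODE_EXTRA_CA_CERTS": bundle_path,  # Node.js, added on top of its built-in roots
    })
    return env
```

A call such as `subprocess.run(["curl", url], env=ca_bundle_env("/path/to/cacert.pem"))` would then run curl against the patched bundle. Whether these variables actually reach an Airbyte connector container depends on how the deployment passes environment to connectors, which this thread does not settle.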