MeltanoLabs / tap-gitlab

Singer.io Tap for extracting data from Gitlab's API
GNU Affero General Public License v3.0

Add support for self-signed ssl certs on selfhosted Gitlab instances #50

Closed: pnadolny13 closed this 2 years ago

pnadolny13 commented 2 years ago

In GitLab by @toxsick on Mar 14, 2021, 17:40

Hey guys,

I'm just playing around with Meltano and our self-hosted internal Gitlab instance and I am getting errors like this:

meltano-ui_1  | INFO Starting sync
meltano-ui_1  | INFO Skipping stream: merge_request_commits
meltano-ui_1  | INFO Skipping stream: epics
meltano-ui_1  | INFO Skipping stream: epic_issues
meltano-ui_1  | INFO Skipping stream: pipelines_extended
meltano-ui_1  | INFO GET https://internal.git.lan/api/v4/groups/supergroup
meltano-ui_1  | INFO Backing off request(...) for 1.6s (requests.exceptions.SSLError: HTTPSConnectionPool(host='internal.git.lan', port=443): Max retries exceeded with url: /api/v4/groups/supergroup (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)'),)))
meltano-ui_1  | INFO GET https://internal.git.lan/api/v4/groups/supergroup
meltano-ui_1  | INFO Backing off request(...) for 1.1s (requests.exceptions.SSLError: HTTPSConnectionPool(host='internal.git.lan', port=443): Max retries exceeded with url: /api/v4/groups/supergroup (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)'),)))
meltano-ui_1  | INFO GET https://internal.git.lan/api/v4/groups/supergroup
meltano-ui_1  | INFO Backing off request(...) for 5.6s (requests.exceptions.SSLError: HTTPSConnectionPool(host='internal.git.lan', port=443): Max retries exceeded with url: /api/v4/groups/supergroup (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)'),)))
meltano-ui_1  | INFO GET https://internal.git.lan/api/v4/groups/supergroup
meltano-ui_1  | INFO Backing off request(...) for 15.2s (requests.exceptions.SSLError: HTTPSConnectionPool(host='internal.git.lan', port=443): Max retries exceeded with url: /api/v4/groups/supergroup (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)'),)))
meltano-ui_1  | INFO GET https://internal.git.lan/api/v4/groups/supergroup
meltano-ui_1  | ERROR Giving up request(...) after 5 tries (requests.exceptions.SSLError: HTTPSConnectionPool(host='internal.git.lan', port=443): Max retries exceeded with url: /api/v4/groups/supergroup (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)'),)))
meltano-ui_1  | CRITICAL HTTPSConnectionPool(host='internal.git.lan', port=443): Max retries exceeded with url: /api/v4/groups/supergroup (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)'),))
meltano-ui_1  | Traceback (most recent call last):
meltano-ui_1  |   File "/project/.meltano/extractors/tap-gitlab/venv/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
meltano-ui_1  |     chunked=chunked)
meltano-ui_1  |   File "/project/.meltano/extractors/tap-gitlab/venv/lib/python3.6/site-packages/urllib3/connectionpool.py", line 343, in _make_request
meltano-ui_1  |     self._validate_conn(conn)
meltano-ui_1  |   File "/project/.meltano/extractors/tap-gitlab/venv/lib/python3.6/site-packages/urllib3/connectionpool.py", line 839, in _validate_conn
meltano-ui_1  |     conn.connect()
meltano-ui_1  |   File "/project/.meltano/extractors/tap-gitlab/venv/lib/python3.6/site-packages/urllib3/connection.py", line 344, in connect
meltano-ui_1  |     ssl_context=context)
meltano-ui_1  |   File "/project/.meltano/extractors/tap-gitlab/venv/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 345, in ssl_wrap_socket
meltano-ui_1  |     return context.wrap_socket(sock, server_hostname=server_hostname)
meltano-ui_1  |   File "/usr/local/lib/python3.6/ssl.py", line 407, in wrap_socket
meltano-ui_1  |     _context=self, _session=session)
meltano-ui_1  |   File "/usr/local/lib/python3.6/ssl.py", line 817, in __init__
meltano-ui_1  |     self.do_handshake()
meltano-ui_1  |   File "/usr/local/lib/python3.6/ssl.py", line 1077, in do_handshake
meltano-ui_1  |     self._sslobj.do_handshake()
meltano-ui_1  |   File "/usr/local/lib/python3.6/ssl.py", line 689, in do_handshake
meltano-ui_1  |     self._sslobj.do_handshake()
meltano-ui_1  | ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)
meltano-ui_1  | 
meltano-ui_1  | During handling of the above exception, another exception occurred:
meltano-ui_1  | 
meltano-ui_1  | Traceback (most recent call last):
meltano-ui_1  |   File "/project/.meltano/extractors/tap-gitlab/venv/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
meltano-ui_1  |     timeout=timeout
meltano-ui_1  |   File "/project/.meltano/extractors/tap-gitlab/venv/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
meltano-ui_1  |     _stacktrace=sys.exc_info()[2])
meltano-ui_1  |   File "/project/.meltano/extractors/tap-gitlab/venv/lib/python3.6/site-packages/urllib3/util/retry.py", line 399, in increment
meltano-ui_1  |     raise MaxRetryError(_pool, url, error or ResponseError(cause))
meltano-ui_1  | urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='internal.git.lan', port=443): Max retries exceeded with url: /api/v4/groups/supergroup (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)'),))
meltano-ui_1  | 
meltano-ui_1  | During handling of the above exception, another exception occurred:
meltano-ui_1  | 
meltano-ui_1  | Traceback (most recent call last):
meltano-ui_1  |   File "/project/.meltano/extractors/tap-gitlab/venv/bin/tap-gitlab", line 8, in <module>
meltano-ui_1  |     sys.exit(main())
meltano-ui_1  |   File "/project/.meltano/extractors/tap-gitlab/venv/lib/python3.6/site-packages/tap_gitlab/__init__.py", line 860, in main
meltano-ui_1  |     raise exc
meltano-ui_1  |   File "/project/.meltano/extractors/tap-gitlab/venv/lib/python3.6/site-packages/tap_gitlab/__init__.py", line 857, in main
meltano-ui_1  |     main_impl()
meltano-ui_1  |   File "/project/.meltano/extractors/tap-gitlab/venv/lib/python3.6/site-packages/tap_gitlab/__init__.py", line 852, in main_impl
meltano-ui_1  |     do_sync()
meltano-ui_1  |   File "/project/.meltano/extractors/tap-gitlab/venv/lib/python3.6/site-packages/tap_gitlab/__init__.py", line 807, in do_sync
meltano-ui_1  |     sync_group(gid, pids)
meltano-ui_1  |   File "/project/.meltano/extractors/tap-gitlab/venv/lib/python3.6/site-packages/tap_gitlab/__init__.py", line 605, in sync_group
meltano-ui_1  |     data = request(url).json()
meltano-ui_1  |   File "/project/.meltano/extractors/tap-gitlab/venv/lib/python3.6/site-packages/backoff/_sync.py", line 94, in retry
meltano-ui_1  |     ret = target(*args, **kwargs)
meltano-ui_1  |   File "/project/.meltano/extractors/tap-gitlab/venv/lib/python3.6/site-packages/tap_gitlab/__init__.py", line 229, in request
meltano-ui_1  |     resp = SESSION.send(req)
meltano-ui_1  |   File "/project/.meltano/extractors/tap-gitlab/venv/lib/python3.6/site-packages/requests/sessions.py", line 637, in send
meltano-ui_1  |     r = adapter.send(request, **kwargs)
meltano-ui_1  |   File "/project/.meltano/extractors/tap-gitlab/venv/lib/python3.6/site-packages/requests/adapters.py", line 514, in send
meltano-ui_1  |     raise SSLError(e, request=request)
meltano-ui_1  | requests.exceptions.SSLError: HTTPSConnectionPool(host='internal.git.lan', port=443): Max retries exceeded with url: /api/v4/groups/supergroup (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)'),))

I think this is a pretty standard problem, but I can't find a clean way to fix it. What works is installing our cert in the Docker container with:

RUN apt-get update && apt-get install -y ca-certificates
RUN update-ca-certificates --fresh

ENV TAP_GITLAB_API_URL https://internal.git.lan

And then add a line in .meltano/extractors/tap-gitlab/venv/lib/python3.6/site-packages/tap_gitlab/__init__.py (here):

LOGGER = singer.get_logger()
SESSION = requests.Session()
# This works
SESSION.verify = "/etc/ssl/certs/ca-certificates.crt"

... I guess SESSION.verify = False would also work.

I am not really a python guy, but from what I read setting the envars REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt or CURL_CA_BUNDLE=/usr/local/share/ca-certificates/ca.crt should also work, but they do not.

Is there a clean way to do this without modifying code inside the .meltano folder?

regards and thanks!

pnadolny13 commented 2 years ago

In GitLab by @DouweM on Mar 16, 2021, 13:41

I am not really a python guy, but from what I read setting the envars REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt or CURL_CA_BUNDLE=/usr/local/share/ca-certificates/ca.crt should also work, but they do not.

@toxsick That's odd, it seems like that should work from the docs and code.

We can consider adding a new config.json setting like ssl_verify, which could take a path or a boolean, but I'd like to start by debugging the env var solution.

Can you show me how you're setting REQUESTS_CA_BUNDLE? And can you run meltano elt in debug mode (https://meltano.com/docs/command-line-interface.html#debugging) with meltano --log-level=debug elt ... so that we get to see the full environment tap-gitlab is invoked with? We should be able to see there whether REQUESTS_CA_BUNDLE makes it through correctly.
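
For illustration, here is a minimal sketch of how a setting like the proposed ssl_verify could be interpreted. It is not part of tap-gitlab: the ssl_verify key is only the name floated above, and resolve_ssl_verify is a hypothetical helper. It assumes the value may arrive as a real boolean, as a boolean-like string, or as a path to a CA bundle.

```python
import os
from typing import Union


def resolve_ssl_verify(config: dict) -> Union[bool, str]:
    """Map a hypothetical `ssl_verify` config value to a requests-style `verify` value."""
    # Default to normal certificate verification when the key is absent.
    value = config.get("ssl_verify", True)
    if isinstance(value, bool):
        return value

    text = str(value).strip()
    if text.lower() in ("true", "1", "yes"):
        return True
    if text.lower() in ("false", "0", "no"):
        return False

    # Anything else is assumed to be a path to a CA bundle file or directory.
    return os.path.expanduser(text)


print(resolve_ssl_verify({"ssl_verify": "/etc/ssl/certs/ca-certificates.crt"}))
print(resolve_ssl_verify({"ssl_verify": "false"}))
```

The returned value could then be assigned to SESSION.verify, which accepts either a boolean or a path.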

pnadolny13 commented 2 years ago

In GitLab by @DouweM on Mar 16, 2021, 13:41

assigned to @DouweM

pnadolny13 commented 2 years ago

In GitLab by @toxsick on Mar 16, 2021, 17:05

Hey @DouweM ,

I am running this thing in a Docker container. Here is what happens when I run meltano --log-level=debug elt tap-gitlab target-postgres --job_id=gitlab-to-postgres in the container (sorry, it's a lot):

debug.log.zip

The debug shows that 'REQUESTS_CA_BUNDLE': '/etc/ssl/certs/ca-certificates.crt' is present.

If I uncomment SESSION.verify = "/etc/ssl/certs/ca-certificates.crt" as described above, it works fine.

Thanks for looking into this!

pnadolny13 commented 2 years ago

In GitLab by @DouweM on Mar 16, 2021, 18:45

@toxsick Can you please share your complete Dockerfile?

Are you confident that /etc/ssl/certs/ca-certificates.crt is present inside the Docker container when you're taking the env var approach? Otherwise I have no idea why requests would be ignoring it :/

pnadolny13 commented 2 years ago

In GitLab by @toxsick on Mar 17, 2021, 05:07

@DouweM sure, here you go:

ARG MELTANO_IMAGE=meltano/meltano:latest
FROM $MELTANO_IMAGE

WORKDIR /project

# Install any additional requirements
COPY ./requirements.txt . 
RUN pip install -r requirements.txt

# Install all plugins into the `.meltano` directory
COPY ./meltano.yml . 
RUN meltano install

# Pin `discovery.yml` manifest by copying cached version to project root
RUN cp -n .meltano/cache/discovery.yml . 2>/dev/null || :

# Don't allow changes to containerized project files
ENV MELTANO_PROJECT_READONLY 1

# Copy over remaining project files
COPY . .

# Expose default port used by `meltano ui`
EXPOSE 5000

# Install self-signed cert
COPY misc/our_ca.crt /usr/local/share/ca-certificates/ca.crt
RUN apt-get update && apt-get install -y ca-certificates \
  && update-ca-certificates --fresh

ENV TAP_GITLAB_API_URL https://git.internal.lan
ENV REQUESTS_CA_BUNDLE /etc/ssl/certs/ca-certificates.crt

ENTRYPOINT ["meltano"]

This is also what I don't understand. I also think that requests should pick it up. But since it works when I do SESSION.verify = "/etc/ssl/certs/ca-certificates.crt" here I am pretty confident that ca-certificates.crt includes my self-signed cert.

Could it be that a requests.Session() does not use the REQUESTS_CA_BUNDLE envar? I think the docs are a little fuzzy on that...

pnadolny13 commented 2 years ago

In GitLab by @DouweM on Mar 17, 2021, 16:53

But since it works when I do SESSION.verify = "/etc/ssl/certs/ca-certificates.crt" here I am pretty confident that ca-certificates.crt includes my self-signed cert.

@toxsick Makes sense. When you are editing that file, how are you making sure that change makes it into the Docker image? I'd expect that meltano install would just reinstall the plugin from the pip_url specified in your meltano.yml, so .meltano would not contain any changes you made.

Could it be that a requests.Session() does not use the REQUESTS_CA_BUNDLE envar? I think the docs are a little fuzzy on that...

The env var logic (https://github.com/psf/requests/blob/8c211a96cdbe9fe320d63d9e1ae15c5c07e179f8/requests/sessions.py#L718) is implemented on the Session class, so we should be good. Your ENV directive looks good as well, and as we verified with meltano --log-level=debug, that value is actually making it into the tap's execution environment :/

pnadolny13 commented 2 years ago

In GitLab by @DouweM on Mar 17, 2021, 17:02

@toxsick I think I've figured it out: the merge_environment_settings method that reads the environment is called from Session.request, but not from Session.send, which is what the tap uses.
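
The difference can be seen without any network access. The sketch below is an illustration rather than code from the tap: it calls merge_environment_settings directly, which is the step Session.request performs before sending, and compares the result with the session-level default that Session.send falls back to. The URL and CA bundle path are the ones from this thread; the file does not need to exist for the demonstration.

```python
import os

import requests

CA_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"  # path used in this thread
URL = "https://internal.git.lan/api/v4/groups/supergroup"

os.environ["REQUESTS_CA_BUNDLE"] = CA_BUNDLE
session = requests.Session()

# Session.request() runs this merge itself, so the env var ends up in `verify`.
settings = session.merge_environment_settings(
    URL, proxies={}, stream=None, verify=None, cert=None
)
print("via request():", settings["verify"])

# Session.send() skips the merge and only falls back to session.verify,
# which stays at its default unless it is set explicitly.
print("via send():   ", session.verify)
```

This is also consistent with the earlier observation that assigning SESSION.verify directly works: that attribute is the one value Session.send does consult.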

So I'm thinking the solution is to rewrite https://gitlab.com/meltano/tap-gitlab/-/blob/master/tap_gitlab/__init__.py#L226-228 to use Session.request instead of Session.send, so that the env var is respected:

    resp = SESSION.request('GET', url, params=params, headers=headers)
    LOGGER.info("GET {}".format(url))

The calls below to req.url would also need to be changed to just url.

Can you try making that change locally and see if it has the desired effect? If so, I'd appreciate a merge request to fix this issue!

pnadolny13 commented 2 years ago

In GitLab by @toxsick on Mar 18, 2021, 04:42

@DouweM sounds promising. Thanks for digging into this! I will give it a try and create a PR tomorrow.

Have a good day

pnadolny13 commented 2 years ago

In GitLab by @toxsick on Mar 19, 2021, 09:15

mentioned in commit toxsick/tap-gitlab@5d721553c75a65b662464c3ac7462f34163dbd7a

pnadolny13 commented 2 years ago

In GitLab by @toxsick on Mar 19, 2021, 09:19

mentioned in merge request !38

pnadolny13 commented 2 years ago

In GitLab by @toxsick on Mar 19, 2021, 09:21

@DouweM That worked perfectly, my MR !38 is really just that. Thanks for investigating this so fast.

pnadolny13 commented 2 years ago

In GitLab by @DouweM on Mar 22, 2021, 11:20

assigned to @toxsick

pnadolny13 commented 2 years ago

In GitLab by @DouweM on Mar 22, 2021, 11:30

mentioned in commit bf5be2bd22b849bdc56a41f258111314a6bfee73