Widen / tap-rest-api-msdk

`tap-rest-api-msdk` is a Singer tap for generic rest-apis, built with the Meltano SDK for Singer Taps.
Apache License 2.0

SSL verification #54

Open adam-80 opened 2 months ago

adam-80 commented 2 months ago

I am experiencing SSL errors when using the tap to call internal APIs. The certificates are signed by an internal CA, and the bundle is deployed to the machine hosting the tap, using Meltano.

I have set the relevant environment variables to point Python to the certificate bundle.
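As a quick sanity check for this step, the sketch below reports which CA-related environment variables are visible to the Python process and whether the paths they point to exist. The helper name `describe_ca_env` is hypothetical and is not part of the tap or the SDK. The variable names themselves are the ones OpenSSL (`SSL_CERT_FILE`, `SSL_CERT_DIR`) and the `requests` library (`REQUESTS_CA_BUNDLE`, `CURL_CA_BUNDLE`) are known to consult; which of them the tap ends up honoring is not established in this thread.

```python
import os

# CA-related environment variables:
# SSL_CERT_FILE / SSL_CERT_DIR are read by OpenSSL (Python's ssl defaults),
# REQUESTS_CA_BUNDLE / CURL_CA_BUNDLE are read by the requests library.
CA_ENV_VARS = (
    "SSL_CERT_FILE",
    "SSL_CERT_DIR",
    "REQUESTS_CA_BUNDLE",
    "CURL_CA_BUNDLE",
)


def describe_ca_env(env=None):
    """Hypothetical helper: map each set CA variable to its path and existence."""
    env = os.environ if env is None else env
    report = {}
    for name in CA_ENV_VARS:
        value = env.get(name)
        if value:
            report[name] = {"path": value, "exists": os.path.exists(value)}
    return report


if __name__ == "__main__":
    found = describe_ca_env()
    if not found:
        print("No CA-related environment variables are set for this process.")
    for name, info in found.items():
        status = "exists" if info["exists"] else "MISSING"
        print(f"{name}={info['path']} ({status})")
```

Running it in the same environment that Meltano uses to launch the tap shows whether the variables actually reach the plugin process, which is worth ruling out before looking at the tap's code.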

The tap is able to hit the endpoint and infer the schema with the appropriate fields being collected. So that is successful. The issue is experienced once the schema has been inferred and the next page token has been found. The tap loops through the backoff process with the error:

[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)

It seems that it can get the endpoint and the fields, then errors on the sync.

It appears that the tap uses a different method to infer the schema than to query the data, and that the method used for the sync is not respecting the CA bundle variable.

I am not sure how to proceed; I hope you can offer some advice.

jlloyd-widen commented 2 months ago

I can confirm that the methods used to infer the schema are different from those used to sync. Pagination might not run at all on schema inference. This would be hard for me to develop a fix for because I have nothing to test against without being able to hit your internal APIs. My best suggestion is to attempt to fix it yourself and submit a PR. In case it's something managed by the singer-sdk and not this tap, I have just released a new version you might be able to update to and see if the issue persists. Sorry I can't give more insight 😢

adam-80 commented 2 months ago

Thanks for the response!

Unfortunately, I am no developer, so my ability to create a PR is very limited. I've been digging around the code, but my limited ability is making this fairly useless.

Testing wouldn't be difficult: you would just need an endpoint that is secured by a self-signed certificate and have the CA bundle available. A possible solution would be to include an environment variable that lets users point to a specific CA bundle, which is then passed to whatever Python libraries are being called, or perhaps an option to disable SSL verification.
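A minimal sketch of that idea, assuming the HTTP calls go through the `requests` library: the function below turns two hypothetical settings, `ssl_verify` and `ca_bundle`, into a value that fits the `verify` option of `requests`. Neither setting exists in the tap today, and `resolve_verify` is an illustrative name. The fallback to `REQUESTS_CA_BUNDLE` and `CURL_CA_BUNDLE` mirrors the variables `requests` reads itself.

```python
import os


def resolve_verify(config, env=None):
    """Return a value usable as the requests `verify` option.

    `ssl_verify` and `ca_bundle` are hypothetical config keys used only
    for illustration; they are not existing settings of the tap.
    """
    env = os.environ if env is None else env

    # An explicit opt-out disables certificate verification entirely.
    if config.get("ssl_verify", True) is False:
        return False

    # Prefer an explicit bundle path, then the requests-style variables.
    bundle = (
        config.get("ca_bundle")
        or env.get("REQUESTS_CA_BUNDLE")
        or env.get("CURL_CA_BUNDLE")
    )

    # True means "verify against the library's default CA store".
    return bundle if bundle else True
```

The result could then be assigned to the `verify` attribute of a `requests.Session`, which applies it to requests sent through that session. One possibly relevant detail: the `requests` documentation notes that its prepared-request flow does not take the environment into account, so `REQUESTS_CA_BUNDLE` is ignored there unless the settings are merged explicitly. Whether that explains the difference between schema inference and sync in this tap is an assumption that has not been verified here.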

I'll update to the new version and report back.