databricks / databricks-sql-nodejs

Databricks SQL Connector for Node.js
Apache License 2.0

Issue with Databricks Making Requests to Unavailable URL Through Squid Proxy #237

Closed tientnvn closed 7 months ago

tientnvn commented 8 months ago

Environment Details:

Issue Description:

I am encountering an issue when using Databricks with a Squid HTTP proxy. The client is making requests to an unexpected URL, which is not available.

Code Example:

module.exports.databricksConfig = {
    authType: "databricks-oauth",
    host: process.env.DATABRICKS_HOST || "",
    path: process.env.DATABRICKS_PATH || "",
    port: process.env.DATABRICKS_PORT || "",
    oauthClientId: process.env.DATABRICKS_CLIENT_ID || "",
    oauthClientSecret: process.env.DATABRICKS_CLIENT_SECRET || "",
    useDatabricksOAuthInAzure: true,
    https: process.env.DATABRICKS_HTTPS || true,
    proxy: {
        protocol: process.env.DATABRICKS_HTTP_PROXY_PROTOCOL,
        host: process.env.DATABRICKS_HTTP_PROXY_HOST,
        port: process.env.DATABRICKS_HTTP_PROXY_PORT
    }
}
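A side note on the config above: environment variables in Node.js are always strings, so `process.env.DATABRICKS_HTTPS || true` yields the truthy string `"false"` when the variable is set to `false`, and the port values are passed as strings. The sketch below shows one way to normalize such values before building the config. `envBool`, `envInt` and `buildProxyConfig` are hypothetical helper names, not part of the connector, and the fallback port 3128 is simply Squid's default. This is independent of the proxy problem reported here.

```javascript
// Hypothetical helpers (not part of @databricks/sql) for reading typed values from env.

// Interpret common "off" spellings as false; use the fallback when the variable is unset.
function envBool(value, fallback) {
  if (value === undefined || value === "") return fallback;
  return !["false", "0", "no", "off"].includes(String(value).trim().toLowerCase());
}

// Parse a decimal integer; use the fallback when the variable is unset or not numeric.
function envInt(value, fallback) {
  const parsed = Number.parseInt(value, 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

// Build the proxy section only when a proxy host is configured.
function buildProxyConfig(env) {
  if (!env.DATABRICKS_HTTP_PROXY_HOST) return undefined;
  return {
    protocol: env.DATABRICKS_HTTP_PROXY_PROTOCOL || "http",
    host: env.DATABRICKS_HTTP_PROXY_HOST,
    port: envInt(env.DATABRICKS_HTTP_PROXY_PORT, 3128), // 3128 is Squid's default port
  };
}
```

With these helpers, `https: envBool(process.env.DATABRICKS_HTTPS, true)` and `proxy: buildProxyConfig(process.env)` could replace the corresponding lines in the config.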

Received Logs in Squid Proxy

image

kravets-levko commented 8 months ago

Hi @tientnvn! The library tries to access those URLs to obtain the OAuth configuration for your workspace. Those URLs should be available. Can you please try to access them directly (not through the proxy) to make sure they really exist? If they are accessible without the proxy, check your proxy configuration. If they're not available, please contact Databricks support.

tientnvn commented 8 months ago

Thanks, @kravets-levko, for your message. I've verified that I can connect to Databricks successfully without a proxy. However, our network policies mandate that connections to Databricks be routed through a proxy. When I attempt to connect via the proxy, I get the error mentioned above.

kravets-levko commented 8 months ago

@tientnvn If you can run the same script without a proxy, but it fails when using a proxy, then most likely it's an issue with your proxy configuration. Start in that direction, and when you're certain that the proxy itself isn't the problem, please let me know.

tientnvn commented 8 months ago

Hi @kravets-levko, I tried with curl, and the request works with my Databricks URL: image

I wonder why the client requests HTTP instead of HTTPS.

kravets-levko commented 8 months ago

Hi @tientnvn! This definitely looks weird: both the fact that the client uses http for OAuth (which shouldn't happen) and the fact that requests fail with the proxy (I think the two may be related). Can you share your Squid config and the Squid version you use so I can dig deeper?

tientnvn commented 8 months ago

Hi @kravets-levko,

I am using Squid version 4.10 on Ubuntu with the default configuration.

Squid Cache: Version 4.10
Service Name: squid
Ubuntu linux
configure options:  '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--libexecdir=${prefix}/lib/squid' '--srcdir=.' '--disable-maintainer-mode' '--disable-dependency-tracking' '--disable-silent-rules' 'BUILDCXXFLAGS=-g -O2 -fdebug-prefix-map=/build/squid-z5nTiP/squid-4.10=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -Wl,--as-needed' 'BUILDCXX=x86_64-linux-gnu-g++' '--with-build-environment=default' '--enable-build-info=Ubuntu linux' '--datadir=/usr/share/squid' '--sysconfdir=/etc/squid' '--libexecdir=/usr/lib/squid' '--mandir=/usr/share/man' '--enable-inline' '--disable-arch-native' '--enable-async-io=8' '--enable-storeio=ufs,aufs,diskd,rock' '--enable-removal-policies=lru,heap' '--enable-delay-pools' '--enable-cache-digests' '--enable-icap-client' '--enable-follow-x-forwarded-for' '--enable-auth-basic=DB,fake,getpwnam,LDAP,NCSA,NIS,PAM,POP3,RADIUS,SASL,SMB' '--enable-auth-digest=file,LDAP' '--enable-auth-negotiate=kerberos,wrapper' '--enable-auth-ntlm=fake,SMB_LM' '--enable-external-acl-helpers=file_userip,kerberos_ldap_group,LDAP_group,session,SQL_session,time_quota,unix_group,wbinfo_group' '--enable-security-cert-validators=fake' '--enable-storeid-rewrite-helpers=file' '--enable-url-rewrite-helpers=fake' '--enable-eui' '--enable-esi' '--enable-icmp' '--enable-zph-qos' '--enable-ecap' '--disable-translation' '--with-swapdir=/var/spool/squid' '--with-logdir=/var/log/squid' '--with-pidfile=/var/run/squid.pid' '--with-filedescriptors=65536' '--with-large-files' '--with-default-user=proxy' '--with-gnutls' '--enable-linux-netfilter' 'build_alias=x86_64-linux-gnu' 'CC=x86_64-linux-gnu-gcc' 'CFLAGS=-g -O2 -fdebug-prefix-map=/build/squid-z5nTiP/squid-4.10=. 
-fstack-protector-strong -Wformat -Werror=format-security -Wall' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -Wl,--as-needed' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' 'CXX=x86_64-linux-gnu-g++' 'CXXFLAGS=-g -O2 -fdebug-prefix-map=/build/squid-z5nTiP/squid-4.10=. -fstack-protector-strong -Wformat -Werror=format-security'

tientnvn commented 8 months ago

Hi @kravets-levko, do you have any updates?

tientnvn commented 7 months ago

Hi @kravets-levko, I see that the same issue was reported for the Python client and has already been fixed there. I also checked, and the Python client works well. https://github.com/databricks/databricks-sql-python/issues/22

kravets-levko commented 7 months ago

@tientnvn Sorry, I was sick for a couple of days. I tried to reproduce your issue, but still no luck. So I'm still asking you to share:

  1. your Squid config
  2. the way you run Squid (full command line if not in daemon mode)
  3. code to reproduce the issue. The code you posted in the issue description is not enough, because all the important things are hidden behind environment variables. You can omit sensitive ones (your workspace host and path, OAuth client ID and secret), but please share the rest

tientnvn commented 7 months ago

Hi @kravets-levko, the same setup works well with Python and Golang. I shared my Node.js code here: https://github.com/tientnvn/databricks-nodejs-demo

kravets-levko commented 7 months ago

@tientnvn, thank you for creating a demo project, it really helped to debug your issue (it still took some time, though). I can confirm the bug: it turns out that when using an HTTP proxy, the library unintentionally rewrites all URLs from https: to http:, and the proxied request fails because the Databricks server doesn't allow http URLs. I've started working on a fix and will ping you once it's available. Sorry for the inconvenience.
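To make the class of bug described above concrete, here is a minimal sketch. It is an illustration under assumptions, not the connector's actual code: `buggyTargetUrl` and `planProxiedRequest` are hypothetical names, and `workspace.example.com` is a placeholder host. The first function shows the proxy's scheme leaking into the target URL. The second keeps the target URL intact and lets the target's scheme decide how the proxy is used: a `CONNECT` tunnel for https, or an absolute-URL `GET` for plain http.

```javascript
// Illustration only: hypothetical functions, not the connector's actual code.

// Buggy pattern: the proxy's scheme overwrites the target's scheme,
// so an https:// URL is requested as http://.
function buggyTargetUrl(targetUrl, proxy) {
  const url = new URL(targetUrl);
  url.protocol = `${proxy.protocol}:`;
  return url.toString();
}

// Expected pattern: the target URL is left untouched, and its own scheme
// determines what the client asks the proxy to do.
function planProxiedRequest(targetUrl, proxy) {
  const url = new URL(targetUrl);
  const proxyUrl = `${proxy.protocol}://${proxy.host}:${proxy.port}`;
  if (url.protocol === "https:") {
    // HTTPS through an HTTP proxy: open a tunnel, then run TLS inside it.
    const port = url.port || 443;
    return {
      proxyUrl,
      requestLine: `CONNECT ${url.hostname}:${port} HTTP/1.1`,
      targetUrl: url.toString(),
    };
  }
  // Plain HTTP: the absolute URL is sent to the proxy as-is.
  return {
    proxyUrl,
    requestLine: `GET ${url.toString()} HTTP/1.1`,
    targetUrl: url.toString(),
  };
}

// Placeholder values for demonstration.
const exampleProxy = { protocol: "http", host: "127.0.0.1", port: 3128 };
const exampleTarget = "https://workspace.example.com/oauth/config";

console.log(buggyTargetUrl(exampleTarget, exampleProxy));
console.log(planProxiedRequest(exampleTarget, exampleProxy).requestLine);
```

With the buggy pattern, the proxy receives a plain request for an http:// URL instead of a tunnel request, which matches the symptom reported earlier in this thread of the client requesting HTTP instead of HTTPS.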

tientnvn commented 7 months ago

Thanks so much, @kravets-levko, for the update and for digging into this! It's great that you were able to pinpoint the cause. Please keep me posted on the fix, and let me know if there's anything I can do to assist.

kravets-levko commented 7 months ago

Hi @tientnvn! Can you please try v1.8.4 and check if your issue is fixed there? If not - please let me know and re-open this issue. Thank you!