OHDSI / WebAPI

OHDSI WebAPI contains all OHDSI services that can be called from OHDSI applications

OHDSI WebAPI JDBC Driver Connectivity Issues With DATABRICKS #2364

Closed: cyongg closed this issue 6 days ago

cyongg commented 2 months ago

Goal: connect OHDSI Atlas to an OMOP CDM hosted in Databricks, with WebAPI running on Tomcat.

Caused by: org.flywaydb.core.internal.dbsupport.FlywaySqlException: Unable to obtain Jdbc connection from DataSource

SQL State : 08S01 Error Code : 500593 Message : [Databricks]JDBCDriver Communication link failure. Failed to connect to server. Reason: HTTP Response code: 401, Error message: Unknown.

Steps to reproduce the problem:

Step 1: This is the settings.xml:

(screenshots of settings.xml)

Step 2: We have included the Databricks dependency in pom.xml:

(screenshot of pom.xml)

Step 3: We have copied DatabricksJDBC42.jar into the classpath (C:\Program Files\Apache Software Foundation\Tomcat 8.5\webapps\WebAPI\WEB-INF\lib).

JDBC URL (from the Databricks cluster):

jdbc:databricks://{workspace id}:{workspace port}/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/5269192184046240/0404-024850-qo6kj1mb;AuthMech=3;UseNativeQuery=1;ConnCatalog={default catalog};UID={UID};PWD={token}
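For reference, a cleaned-up template of that URL (a sketch; the host, httpPath, and catalog placeholders must come from your own cluster's JDBC/ODBC settings). Note that with AuthMech=3 the Databricks driver expects the literal user name token and your personal access token as the password, so an HTTP 401 from the server usually means the token is missing, expired, or invalid:

jdbc:databricks://{workspace-host}:443/default;transportMode=http;ssl=1;httpPath={http-path};AuthMech=3;UseNativeQuery=1;ConnCatalog={catalog};UID=token;PWD={personal-access-token}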

chrisknoll commented 2 months ago

Databricks is not a supported DB for hosting the WebAPI database. Only postgres is supported.

cyongg commented 2 months ago

> Databricks is not a supported DB for hosting the WebAPI database. Only postgres is supported.

I'm not hosting the WebAPI database on Databricks; I'm hosting my OMOP CDM on Databricks. Our WebAPI database is on Postgres, so what should I do to make WebAPI work?

anthonysena commented 2 months ago

To enable Databricks, you can compile the .war with the Databricks driver per: https://github.com/OHDSI/WebAPI/issues/2262#issuecomment-1917673045
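For example, the rebuild might look like this (a sketch, assuming the stock WebAPIConfig/settings.xml layout and that the Databricks JDBC dependency has already been added to pom.xml as described in the linked comment):

mvn clean package -DskipTests -s WebAPIConfig/settings.xml -P webapi-postgresql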

chrisknoll commented 2 months ago

You modified the postgres-sql profile to change the datasource URL to Databricks, which tells WebAPI where to find the WebAPI database. You should leave this alone and follow the setup instructions, just specifying the URL of your own Postgres instance in your environment.

To point to a Databricks CDM, follow the instructions for setting up a source, using the 'databricks' dialect and the JDBC URL of your Databricks server. As an admin user, you can use the Configuration nav to add and remove sources.
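Concretely (an illustrative sketch; host, database, and credentials are placeholders, and the property names assume the stock webapi-postgresql profile), the datasource settings in settings.xml should keep pointing at Postgres:

<datasource.driverClassName>org.postgresql.Driver</datasource.driverClassName>
<datasource.url>jdbc:postgresql://your-postgres-host:5432/OHDSI</datasource.url>
<datasource.username>ohdsi_app_user</datasource.username>
<datasource.password>your-password</datasource.password>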

chanjunkai11 commented 2 months ago

> You modified the postgres-sql profile to change the datasource URL to Databricks, which tells WebAPI where to find the WebAPI database. You should leave this alone and follow the setup instructions, just specifying the URL of your own Postgres instance in your environment.

> To point to a Databricks CDM, follow the instructions for setting up a source, using the 'databricks' dialect and the JDBC URL of your Databricks server. As an admin user, you can use the Configuration nav to add and remove sources.

I thought datasource.url was supposed to be the JDBC URL of Databricks; as the datasource, isn't it supposed to refer to where the CDM is located?

chrisknoll commented 2 months ago

No. CDMs are referenced in the source table (please refer to the documentation indicated above). One WebAPI instance can reference multiple CDMs, which you can add and remove over time, so it doesn't make sense to store them in a configuration file. They are application data.

chanjunkai11 commented 1 month ago

Does that mean I'm only required to run the SQL below for the source and source_daimon tables?

-- Register the Databricks CDM as a source
INSERT INTO webapi.source (source_id, source_name, source_key, source_connection, source_dialect)
SELECT nextval('webapi.source_sequence'), 'databricks', 'databricks', 'jdbc:spark://<workspace>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/5269192184046240/0404-024850-qo6kj1mb;UseNativeQuery=1;AuthMech=3;UID=token;PWD=<token>', 'spark';

-- CDM daimon (daimon_type 0)
INSERT INTO webapi.source_daimon (source_daimon_id, source_id, daimon_type, table_qualifier, priority)
SELECT nextval('webapi.source_daimon_sequence'), source_id, 0, 'silver', 0
FROM webapi.source
WHERE source_key = 'databricks';

-- Vocabulary daimon (daimon_type 1)
INSERT INTO webapi.source_daimon (source_daimon_id, source_id, daimon_type, table_qualifier, priority)
SELECT nextval('webapi.source_daimon_sequence'), source_id, 1, 'silver', 1
FROM webapi.source
WHERE source_key = 'databricks';

-- Results daimon (daimon_type 2)
INSERT INTO webapi.source_daimon (source_daimon_id, source_id, daimon_type, table_qualifier, priority)
SELECT nextval('webapi.source_daimon_sequence'), source_id, 2, 'resultss', 1
FROM webapi.source
WHERE source_key = 'databricks';

-- Temp daimon (daimon_type 5)
INSERT INTO webapi.source_daimon (source_daimon_id, source_id, daimon_type, table_qualifier, priority)
SELECT nextval('webapi.source_daimon_sequence'), source_id, 5, 'temp', 0
FROM webapi.source
WHERE source_key = 'databricks';
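After running these inserts, one way to sanity-check the registration (a sketch; the daimon_type codes follow WebAPI's convention of 0 = CDM, 1 = Vocabulary, 2 = Results, 5 = Temp):

SELECT s.source_key, s.source_dialect, d.daimon_type, d.table_qualifier, d.priority
FROM webapi.source s
JOIN webapi.source_daimon d ON d.source_id = s.source_id
WHERE s.source_key = 'databricks'
ORDER BY d.daimon_type;

WebAPI caches its source list, so expect to restart Tomcat (or call the /WebAPI/source/refresh endpoint) before the new source appears in Atlas.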
chanjunkai11 commented 1 month ago


(screenshot of the Atlas Configuration page: the source row shows no vocabulary version)

Hmm, but why does the configuration look like this? It can't seem to identify my vocabulary version.

In Databricks my CDM catalog name is omop540 and the schema name is silver, so how do I connect to it?
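One possible adjustment (an assumption on my part, not something confirmed in this thread): pin the catalog in the source_connection JDBC URL with ConnCatalog=omop540 and leave table_qualifier = 'silver'; or alternatively try a two-part catalog.schema qualifier, roughly:

-- Hypothetical: qualify the CDM and Vocabulary daimons with catalog.schema
UPDATE webapi.source_daimon
SET table_qualifier = 'omop540.silver'
WHERE daimon_type IN (0, 1)
AND source_id = (SELECT source_id FROM webapi.source WHERE source_key = 'databricks');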

anthonysena commented 6 days ago

@chanjunkai11 - your setup appears correct. Is your vocabulary table populated in your CDM? That is where the vocabulary version is obtained and displayed in ATLAS.
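To check directly in Databricks (a sketch, assuming catalog omop540 and schema silver), WebAPI reads the version from the vocabulary row whose vocabulary_id is 'None':

SELECT vocabulary_version
FROM omop540.silver.vocabulary
WHERE vocabulary_id = 'None';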

Since this issue is a bit old, I'm hoping you were able to figure this out but please re-open if you are still facing difficulties.