ISG-ICS / cloudberry

Big Data Visualization
http://cloudberry.ics.uci.edu
91 stars 82 forks source link

Oracle Adapter #552

Closed taoenkh closed 5 years ago

taoenkh commented 6 years ago

Oracle Adapter

Overview

Develop another adapter for Cloudberry to support Oracle as a backend database.

Tutorial to add ojdbc into cloudberry's dependencies

To add OJDBC in cloudberry's dependencies, the Apache Maven should be installed in advance.

Installing Apache Maven

It is available to download Apache Maven from the following url https://maven.apache.org/download.cgi, and make sure JDK is install and JAVA_HOME is added to the enviroment variable.

Choosing one of the following links and unzip it (works same for every operating system). Recommended to download the Binary zip archive. image

After downloading the zip file, unzip it with any tools and set the M2 enviroment variable.

Setting the environmental variable of M2_HOME for Windows

Assume you unzip to this folder – C:\Program Files\Apache\maven Add both M2_HOME and MAVEN_HOME variables in the Windows environment, and point it to your Maven folder. image

Update PATH variable, append Maven bin folder – %M2_HOME%\bin.

image

Running the following command on cmd will add the ojdbc to Maven's dependencies. mvn install:install-file -Dfile={Enter the aboslute path of OJDBC.jar } -DgroupId=com.oracle -DartifactId=ojdbc7 - Dversion=12.1.0.2 -Dpackaging=jar -DgeneratePom=true

Changes to make to add maven dependencies on Cloudberry

In cloudberry/project/dependencies.scala module add the OJDBC jar in zion dependencies field. image

In cloudberry/project/commons.scala module add Resolver.mavenLocal to add Maven dependencies to cloudberry. image After making these changes Cloudberry will be able to talk to Oracle database. To set up Oracle's url please make modifications in application.conf

Tutorial to install Curl on Windows

Using Restful Client on Windows will encounter problem when recieving a response from clouberry after a berry command is sent. Installing curl will be also required on Windows.

Step 1 Downloading the Curl

Visiting the Curl's website to choose your appropriate version coresponding to your OS.

image

Win64 Generic is the most common one for windows users. After clicking the link of Stefan Kantak, a tutorial and licenses of curl will be shown. Then scroll down to installation to download the curl. image

Step 2 Setting the Environment variable of Curl

After extracting the .cab file, visiting \curl-7.61.0\AMD64 will show the CURL.EXE. To use curl we can simple cd to the directory and use the curl as the ways used in other Operating System. In order to use Curl everywhere, setting of Environment variable is required. image Update PATH variable, append the directory where curl is extracted

Progress

@weifeng has initialized the codebase for this work. And the made the register query work.

image

Previous work on register query might have potential errors, since sending a json with Timefield via Rest Client can cause an exception from cloudberry.

Currently, choose the way to fix if exist syntax that is not supported by Oracle, and other syntaxes that are not supported by Oracle.

Possible ways to solve if exists

image Changed data type of time field to TimeStamp, since date in Oracle does not include the exact time.

//sample output
// contains(name,'test',1)>0 equivalent to match(name) against ('+test' in boolean mode);
// contains(name,'foo and bar',1)>0   equivalent to match(name) against ('+foo +bar' in boolean mode);
// contains(name,'foo and bar and test',1 ) > 0   equivalent to match(name) against ('+foo + bar +test' in boolean mode);

Latest Update

Query send via Curl image

Query in SQL

select t."NAME" as "NAME" from "MYSQLDEMO" t where t."NAME"='Tao' limit 60000000;

Result from cloudberry image

Table in Oracle Database

image

Currently, the simplest query works fine, but sending a create view might cause problems; comparison between dates still raises an exception. Since in MySQL we can directly compare strings in 'yyyy-mm-dd hh:mm:ss' format, in Oracle we must use to_date() to convert the string.

Example : select * from qz_test where qz_test.startdate > '2016-01-01'; works on MySQL but not on Oracle.

Overrode parseDrop Since Oracle does not support if exist, we are required to write PL/SQL to determine if a table is already created. image Overrode ParseTimeRelation This function will be called when sending a berry query to update the metadata. Comparison between time fields will be required, since lastReadTime of berry.meta's stats will be updated. image Change the connection of OJDBC image Previously, the connection to Oracle was Specifying a Databse URL, User Name, and Password, which needs to specify url, username, and password seperately as three parameters. image To match the connection with MySQL, and PostgreSQL, I modified the connection to Specifying a Databse URL That Includes User Name and Password. image Putting both username and password on the url. image image Overrode parseSelect This is the function that generates limit, which is not supported by Oracle. In Oracle 12C fetch first n rows is the syntax that is almost same as limit keyword in MySQL. image Overrode parseGroupby image In both MySQL and PosgreSQL, select statement executes before where statement, but Oracle works in a inverse way. If we say select "geotag" as "state" where "text" = 'hurricane' group by "state"; we will receive a syntax error since during the executing of group by "geotag" was not renamed as "state" yet. Overrode parseGroupByFunc In cloudberr, the group by field has a attribute name "unit", which will extract a certain field of the timestamp. In MySQL, simly using date([DATE STRING]) or month([DATE STRING]) will be adequate. However, similar to PostgreSQL, Oracle also uses extract() function to get any time fields beside Date. To get date of a certain time field to_char(cast([DATE] as date),'yyyy-mm-dd'), and to extract other fields using extract(month from [DATE]) will be fine. image image image

Functions need to be overrode (ALL Completed)

Issue(Solved)

replace intomysqldemo(select * frommysqldemo1); MySQL supports this syntax, not sure if oracle also supports

merge into mysqldemo d using( select * from mysqldemo1) b on (d."ID" = b."ID" ) when not matched then insert (d."ID",d."NAME",d."ADDRESS",d."CITY",d."COUNT",d."THEDATE") values (b."ID",b."NAME",b."ADDRESS",b."CITY",b."COUNT",b."THEDATE"); Found a possible solution.

Update August 5th

Known Issues To fix

TO DO List

Monday July 23

New issue found

select tt."county" as "county",tt."count" as "count",ll0."population" as "population" from ( select "geo_tag.countyID" as "county",count(*) as "count" from "twitter.ds_tweet_56ab24c15b72a457069c5ea42fcfc640" t where t."create_at" >= to_date('2018-01-02 00:00:00','YYYY-MM-DD HH24:MI:SS') and t."create_at" < to_date('2018-01-04 00:00:00','YYYY-MM-DD HH24:MI:SS') and t."geo_tag.countyID" in ( more than 1000 county IDs ) group by "geo_tag.countyID" ) tt left outer join "twitter.dsCountyPopulation" ll0 on ll0."countyID" = tt."county";

Oracle does not allow a query's list exceed 1000 elements. This will happen when we zoom twittermap's screen to a place more than 1000 counties.

Solutions proposed by Taewoo

  1. To change the query to state level instead of county
  2. To set a restriction that when the elements in the list exceeds 1000, only query on those 1000 couties.
chenlica commented 5 years ago

@taoenkh @baiqiushi should we close it?