Oracle Adapter
Overview
Develop another adapter for Cloudberry to support Oracle as a backend database.
Tutorial: adding OJDBC to Cloudberry's dependencies
Apache Maven must be installed before OJDBC can be added to Cloudberry's dependencies.
Installing Apache Maven
Apache Maven can be downloaded from https://maven.apache.org/download.cgi. Make sure the JDK is installed and JAVA_HOME is set as an environment variable.
Choose one of the download links on that page and unzip the archive (the steps are the same on every operating system).
The Binary zip archive is recommended.
After downloading the zip file, unzip it with any tool and set the M2_HOME environment variable.
Setting the M2_HOME environment variable on Windows
Assume you unzipped Maven to this folder: C:\Program Files\Apache\maven
Add both the M2_HOME and MAVEN_HOME variables to the Windows environment, and point them to your Maven folder.
Update the PATH variable by appending the Maven bin folder: %M2_HOME%\bin.
Running the following command in cmd installs the OJDBC jar into the local Maven repository.
mvn install:install-file -Dfile={Enter the absolute path of OJDBC.jar} -DgroupId=com.oracle -DartifactId=ojdbc7 -Dversion=12.1.0.2 -Dpackaging=jar -DgeneratePom=true
Changes needed to add the Maven dependency to Cloudberry
In cloudberry/project/dependencies.scala, add the OJDBC jar to the zion dependencies field.
In cloudberry/project/commons.scala, add Resolver.mavenLocal so that artifacts installed in the local Maven repository can be resolved.
After making these changes, Cloudberry will be able to talk to the Oracle database.
To set the Oracle URL, edit application.conf.
Tutorial: installing curl on Windows
REST clients on Windows run into problems when receiving a response from Cloudberry after a berry command is sent.
curl therefore needs to be installed on Windows as well.
Step 1: Downloading curl
Visit the curl website and choose the version that corresponds to your OS.
Win64 Generic is the most common choice for Windows users.
After you click the Stefan Kanthak link, a tutorial and the curl licenses are shown.
Scroll down to the installation section to download curl.
Step 2: Setting the environment variable for curl
After extracting the .cab file, CURL.EXE can be found under \curl-7.61.0\AMD64.
To use curl, we can simply cd to that directory and run curl the same way as on other operating systems.
To use curl from any directory, an environment variable must be set.
Update the PATH variable by appending the directory where curl was extracted.
Progress
@weifeng initialized the codebase for this work and made the register query work.
Fixed support for "berry.meta" by adding double quotation marks.
Found that both the PostgreSQL and MySQL adapters use "berry." in "berry.meta" as a plain name prefix, rather than as a database (MySQL), dataverse (AsterixDB), or tablespace (Oracle). The Oracle adapter currently follows the MySQL and PostgreSQL logic.
Previous work on the register query might contain errors, since sending a JSON with a time field via a REST client can cause an exception from Cloudberry.
Currently choosing how to handle if exists and other syntax that Oracle does not support.
Possible ways to handle if exists
Use try {} catch {} (works for now, but might have potential problems)
Use a procedure in Oracle (the query works in SQL Developer, but has problems in Cloudberry)
Changed the data type of the time field to TIMESTAMP, since DATE in Oracle stores time only to the second, without fractional seconds.
//sample output
// contains(name,'test',1)>0 equivalent to match(name) against ('+test' in boolean mode);
// contains(name,'foo and bar',1)>0 equivalent to match(name) against ('+foo +bar' in boolean mode);
// contains(name,'foo and bar and test',1)>0 equivalent to match(name) against ('+foo +bar +test' in boolean mode);
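The correspondence in the sample output above can be sketched as a pair of small helpers. This is only an illustration in Python, not the adapter's Scala code: the function names `mysql_match` and `oracle_contains` are hypothetical, and the keywords are assumed to be plain words without any Oracle Text operators of their own.

```python
def mysql_match(field, keywords):
    """MySQL boolean-mode predicate: a leading '+' marks each keyword as required."""
    # Double any single quote so a keyword cannot end the SQL string literal.
    terms = " ".join("+" + word.replace("'", "''") for word in keywords)
    return f"match({field}) against ('{terms}' in boolean mode)"


def oracle_contains(field, keywords, label=1):
    """Oracle Text predicate: required keywords are joined with 'and'."""
    terms = " and ".join(word.replace("'", "''") for word in keywords)
    return f"contains({field},'{terms}',{label})>0"


print(mysql_match("name", ["foo", "bar"]))
print(oracle_contains("name", ["foo", "bar"]))
```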
Latest Update
Completed register
Successfully ran a simple query using the Oracle adapter
Query sent via curl
Query in SQL
select t."NAME" as "NAME" from "MYSQLDEMO" t where t."NAME"='Tao' limit 60000000;
Result from cloudberry
Table in Oracle Database
Currently, the simplest query works fine, but sending a create view might cause problems, and comparison between dates still raises an exception. MySQL can compare a date column directly with a string in 'yyyy-mm-dd hh:mm:ss' format, whereas Oracle requires to_date() to convert the string first.
Example: select * from qz_test where qz_test.startdate > '2016-01-01'; works on MySQL but not on Oracle.
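As a rough sketch of the conversion described above, the helper below wraps the string in to_date() before comparing. It is written in Python for illustration only; the name `oracle_time_predicate` is hypothetical, and the format mask is the one that appears in the generated queries later on this page.

```python
ORACLE_DATETIME_FORMAT = "YYYY-MM-DD HH24:MI:SS"


def oracle_time_predicate(field, operator, timestamp):
    """Compare a date column with a 'yyyy-mm-dd hh:mm:ss' string in Oracle."""
    # The string is converted with to_date() instead of being compared directly.
    return f"{field} {operator} to_date('{timestamp}','{ORACLE_DATETIME_FORMAT}')"


print(oracle_time_predicate("qz_test.startdate", ">", "2016-01-01 00:00:00"))
```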
Overrode parseDrop
Since Oracle does not support if exists, we have to write PL/SQL to determine whether a table already exists.
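One way such a PL/SQL check might look is sketched below. This is an assumption for illustration, not the PL/SQL the adapter actually generates: the helper name `oracle_drop_if_exists` is hypothetical, and the check relies on Oracle's user_tables dictionary view, which lists the tables owned by the current user.

```python
def oracle_drop_if_exists(table_name):
    """Build a PL/SQL block that drops a table only when it exists."""
    # Quoted table names are stored case-sensitively, so the name is compared as given.
    return (
        "declare\n"
        "  table_count number;\n"
        "begin\n"
        f"  select count(*) into table_count from user_tables where table_name = '{table_name}';\n"
        "  if table_count > 0 then\n"
        f"    execute immediate 'drop table \"{table_name}\"';\n"
        "  end if;\n"
        "end;"
    )


print(oracle_drop_if_exists("berry.meta"))
```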
Overrode parseTimeRelation
This function is called when a berry query is sent and the metadata is updated.
Time fields must be compared, since lastReadTime in the stats of berry.meta is updated.
Changed the OJDBC connection
Previously, the connection to Oracle used the "Specifying a Database URL, User Name, and Password" form, which passes url, username, and password separately as three parameters.
To match the connections for MySQL and PostgreSQL, I changed it to the "Specifying a Database URL That Includes User Name and Password" form.
Both username and password are now part of the url.
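For illustration, a thin-driver URL that embeds the credentials could be assembled as below. The helper name `oracle_jdbc_url` and every value passed to it (user, password, host, port, SID) are placeholders, not Cloudberry's actual settings.

```python
def oracle_jdbc_url(user, password, host, port, sid):
    """Thin-driver URL with the credentials placed before the '@'."""
    return f"jdbc:oracle:thin:{user}/{password}@{host}:{port}:{sid}"


# Placeholder values only; substitute the real connection details.
print(oracle_jdbc_url("scott", "tiger", "localhost", 1521, "orcl"))
```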
Overrode parseSelect
This is the function that generates limit, which Oracle does not support.
In Oracle 12c, fetch first n rows is the syntax closest to the limit keyword in MySQL.
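A minimal sketch of that substitution, assuming the select statement has already been generated without a limit clause; `oracle_limit` is a hypothetical helper name, and the clause requires Oracle 12c or later.

```python
def oracle_limit(select_sql, limit):
    """Append Oracle 12c row limiting where MySQL would use 'limit n'."""
    # int() rejects anything that is not a plain number before it reaches the SQL text.
    return f"{select_sql} fetch first {int(limit)} rows only"


print(oracle_limit('select t."NAME" as "NAME" from "MYSQLDEMO" t', 10))
```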
Overrode parseGroupBy
MySQL and PostgreSQL both accept a select alias in group by, but Oracle does not, because it evaluates group by before the select list. If we say select "geotag" as "state" where "text" = 'hurricane' group by "state"; Oracle raises an error, because "geotag" has not yet been renamed to "state" when group by is evaluated.
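The workaround can be sketched as repeating the original expression in group by instead of its alias. The helper `oracle_group_query` and the table name `"tweets"` are hypothetical and serve only to make the example complete.

```python
def oracle_group_query(table, group_expr, alias, where):
    """Count rows per group, grouping by the expression rather than its alias."""
    # Oracle cannot see the select alias inside group by, so the expression is repeated.
    return (
        f'select {group_expr} as {alias},count(*) as "count" '
        f"from {table} where {where} group by {group_expr}"
    )


print(oracle_group_query('"tweets"', '"geotag"', '"state"', "\"text\" = 'hurricane'"))
```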
Overrode parseGroupByFunc
In Cloudberry, the group by field has an attribute named "unit", which extracts a certain field of the timestamp. In MySQL, simply using date([DATE STRING]) or month([DATE STRING]) is adequate. However, similar to PostgreSQL, Oracle uses the extract() function to get any time field besides the date. To get the date of a time field, use to_char(cast([DATE] as date),'yyyy-mm-dd'); to extract other fields, use extract(month from [DATE]).
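The mapping described above might be sketched as follows. The unit names ("day", "month", and so on) are assumptions for illustration and may not match Cloudberry's own unit values, and `oracle_time_unit` is a hypothetical name. Note that extracting hour, minute, or second in Oracle requires a TIMESTAMP column rather than a DATE.

```python
EXTRACT_UNITS = ("year", "month", "hour", "minute", "second")


def oracle_time_unit(field, unit):
    """Return the Oracle expression that groups a timestamp by the given unit."""
    if unit == "day":
        # The calendar date is rendered as text, as described above.
        return f"to_char(cast({field} as date),'yyyy-mm-dd')"
    if unit in EXTRACT_UNITS:
        return f"extract({unit} from {field})"
    raise ValueError(f"unsupported unit: {unit}")


print(oracle_time_unit('t."create_at"', "day"))
print(oracle_time_unit('t."create_at"', "month"))
```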
Functions that need to be overridden (all completed)
parseTimeRelation
parseTextRelation
parseDrop
parseCreate
parseSelect
These functions must be overridden because of the issues below; there may be other issues that have not been noticed yet.
Issue (Solved)
replace into mysqldemo (select * from mysqldemo1);
MySQL supports this syntax, but it was not clear whether Oracle does. Found a possible solution using merge into:
merge into mysqldemo d using ( select * from mysqldemo1 ) b on (d."ID" = b."ID") when not matched then insert (d."ID",d."NAME",d."ADDRESS",d."CITY",d."COUNT",d."THEDATE") values (b."ID",b."NAME",b."ADDRESS",b."CITY",b."COUNT",b."THEDATE");
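The merge into statement above follows a regular pattern, which the sketch below generates from a column list. `oracle_merge_into` is a hypothetical helper, not the adapter's code, and it mirrors the statement on this page by emitting only the "when not matched" branch.

```python
def oracle_merge_into(target, source_query, key, columns):
    """Build a merge statement that inserts source rows whose key is absent from target."""
    quoted = [f'"{column}"' for column in columns]
    insert_columns = ",".join(f"d.{column}" for column in quoted)
    insert_values = ",".join(f"b.{column}" for column in quoted)
    return (
        f"merge into {target} d using ({source_query}) b "
        f'on (d."{key}" = b."{key}") '
        f"when not matched then insert ({insert_columns}) values ({insert_values})"
    )


print(oracle_merge_into("mysqldemo", "select * from mysqldemo1", "ID", ["ID", "NAME"]))
```

Unlike replace into, this form leaves existing rows untouched; overwriting them would also need a "when matched then update" branch.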
Update August 5th
Completed the script to insert 46000 records into MySQL.
Rewrote the script, since the previous one was in PHP and had some errors that I was not sure how to solve.
Inserted 46000 tweets into MySQL.
Issue (solved by using an older version of the front end)
After I inserted all 4 datasets of twitterMap, the MySQL back end still had some issues.
I found that twitterMap uses different logic for different back-end databases. Since the current twitterMap is very different from the previous version, it cannot run on our current version.
After I downloaded the previous version of twitterMap, the MySQL back end worked fine.
Current Issue (Solved)
select t.geo_tag.stateID as state,count(*) as count
from twitter_ds_tweet t
where t.create_at >= '2018-08-11 12:32:41' and t.create_at < '2018-08-13 12:32:41' and t.geo_tag.stateID in ( 37,51,24,11,10,34,42,9,44,48,35,4,40,6,20,32,8,49,12,22,28,1,13,45,5,47,21,29,54,17,18,39,19,55,26,27,31,56,41,46,16,30,53,38,25,36,50,33,23,2 ) and match(t.text) against ('+zika' in boolean mode)
group by state;
In MySQL the query works fine.
However, in Oracle we cannot group by "state"; we can only group by t."geo_tag.stateID".
Known Issues To fix
[x] Check whether the typeName under schema in the register-dataset message to Cloudberry is necessary. If it is, it is a concept unique to AsterixDB, so we need to use some arbitrary value for it, say the table name.
[x] What if a primary key is a composite key? (Tested in SQL Developer: Oracle supports composite keys.)
[x] Be careful that in Oracle the table names or field names cannot be wrapped with quotation marks. (Oracle table names use double quotation marks, so this is not an issue for now.)
[x] Date type field needs to be in syntax like this: to_date('2018-06-01', 'yyyy-mm-dd')
[x] Oracle does not support some syntax keywords such as replace into, if exists, limit, etc. (Used merge into instead of replace into.)
[x] Understand how the "dot" (.) works in Oracle, and replace the database concept in AsterixDB or MySQL with the tablespace concept in Oracle.
[x] Be careful that in Oracle, VARCHAR field does not have operator =
[x] Be careful that in Oracle, all field names returned are in upper case by default (This can be fixed by adding double quotes)
[x] More thought may be needed on the limit keyword and on new data types in Oracle.
TO DO List
Monday July 23
[x] Find the proper way to check Oracle logs, including error logs. (Still using the previous way to check the log, but found why the command was not working.)
[x] Test the PL/SQL query in a simple Java program.
[x] Try a simple register and berry query against MySQL on a Mac or Linux machine to see how Cloudberry behaves. (Had problems with REST clients; the MySQL adapter works perfectly, and using curl on Windows solves the problem.)
[x] Create berry.meta table and other Oracle data in a tablespace other than SYSTEM.
Tuesday July 24
[x] Figure out whether the limit keyword is a problem for Oracle. (The limit keyword was eliminated by Professor Shan, since the numbers were very large.) TODO: use rownum in the future.
[x] Change insert into to merge into (to act like replace into in MySQL).
Wednesday, July 25
[x] Write a tutorial about curl on windows, since restclient behaves incorrectly when receiving a response from Cloudberry.
[x] Start debugging the problems with berry requests.
July 27-30
[x] Try to solve the problem of details when sending a berry query (solved some problems)
Until August 2nd
[x] Try to complete all the functions required to send a berry query (still working on parseCreate; the others might work fine)
[x] Change the way Oracle is connected; currently the username and password are declared in a format different from other databases.
Monday August 6th
[x] Try to ingest the data of twitterMap to MySQL
Thursday August 9th
[x] Ingest the data of twitterMap to Oracle.
Latest Progress
OracleAdapter is almost done; twittermap runs properly with Oracle as the backend database.
Summary
Major differences from MySQL adapter
Changes | Explanation
Convert if exists to PL/SQL | Oracle's syntax does not support the if exists keyword, so the if exists checks used when creating berry.meta, dropping a table, and creating a view are replaced with PL/SQL. (Raw Data / Meta Data)
limit | In Oracle, fetch first n rows is equivalent to limit. (Raw Data / Meta Data)
replace into | replace into is used when creating a view in the database. merge into can be used in Oracle instead. (Raw Data / Meta Data)
group by | MySQL accepts a select alias in group by, but Oracle evaluates group by before the select list, so the function that parses group by was changed. (Raw Data)
Extracting time units | MySQL has functions to extract a certain time field; the usage in Oracle is different. (Raw Data)
Full text search | Different syntax. (Raw Data)
Adding quotations | Currently, names such as berry.meta and stats.createTime are all quoted to correspond to AsterixDB (same as in MySQL). (Meta Data / Raw Data)
Functions overridden
parseTimeRelation
parseTextRelation
parseDrop
parseCreate
parseSelect
parseGroupByFunc
parseGroupBy
parseUpsertMeta
fieldType2SQLType
New issue found
select tt."county" as "county",tt."count" as "count",ll0."population" as "population" from ( select "geo_tag.countyID" as "county",count(*) as "count" from "twitter.ds_tweet_56ab24c15b72a457069c5ea42fcfc640" t where t."create_at" >= to_date('2018-01-02 00:00:00','YYYY-MM-DD HH24:MI:SS') and t."create_at" < to_date('2018-01-04 00:00:00','YYYY-MM-DD HH24:MI:SS') and t."geo_tag.countyID" in ( more than 1000 county IDs ) group by "geo_tag.countyID" ) tt left outer join "twitter.dsCountyPopulation" ll0 on ll0."countyID" = tt."county";
Oracle does not allow an IN list to exceed 1000 elements (error ORA-01795). This happens when the twittermap view is zoomed to an area that covers more than 1000 counties.
Solutions proposed by Taewoo
Change the query to state level instead of county level.
Set a restriction so that when the list exceeds 1000 elements, only the first 1000 counties are queried.
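The second proposal could be sketched as capping the list before the predicate is generated. This is an illustration only; `cap_in_list` is a hypothetical helper, and which 1000 counties to keep is left as "the first 1000 in the list" here.

```python
ORACLE_MAX_IN_LIST = 1000


def cap_in_list(field, ids, limit=ORACLE_MAX_IN_LIST):
    """Build an IN predicate that never carries more than `limit` elements."""
    kept = list(ids)[:limit]
    return f"{field} in ( {','.join(str(i) for i in kept)} )"


print(cap_in_list('t."geo_tag.countyID"', [1, 2, 3]))
```

Counties beyond the cap are silently dropped from the result, so the front end would need to tolerate missing counties at that zoom level.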