clojurians-org closed this issue 4 years ago.
Thank you for the suggestion, @clojurians-org. Would you like to directly create a PR for this improvement?
I'm glad to do it, but currently I use Clojure for scripting things in the JVM ecosystem, and I don't know whether you would accept it. Related documentation: https://clojure.org/guides/deps_and_cli
A simple script invocation is `clj jdbc.clj`. It consists of two files: `deps.edn` and `jdbc.clj`.
```
[op@my-200 jdbc-etl]$ cat deps.edn
{:mvn/repos
 {"maven-repos" {:url "http://10.132.37.56:8081/repository/maven-central/"}}
 :deps
 {org.clojure/java.jdbc {:mvn/version "0.7.10"}
  postgresql/postgresql {:mvn/version "9.1-901-1.jdbc4"}}}
```
```
[op@my-200 jdbc-etl]$ cat jdbc.clj
(require '[clojure.java.jdbc :as j])

(def pg-db {:dbtype   "postgresql"
            :dbname   "monitor"
            :host     "10.132.37.201"
            :user     "monitor"
            :password "monitor"})

(println (j/query pg-db "select * from information_schema.columns limit 10"))
```
I've already written a draft version in case you're interested! NOTE: there are a few remaining adjustments I'll make in the next few days. I'll also add a Clojure scripting version of the MCE/MAE jobs!

```
cd metadata-ingestion/clj-etl && clj jdbc.clj
```
This looks great. Could you please create a PR to put your script and some simple README under https://github.com/linkedin/WhereHows/tree/datahub/contrib/metadata-ingestion? Thanks.
BTW: I benchmarked the old way and the new way; the new way is more than 100 times faster than the old one. The ingestion-rate bottleneck is now the consumer job.
Total table count: 7218

```
python:
real    117m30.338s
user    2m5.205s
sys     0m10.801s

clojure:
real    0m29.489s
user    0m33.958s
sys     0m2.039s
```
I'll submit it over a weekend.
Thanks. Looking forward to the PR!
This will likely be taken care of by @jplaisted as part of https://github.com/linkedin/datahub/issues/1743. Closing it for now.
Noted to verify :)
I use WhereHows/metadata-ingestion/mysql_etl.py to load data, and load Oracle data via a similar step (metadata-ingestion/rdbms_etl.py). But when the database contains too many tables, it is too slow to finish.
I think it would be better to load all schema information in one step, then group it to build the final per-table schemas in a streaming way, rather than querying each table individually.
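A minimal sketch of that batching idea in Python (the table/column names, sample rows, and `build_schemas` helper are hypothetical illustrations, not code from the actual ETL scripts; a real run would fetch the rows from `information_schema.columns` in a single query):

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical result of ONE bulk query such as:
#   SELECT table_name, column_name, data_type
#   FROM information_schema.columns
#   ORDER BY table_name, ordinal_position;
rows = [
    ("orders", "id", "integer"),
    ("orders", "amount", "numeric"),
    ("users", "id", "integer"),
    ("users", "email", "text"),
]

def build_schemas(rows):
    """Group flat column rows into one schema record per table.

    Because the rows arrive sorted by table_name, groupby streams
    through them once instead of issuing a query per table.
    """
    for table, cols in groupby(rows, key=itemgetter(0)):
        yield {"table": table,
               "columns": [{"name": c, "type": t} for _, c, t in cols]}

for schema in build_schemas(rows):
    print(schema)
```

This replaces N round trips (one per table) with a single scan over the bulk result set, which is where most of the per-table overhead goes.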
I attached the simple Oracle script for completeness.