danhan / hcat-savi


[high] Sqoop jobs in Oozie #11

Closed danhan closed 11 years ago

danhan commented 11 years ago

There are six Sqoop jobs in Oozie, one per table. Each job is a coordinator job whose workflow contains two actions; the two actions import the same table from the two edges (ab, bc).

- Appointment ==> appointment job
- HCA ==> hca job
- Service ==> service job
- Patient ==> patient job
- Media ==> media job
- ServiceRecord ==> servicerecord job
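
A rough sketch of what the workflow for one table (patient here) could look like, with one Sqoop action per edge run one after the other. This is only an illustration, not the committed code: the `${abJdbcUrl}` and `${bcJdbcUrl}` properties, the action names, the column family `d`, and the assumption that the import target is an HBase table named `patient` are all hypothetical.

```xml
<!-- Sketch only: JDBC URL properties, action names and the column family are hypothetical -->
<workflow-app name="patient-wf" xmlns="uri:oozie:workflow:0.4">
  <start to="import-patient-ab"/>

  <!-- Action 1: import the patient table from the AB edge -->
  <action name="import-patient-ab">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <command>import --connect ${abJdbcUrl} --table patient --hbase-table patient --column-family d</command>
    </sqoop>
    <ok to="import-patient-bc"/>
    <error to="fail"/>
  </action>

  <!-- Action 2: import the same table from the BC edge -->
  <action name="import-patient-bc">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <command>import --connect ${bcJdbcUrl} --table patient --hbase-table patient --column-family d</command>
    </sqoop>
    <ok to="end"/>
    <error to="fail"/>
  </action>

  <kill name="fail">
    <message>Sqoop import failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

The other five tables would follow the same pattern with the table name swapped.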

danhan commented 11 years ago

For reference, see this code directory: http://code.google.com/p/gbif-occurrencestore/source/browse/trunk/oozie-apps/occurrence-process/?r=1893#occurrence-process%2Fsrc%2Fmain%2Fjava%2Forg%2Fgbif%2Foccurrencestore%2Fhive%2Fudf

danhan commented 11 years ago

hcat-migration

- setup.sh ===> copy things to HDFS (lib to /share/lib, all jobs)
- create tables in hbase ===> create all 6 tables based on the schemas
- oozie-submit.sh ===> submit the Oozie job; there is one job, called hcat-migration, which is a coordinator job. It is the parent
- src
  - java ===> to store the create table code
  - oozie
  - coordinator.xml
  - job.properties
  - ab-subwf (data from AB)
    - workflow.xml (there should be a fork including all tables there)
    - option-patient
    - script-patient.sh
    - import-patient.properties
  - bc-subwf (data from BC)
    - workflow.xml (there should be a fork including AB and BC)
    - script-patient.sh
    - import-patient.properties
    - option-patient
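
A possible shape for a parent workflow that forks into the two sub-workflows from the layout above and joins when both finish. This is a sketch under assumptions, not the actual file: the `${appRoot}` property, the node names, and the idea that ab-subwf and bc-subwf are launched in parallel from one parent are all hypothetical.

```xml
<!-- Sketch only: ${appRoot} and all node names are hypothetical -->
<workflow-app name="hcat-migration-wf" xmlns="uri:oozie:workflow:0.4">
  <start to="fork-edges"/>

  <!-- Run the AB and BC imports in parallel -->
  <fork name="fork-edges">
    <path start="ab-subwf"/>
    <path start="bc-subwf"/>
  </fork>

  <action name="ab-subwf">
    <sub-workflow>
      <app-path>${appRoot}/ab-subwf</app-path>
      <!-- Pass the parent job configuration down to the child workflow -->
      <propagate-configuration/>
    </sub-workflow>
    <ok to="join-edges"/>
    <error to="fail"/>
  </action>

  <action name="bc-subwf">
    <sub-workflow>
      <app-path>${appRoot}/bc-subwf</app-path>
      <propagate-configuration/>
    </sub-workflow>
    <ok to="join-edges"/>
    <error to="fail"/>
  </action>

  <!-- Continue only after both forked paths have completed -->
  <join name="join-edges" to="end"/>

  <kill name="fail">
    <message>Sub-workflow failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Every `<path>` in the fork has to end at the same `<join>`, and each `<app-path>` points at an HDFS directory that holds that sub-workflow's own workflow.xml.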

danhan commented 11 years ago

How to write a sub-workflow: refer to https://github.com/jrkinley/oozie-examples/tree/master/src/main/workflow

danhan commented 11 years ago

Fixed in commit https://github.com/danhan/hcat-savi/commit/d40d61e81b58c0c4037c681cad8bbb4abfd56637