apache / dolphinscheduler

Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code
https://dolphinscheduler.apache.org/
Apache License 2.0
12.8k stars 4.6k forks

sqoop write to hive hcatalog orc #4146

Closed snycc15 closed 3 months ago

snycc15 commented 3 years ago

I suggest adding one more option for the sync write mode: writing to a Hive ORC table, which should improve performance. At the moment the data is written directly as text. For example: sqoop import --connect jdbc:oracle --username xxx --password xxx --table xxx.xxx --hcatalog-database xxx --hcatalog-table xx --drop-and-create-hcatalog-table --hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY","transactional"="false")' --m 1
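The command above is easier to review with one flag per line. The following is only a dry-run sketch: it prints the arguments instead of invoking sqoop. The helper name `print_sqoop_orc_import`, the JDBC URL, and all table and database names are placeholders invented for illustration. It uses `--num-mappers`, the documented long form of `-m`, and leaves out `--password`.

```shell
#!/bin/sh
# Dry-run sketch: prints the Sqoop arguments, one per line, without running sqoop.
# print_sqoop_orc_import is a hypothetical helper; every value passed in is a placeholder.
print_sqoop_orc_import() {
    jdbc_url=$1
    db_user=$2
    src_table=$3
    hive_db=$4
    hive_table=$5
    # The password flag from the original command is intentionally omitted here.
    printf '%s\n' \
        sqoop import \
        --connect "$jdbc_url" \
        --username "$db_user" \
        --table "$src_table" \
        --hcatalog-database "$hive_db" \
        --hcatalog-table "$hive_table" \
        --drop-and-create-hcatalog-table \
        --hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY","transactional"="false")' \
        --num-mappers 1
}

# Example call with made-up values.
print_sqoop_orc_import 'jdbc:oracle:thin:@//dbhost:1521/SERVICE' 'etl_user' 'SCHEMA.SRC_TABLE' 'target_db' 'target_table'
```

Printing the arguments first makes it easy to check the quoting of the storage stanza, which contains both single and double quotes, before running the real import.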

Eights-Li commented 3 years ago

Using HCatalog to import from an RDBMS has some limitations: http://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html#_unsupported_sqoop_options If you need a custom Sqoop task, you can use the custom Sqoop job in the ds dev branch.

snycc15 commented 3 years ago

The "unsupported" problem does not apply here. Importing into ORC is currently a way to get better performance.