apache / dolphinscheduler

Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code
https://dolphinscheduler.apache.org/
Apache License 2.0

[Feature] Sqoop component optimization #2917

Closed: Eights-Li closed this issue 4 years ago

Eights-Li commented 4 years ago

Is your feature request related to a problem? Please describe.
The Sqoop task on the dev branch needs enhancement. Optimization points:
• Sqoop import and export do not support Hadoop-level custom parameters (that is, -D parameters), for example:
  - the MR job name
  - the memory and number of MR map and reduce tasks, etc.
• The split-by field is not supported. If -m is greater than 1 and the primary key of the relational database table is not auto-increment, Sqoop may import duplicate data into Hadoop. The usual solution is to specify a split-by field, so split-by needs to be supported.
• Custom parameters cannot be set. For example, when importing from MySQL, --direct can be added for some tables to speed up the import.
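Why the split column matters can be sketched as follows. With -m greater than 1, each mapper reads one slice of the split column's value range, so every row must fall into exactly one slice. The helpers below (`split_ranges`, `split_predicates`) are hypothetical and only illustrate dividing the [MIN, MAX] range of an integer split column into contiguous, non-overlapping WHERE predicates. They are not Sqoop's actual splitter code, whose boundaries may differ.

```python
from typing import List, Tuple


def split_ranges(lo: int, hi: int, num_mappers: int) -> List[Tuple[int, int]]:
    """Divide the inclusive key range [lo, hi] into at most num_mappers
    contiguous, non-overlapping inclusive sub-ranges."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be >= 1")
    if hi < lo:
        return []  # empty table: nothing to split
    total = hi - lo + 1
    n = min(num_mappers, total)      # never create empty splits
    size, extra = divmod(total, n)   # the first `extra` splits get one more key
    ranges = []
    start = lo
    for i in range(n):
        end = start + size + (1 if i < extra else 0) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def split_predicates(column: str, lo: int, hi: int, num_mappers: int) -> List[str]:
    """One WHERE predicate per mapper; each key value matches exactly one predicate."""
    return [f"{column} >= {a} AND {column} <= {b}"
            for a, b in split_ranges(lo, hi, num_mappers)]


if __name__ == "__main__":
    # lo/hi would come from a query such as SELECT MIN(id), MAX(id) on the source table.
    for predicate in split_predicates("id", 1, 10, 3):
        print(predicate)
```

Because the predicates partition the key range, no row is read by two mappers. That guarantee depends on the chosen column's values, which is why letting users pick the split-by field is useful.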

Describe the solution you'd like. Ideas:
• The Sqoop job name is currently generic; it should become a required parameter on the Sqoop task page.
• Add an input box for Hadoop custom parameters, for setting MR memory and similar options.
• Add Sqoop task-level custom parameters, such as --direct, --fetch-size and other parameters used in specific situations.
• Add an option to choose between a custom script and the template script, following the design of the DataX node.
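The proposed fields map fairly directly onto a Sqoop command line. The sketch below uses a hypothetical `build_sqoop_import` helper (not DolphinScheduler code); the connection URL, table and paths are placeholders. It shows one ordering constraint from the Sqoop user guide: generic Hadoop arguments such as -D must come after the tool name and before tool-specific arguments like --connect. Requiring an explicit split column when more than one mapper runs is a choice made in this sketch only. Sqoop itself falls back to the table's primary key.

```python
import shlex
from typing import Dict, List, Optional


def build_sqoop_import(
    connect: str,
    table: str,
    target_dir: str,
    job_name: str,
    hadoop_params: Optional[Dict[str, str]] = None,
    split_by: Optional[str] = None,
    num_mappers: int = 1,
    custom_args: Optional[List[str]] = None,
) -> List[str]:
    """Assemble a `sqoop import` argv (hypothetical helper, illustration only)."""
    argv = ["sqoop", "import"]

    # Generic Hadoop arguments (-D key=value) must follow the tool name and
    # precede tool-specific arguments such as --connect.
    params = {"mapreduce.job.name": job_name}  # job name is a required, task-specific value
    params.update(hadoop_params or {})          # e.g. MR map/reduce memory settings
    for key, value in params.items():
        argv += ["-D", f"{key}={value}"]

    argv += ["--connect", connect, "--table", table,
             "--target-dir", target_dir, "--num-mappers", str(num_mappers)]

    # Stricter than Sqoop itself (which falls back to the primary key):
    # insist on an explicit split column whenever more than one mapper runs.
    # split_by is only emitted when num_mappers > 1.
    if num_mappers > 1:
        if not split_by:
            raise ValueError("split_by is required when num_mappers > 1")
        argv += ["--split-by", split_by]

    # Task-level custom arguments (e.g. --direct, --fetch-size) are appended verbatim.
    argv += custom_args or []
    return argv


if __name__ == "__main__":
    cmd = build_sqoop_import(
        connect="jdbc:mysql://db.example.com/shop",  # placeholder URL
        table="orders",
        target_dir="/data/orders",
        job_name="orders_import",
        hadoop_params={"mapreduce.map.memory.mb": "4096"},
        split_by="order_id",
        num_mappers=4,
        custom_args=["--direct"],
    )
    print(shlex.join(cmd))
```

Building an argument list instead of a single string avoids quoting problems when user-supplied values contain spaces. Validating free-form custom arguments, which the next comment raises as a concern, is left out of this sketch.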

743294668 commented 4 years ago

The suggestion is very good. At present, the Sqoop node type only supports data import and export; other Sqoop commands are not supported. As I understand it, though, the maintainer of the Sqoop node may not want to expose custom parameters, because they are harder to validate. That is just my understanding; I hope @zixi0825 can share their opinion. Thank you.

zixi0825 commented 4 years ago

The solution is good. When I designed the node, I intended the graphical interface to cover simple, fast use cases, and I thought custom parameters were better handled by custom scripts. Custom scripts can be implemented with shell scripts, so I did not add them to the Sqoop task type.

Eights-Li commented 4 years ago

Merged into dev; closing.