DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Search before asking
[X] I have searched in the issues and found no similar feature requirement.
Problem Description
1. Demand background
A bug found in testing: when a file is imported into a Hive table that has multiple partitions, importing into one of the partitions deletes the data in the other partitions.
2. Requirement description
The same problem also exists in version 0.X; it can cause data to be deleted by mistake.
3. Proposed implementation
The front-end condition needs to change: when a file is imported into another partition of an existing table, importData should not be set to false.
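As a sketch of the proposed rule (the object and parameter names below are illustrative assumptions, not actual DSS front-end code): importData = false tells LoadData to (re)create the table, which is what wipes the other partitions, so it should be false only when the target table does not exist yet.

```scala
// Hypothetical sketch of the proposed front-end rule; names are
// assumptions for illustration, not the real DSS code.
object ImportFlag {
  // importData must stay true for any import into an existing table
  // (e.g. adding a new partition); only a brand-new table should be
  // created from scratch with importData = false.
  def importData(tableAlreadyExists: Boolean): Boolean = tableAlreadyExists
}
```

Under this rule, the second import in the reproduction below targets an already existing table, so importData would be computed as true and the earlier partition would be preserved.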
First import, writing partition ds=20220113:

val source = """{"path":"/mnt/bdap/janicegong/neil/orc_create_like2022.xlsx","pathType":"share","encoding":"","fieldDelimiter":"","hasHeader":false,"sheet":"sheet表名","quote":"","escapeQuotes":false}"""
val destination = """{"database":"janicegong_ind","tableName":"orc_create_like202788888","importData":false,"isPartition":true,"partition":"ds","partitionValue":"20220113","isOverwrite":false,"columns":[{"name":"col_1","index":0,"comment":"","type":"string","dateFormat":""},{"name":"col_2","index":1,"comment":"","type":"string","dateFormat":""},{"name":"col_3","index":2,"comment":"","type":"string","dateFormat":""},{"name":"col_4","index":3,"comment":"","type":"string","dateFormat":""},{"name":"col_5","index":4,"comment":"","type":"string","dateFormat":""},{"name":"col_6","index":5,"comment":"","type":"string","dateFormat":""},{"name":"col_7","index":6,"comment":"","type":"string","dateFormat":""},{"name":"col_8","index":7,"comment":"","type":"string","dateFormat":""},{"name":"col_9","index":8,"comment":"","type":"string","dateFormat":""},{"name":"col_10","index":9,"comment":"","type":"string","dateFormat":""},{"name":"col_11","index":10,"comment":"","type":"string","dateFormat":""}]}"""
com.webank.wedatasphere.linkis.engineplugin.spark.imexport.LoadData.loadDataToTable(spark,source,destination)

Import again to add a new partition, ds=77777777:

val source = """{"path":"/mnt/bdap/janicegong/neil/orc_create_like2022.xlsx","pathType":"share","encoding":"","fieldDelimiter":"","hasHeader":false,"sheet":"sheet表名","quote":"","escapeQuotes":false}"""
val destination = """{"database":"janicegong_ind","tableName":"orc_create_like202788888","importData":false,"isPartition":true,"partition":"ds","partitionValue":"77777777","isOverwrite":false,"columns":[{"name":"col_1","index":0,"comment":"","type":"string","dateFormat":""},{"name":"col_11","index":10,"comment":"","type":"string","dateFormat":""}]}"""
com.webank.wedatasphere.linkis.engineplugin.spark.imexport.LoadData.loadDataToTable(spark,source,destination)
The problem is that both requests set "importData":false, so the second import wipes the data already written to partition ds=20220113.
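For illustration, this is what the second import's destination would look like with the proposed fix applied; only the importData field changes from the script above (an assumed fix, not yet shipped behaviour):

```scala
// Second import with the proposed fix: importData is true, so only
// the ds=77777777 partition is written and ds=20220113 survives.
val destination = """{"database":"janicegong_ind","tableName":"orc_create_like202788888","importData":true,"isPartition":true,"partition":"ds","partitionValue":"77777777","isOverwrite":false,"columns":[{"name":"col_1","index":0,"comment":"","type":"string","dateFormat":""},{"name":"col_11","index":10,"comment":"","type":"string","dateFormat":""}]}"""
```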
The 91 environment is OK.
Description
No response
Use case
No response
Solutions
No response
Anything else
No response
Are you willing to submit a PR?