WeBankFinTech / DataSphereStudio

DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
https://github.com/WeBankFinTech/DataSphereStudio-Doc
Apache License 2.0
3.08k stars, 1k forks

[bug] Importing an xlsx file into Hive as a new partition deletes the data in existing partitions ("importData":false) #662

Closed: yuankang134 closed this issue 2 years ago

yuankang134 commented 2 years ago

Search before asking

Problem Description

1. Background

This bug was found during testing. When a file is imported into a Hive table that has multiple partitions, importing into one partition deletes the data in the other partitions.
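The reported symptom is what one would expect if the load path dropped and recreated the table instead of inserting into a single partition. A minimal sketch, assuming that mechanism: a partitioned table is modelled as a plain Map from partition value to rows, and the hypothetical functions `insertInto` and `recreateAndLoad` contrast the two behaviors. This is an illustrative model only, not the Linkis `LoadData` code.

```scala
// Illustrative model only (hypothetical, not Linkis code): a partitioned table
// is represented as a Map from partition value to the rows stored in it.
object PartitionModel {
  type Table = Map[String, Seq[String]]

  // Insert-style load: appends rows to one partition and leaves the others untouched.
  def insertInto(table: Table, partitionValue: String, rows: Seq[String]): Table =
    table.updated(partitionValue, table.getOrElse(partitionValue, Seq.empty[String]) ++ rows)

  // Drop-and-recreate load (assumed mechanism of the bug): the table is rebuilt
  // from scratch, so only the partition being loaded survives.
  def recreateAndLoad(partitionValue: String, rows: Seq[String]): Table =
    Map(partitionValue -> rows)

  def main(args: Array[String]): Unit = {
    val existing: Table = Map("20220113" -> Seq("row-a"))
    // Keeps both partitions, 20220113 and 77777777
    println(insertInto(existing, "77777777", Seq("row-b")).keySet)
    // Keeps only the newly loaded partition 77777777
    println(recreateAndLoad("77777777", Seq("row-b")).keySet)
  }
}
```

In this model the second import of the reproduction below corresponds to `recreateAndLoad`, which matches the reported loss of partition ds=20220113.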

2. Requirement details

The same problem exists in version 0.x. It is a real defect and can cause data to be deleted by mistake.

3. Implementation

The front-end condition needs to change: when a file is imported into another partition of an existing table, importData must not be set to false.

Reproduction. First import, into partition ds=20220113:

val source = """{"path":"/mnt/bdap/janicegong/neil/orc_create_like2022.xlsx","pathType":"share","encoding":"","fieldDelimiter":"","hasHeader":false,"sheet":"sheet表名","quote":"","escapeQuotes":false}"""

val destination = """{"database":"janicegong_ind","tableName":"orc_create_like202788888","importData":false,"isPartition":true,"partition":"ds","partitionValue":"20220113","isOverwrite":false,"columns":[{"name":"col_1","index":0,"comment":"","type":"string","dateFormat":""},{"name":"col_2","index":1,"comment":"","type":"string","dateFormat":""},{"name":"col_3","index":2,"comment":"","type":"string","dateFormat":""},{"name":"col_4","index":3,"comment":"","type":"string","dateFormat":""},{"name":"col_5","index":4,"comment":"","type":"string","dateFormat":""},{"name":"col_6","index":5,"comment":"","type":"string","dateFormat":""},{"name":"col_7","index":6,"comment":"","type":"string","dateFormat":""},{"name":"col_8","index":7,"comment":"","type":"string","dateFormat":""},{"name":"col_9","index":8,"comment":"","type":"string","dateFormat":""},{"name":"col_10","index":9,"comment":"","type":"string","dateFormat":""},{"name":"col_11","index":10,"comment":"","type":"string","dateFormat":""}]}"""

com.webank.wedatasphere.linkis.engineplugin.spark.imexport.LoadData.loadDataToTable(spark,source,destination)
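Both reproduction scripts in this issue pass a destination JSON whose fields differ mainly in partitionValue (and the column list). A minimal sketch, assuming a hypothetical helper object `DestinationJson` that is not part of Linkis or DSS, builds that string with Scala string interpolation so that a follow-up import can be generated by changing one argument. Field names and their order are copied from the scripts; `isPartition` is fixed to true because both scripts use it.

```scala
// Hypothetical helper, not part of Linkis or DSS: rebuilds the destination JSON
// used by the reproduction scripts, with the same field names and order.
object DestinationJson {
  final case class Column(name: String, index: Int, dataType: String = "string")

  def destinationJson(
      database: String,
      tableName: String,
      partition: String,
      partitionValue: String,
      columns: Seq[Column],
      importData: Boolean,
      isOverwrite: Boolean = false
  ): String = {
    // Each column becomes {"name":..,"index":..,"comment":"","type":..,"dateFormat":""}
    val cols = columns
      .map(c => s"""{"name":"${c.name}","index":${c.index},"comment":"","type":"${c.dataType}","dateFormat":""}""")
      .mkString("[", ",", "]")
    // Values are interpolated without JSON escaping: fine for the plain
    // identifiers used in the issue, not for arbitrary user input.
    s"""{"database":"$database","tableName":"$tableName","importData":$importData""" +
      s""","isPartition":true,"partition":"$partition","partitionValue":"$partitionValue","isOverwrite":$isOverwrite""" +
      s""","columns":$cols}"""
  }

  def main(args: Array[String]): Unit = {
    val cols = Seq(Column("col_1", 0), Column("col_11", 10))
    // Same target as the second import below, but with importData=true.
    println(destinationJson("janicegong_ind", "orc_create_like202788888", "ds", "77777777", cols, importData = true))
  }
}
```

The returned string has the same shape as the hand-written literals, so it should be usable as the `destination` argument of `loadDataToTable(spark, source, destination)`; the point of the sketch is only to make the importData and partitionValue fields explicit.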

Import again, this time into a new partition (ds=77777777):

****SCRIPT CODE****

val source = """{"path":"/mnt/bdap/janicegong/neil/orc_create_like2022.xlsx","pathType":"share","encoding":"","fieldDelimiter":"","hasHeader":false,"sheet":"sheet表名","quote":"","escapeQuotes":false}"""

val destination = """{"database":"janicegong_ind","tableName":"orc_create_like202788888","importData":false,"isPartition":true,"partition":"ds","partitionValue":"77777777","isOverwrite":false,"columns":[{"name":"col_1","index":0,"comment":"","type":"string","dateFormat":""},{"name":"col_11","index":10,"comment":"","type":"string","dateFormat":""}]}"""

com.webank.wedatasphere.linkis.engineplugin.spark.imexport.LoadData.loadDataToTable(spark,source,destination)

The problem is the "importData":false setting in the destination JSON of both scripts.

91 environment: OK.
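The fix described in section 3 amounts to deriving importData from whether the target table already exists, and refusing the risky combination otherwise. A minimal sketch, written in Scala only to match the scripts in this issue: the object `ImportDataFlag` and both functions are hypothetical, and the meaning of importData=false (a "create table" path that rebuilds the target table) is an assumption inferred from the reported symptom, not confirmed from the Linkis source.

```scala
// Hypothetical sketch, not DSS or Linkis code. Assumption: importData=false
// selects a "create table" path that rebuilds the target table, while
// importData=true loads into the existing table.
object ImportDataFlag {
  // Corrected front-end rule: importing into an existing table (into a new or
  // an existing partition) must use importData=true.
  def importData(targetTableExists: Boolean): Boolean = targetTableExists

  // Defensive check before calling loadDataToTable: reject the combination
  // that this issue reports as deleting other partitions.
  def checkDestination(
      importDataFlag: Boolean,
      isPartition: Boolean,
      targetTableExists: Boolean
  ): Either[String, Unit] =
    if (!importDataFlag && isPartition && targetTableExists)
      Left("importData=false on an existing partitioned table may delete its other partitions")
    else
      Right(())

  def main(args: Array[String]): Unit = {
    // Second import from the issue: the table exists, partition ds=77777777 is new.
    println(s"importData should be ${importData(targetTableExists = true)}")
    println(checkDestination(importDataFlag = false, isPartition = true, targetTableExists = true))
  }
}
```

Inside a Spark session, `targetTableExists` could come from `spark.catalog.tableExists(database, tableName)` (available since Spark 2.1). The rule is deliberately simple; a real UI might still let the user explicitly choose to replace the whole table.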

Description

No response

Use case

No response

solutions

No response

Anything else

No response

Are you willing to submit a PR?

yuankang134 commented 2 years ago

This bug has been fixed in dss-1.1.0.