apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

Parsing Json data - Flattening columns generates 'null' if column contains '(B)' #12711

Open Rachmaninoffff opened 2 years ago

Rachmaninoffff commented 2 years ago

Affected Version

0.22.1

Description


Flattening columns generates 'null' if the column name contains '(B)'. I guess this is because of the '()'.

Rachmaninoffff commented 2 years ago

Original row:

{
  "severitylabel": "1",
  "severity": "9",
  "geoip": {
    "country(B)name": "xxzhongguo",
    "city_name": "xxhaerbin",
    "owner_domain": "null",
    "region_name": "xxheilongjian",
    "isp_domain": {
      "Chengdu": "Liangjie",
      "beijing": {
        "YJZ": "Liangjie",
        "shuzu": [ "1", "2", "dasd" ]
      }
    },
    "ip": "221.207.218.153",
    "times": "20220622 07:26:15"
  },
  "huanqiu": {
    "country_name": "xxzhongguo",
    "city_name": "xxhaerbin",
    "owner_domain": "null",
    "region_name": "xxheilongjian",
    "isp_domain": {
      "Chengdu": "Liangjie",
      "beijing": "tianjing"
    },
    "ip": "221.207.218.153",
    "times": "20220622 07:26:15"
  },
  "vpn_name": " SSLvpn YGBX",
  "facility_label": "kernel",
  "EventID": "45140632",
  "sys_type": "190",
  "result": "passed",
  "user_ip": "10.222.85.103",
  "priority": "10",
  "UserCode": "BA5202999999",
  "@timestamp": "2022-06-28T17:32:52.873080Z",
  "module": "VPN",
  "vpn_ip": "10.249.192.84",
  "os_user": "root",
  "type": "network",
  "time": "Dec 11 08:33:29",
  "OS": "Windows 7 Ultimate - 7601.win7sp1 ldr escrow.1903051700",
  "client_version": "1.4.9.1274",
  "@version": "1",
  "SPI": "Ob5cf596",
  "MAC": "44:37:E6:32:55:60",
  "host": "10.10.220.25",
  "client_ip": "221.207.218.153",
  "tags": [ "~rokparsefailure sysloginput" ],
  "user": "BA52020009255",
  "facility": "11"
}
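In a row like this, only the nested key "country(B)name" under "geoip" contains parentheses, so it is the only field affected. For larger payloads, a small script can list every such key before you write the flattenSpec. The sketch below is a hypothetical helper (the name find_suspect_keys is made up for illustration) that walks a parsed row and prints JsonPath-style dotted paths for keys containing "(" or ")", the characters reported in this issue. It does not call Druid or JsonPath; it only inspects the data.

```python
import json
from typing import Any, Iterator

# Only the characters reported in this issue; extend the set if other
# characters also turn out to break your paths.
SUSPECT_CHARS = set("()")


def find_suspect_keys(node: Any, prefix: str = "$") -> Iterator[str]:
    """Yield a dotted path for every object key that contains a suspect character."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}"
            if SUSPECT_CHARS & set(key):
                yield path
            # Keep descending: nested objects may have their own bad keys.
            yield from find_suspect_keys(value, path)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_suspect_keys(item, f"{prefix}[{index}]")


# Trimmed-down version of the original row above.
row = json.loads(
    '{"geoip": {"country(B)name": "xxzhongguo", "city_name": "xxhaerbin"}, "severity": "9"}'
)
print(list(find_suspect_keys(row)))  # ['$.geoip.country(B)name']
```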

FrankChen021 commented 2 years ago

You're right, this is due to "()" in the name of a JSON node.

Because this is caused by JsonPath, which Druid uses to flatten JSON objects, I don't think this problem can be fixed in the short term. Until then, you will have to do some ETL to process such "invalid" characters.
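As one possible form of that ETL step, the sketch below rewrites every object key before the records reach Druid, replacing "(" and ")" with "_", so "country(B)name" becomes "country_B_name". The function names and the underscore replacement are hypothetical choices for illustration, not a Druid feature; it assumes the input is newline-delimited JSON processed one record at a time.

```python
import json
import re
from typing import Any

# Hypothetical renaming rule: each "(" or ")" in a key becomes "_".
_PARENS = re.compile(r"[()]")


def sanitize_keys(node: Any) -> Any:
    """Return a copy of node with parentheses replaced in every object key, at any depth."""
    if isinstance(node, dict):
        return {_PARENS.sub("_", key): sanitize_keys(value) for key, value in node.items()}
    if isinstance(node, list):
        return [sanitize_keys(item) for item in node]
    # Scalars (strings, numbers, booleans, None) are returned unchanged.
    return node


def sanitize_line(line: str) -> str:
    """Sanitize one newline-delimited JSON record before it is ingested."""
    # ensure_ascii=False keeps any non-ASCII values readable in the output.
    return json.dumps(sanitize_keys(json.loads(line)), ensure_ascii=False)


print(sanitize_line('{"geoip": {"country(B)name": "xxzhongguo"}}'))
```

After this step, the flattenSpec can reference the renamed field with a plain path such as $.geoip.country_B_name. One design caveat: if a record already contains a key equal to the renamed one (for example both "a(b)" and "a_b_"), the dict comprehension keeps only the later value, so choose a replacement that cannot collide with your existing field names.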