alibaba / clusterdata

cluster data collected from production clusters in Alibaba for cluster management research

csv file structure corrupted. #222

Open liudonghua123 opened 1 day ago

liudonghua123 commented 1 day ago

Hi, I ran bash fetchData.sh start_date=0d0 end_date=1d1 to download the files, then extracted CallGraph/CallGraph_0.tar.gz to get CallGraph_0.csv.

I used csvq to parse the CSV and got a parse error around line 58755. The rpc_id field on this line is 0.1.1.1,0.1.1.1 without quotes, so the embedded comma is read as a field separator and the line ends up with one field too many.

[root@ha-master-1 CallGraph]# csvq 'select `rpctype`,count(*) from `./CallGraph_0.csv` group by rpctype'
[L:1 C:32] data parse error in /root/code/clusterdata/cluster-trace-microservices-v2022/data/CallGraph/CallGraph_0.csv: line 58755, column 116: wrong number of fields in line
[root@ha-master-1 CallGraph]#
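To see how many lines are affected, not just the first one csvq stops at, a short script can compare every row's field count against the header. This is only a minimal sketch: the function name find_bad_rows and the three-column sample are made up for illustration and are not the real CallGraph schema.

```python
import csv
from typing import Iterable, List, Tuple


def find_bad_rows(lines: Iterable[str], limit: int = 10) -> List[Tuple[int, int, List[str]]]:
    """Return (line_number, field_count, row) for rows whose field count
    differs from the header's, stopping after `limit` hits."""
    reader = csv.reader(lines)
    header = next(reader)          # first line defines the expected width
    expected = len(header)
    bad = []
    for row in reader:
        if len(row) != expected:
            # reader.line_num is the number of source lines read so far
            bad.append((reader.line_num, len(row), row))
            if len(bad) >= limit:
                break
    return bad


# Hypothetical in-memory sample; the second data row has an unquoted
# comma inside rpc_id, like the line reported above.
sample = [
    "traceid,rpc_id,rpctype\n",
    "t1,0.1,rpc\n",
    "t2,0.1.1.1,0.1.1.1,rpc\n",
]
print(find_bad_rows(sample))
```

For the real file, the same function could be called on an open file handle, for example with open("CallGraph_0.csv", newline="") as f: find_bad_rows(f).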


I also found some other strange errors in the data.


liudonghua123 commented 1 day ago

As a workaround, I currently use sed -Ei 's/([0-9]+\.[0-9]+(\.[0-9]+)*,\s?)+[0-9]+(\.[0-9]+)*/"\0"/g' CallGraph_0.csv to add quotes around the values in this column that contain commas.

I am not sure if this is correct.
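One way to check the pattern before editing the file in place is to try it on a few sample lines. The sketch below is a rough Python translation of the sed expression (Python's re engine is not identical to GNU sed's), and the sample rows are hypothetical, not taken from the dataset. It suggests the pattern quotes the broken rpc_id as intended, but would also merge a dotted number with a following numeric column if such a pair ever appears next to each other.

```python
import re

# Rough Python equivalent of the sed -E pattern in the comment above.
PATTERN = re.compile(r"([0-9]+\.[0-9]+(\.[0-9]+)*,\s?)+[0-9]+(\.[0-9]+)*")


def quote_dotted_lists(line: str) -> str:
    """Wrap every match of PATTERN in double quotes (like sed's "\\0" with /g)."""
    return PATTERN.sub(lambda m: '"' + m.group(0) + '"', line)


# Hypothetical rows for illustration only.
print(quote_dotted_lists("t2,0.1.1.1,0.1.1.1,rpc"))  # comma inside rpc_id gets quoted
print(quote_dotted_lists("t3,0.1,rpc"))              # well-formed row is left alone
print(quote_dotted_lists("t4,1.5,20,rpc"))           # two separate numeric columns get merged
```

If the real file has numeric columns next to each other, restricting the substitution to the rpc_id position (rather than matching anywhere on the line) would be safer.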