Kyligence / kylin-tpch

Run TPCH Benchmark on Apache Kylin

[Doc] Benchmark step of Kylin 3.1.2 #3

Open hit-lacus opened 3 years ago

hit-lacus commented 3 years ago

Table of Contents

hit-lacus commented 3 years ago

Background

Provide a standard way to measure Kylin's performance using the same resources and environment under the same workload.

Report Template

| Metrics Name | Metrics Value |
| --- | --- |
| Scale Factor | X |
| Data Load Duration | X minutes |
| Storage Size | X MB |
| Query RT (Total) | X seconds |
| QPS | X |

hit-lacus commented 3 years ago

AWS EMR Cluster

Basic Cluster Info

| Key | Value |
| --- | --- |
| Node Memory | 64 GB |
| Node Core | 16 |
| Node (Instance Type) | m5.4xlarge |
| Node Disk | 1500 GB + 100 GB (SSD) |
| Node Count | 4 Worker + 1 Master |
| EMR Version | emr-5.31.0 |
| Kylin Version | 3.1.2 |
| Yarn Memory | 204.80 GB |
| Yarn Core | 52 |
| HBase Memory | 112 GB (28*4) |
| HBase Core | ? |
| HDFS Capacity | 2.91 TB |

CLI

aws emr create-cluster --applications Name=Hadoop Name=Hive Name=Pig Name=HBase Name=Spark Name=Sqoop Name=Tez Name=ZooKeeper Name=Ganglia \
  --tags 'Cost Center=OS' 'Project=Kylin3_Benchmark' 'CRR=x' 'Owner=x' \
  --ec2-attributes '{"KeyName":"XiaoxiangYu","AdditionalSlaveSecurityGroups":[""],"InstanceProfile":"EMR_EC2_DefaultRole","SubnetId":"","EmrManagedSlaveSecurityGroup":"","EmrManagedMasterSecurityGroup":"","AdditionalMasterSecurityGroups":[""]}' \
  --release-label emr-5.31.0 \
  --log-uri 's3n://x/' \
  --instance-groups '[{"InstanceCount":4,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":750,"VolumeType":"gp2"},"VolumesPerInstance":2}]},"InstanceGroupType":"CORE","InstanceType":"m5.4xlarge","Name":"Hadoop Workers"},{"InstanceCount":1,"InstanceGroupType":"MASTER","InstanceType":"m5.4xlarge","Name":"Hadoop Master"}]' \
  --configurations '[{"Classification":"hdfs-site","Properties":{"dfs.replication":"1"}},{"Classification":"mapred-site","Properties":{"mapreduce.map.memory.mb":"3584","mapreduce.reduce.memory.mb":"8192","mapreduce.map.java.opts":"-Xmx3072m","mapreduce.reduce.java.opts":"-Xmx7168m"}},{"Classification":"yarn-site","Properties":{"yarn.nodemanager.resource.cpu-vcores":"13","yarn.nodemanager.resource.memory-mb":"52428","yarn.scheduler.maximum-allocation-mb":"52428","yarn.app.mapreduce.am.resource.mb":"2048"}}]' \
  --auto-scaling-role EMR_AutoScaling_DefaultRole \
  --ebs-root-volume-size 100 \
  --service-role EMR_DefaultRole \
  --enable-debugging --name 'Kylin3 benchmark' \
  --scale-down-behavior TERMINATE_AT_TASK_COMPLETION \
  --region cn-northwest-1
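The Yarn Memory and Yarn Core rows in the table above are consistent with the per-node `yarn-site` values in this command multiplied by the four workers. A small sketch of that arithmetic follows; `yarn_totals` is a hypothetical helper name, not part of any tool used here.

```sh
# Hypothetical helper: cluster-wide YARN capacity from per-NodeManager settings.
yarn_totals() {
  # $1 = yarn.nodemanager.resource.memory-mb
  # $2 = yarn.nodemanager.resource.cpu-vcores
  # $3 = number of worker (CORE) nodes
  awk -v mb="$1" -v vc="$2" -v n="$3" 'BEGIN {
    printf "Yarn Memory: %.2f GB\n", mb * n / 1024
    printf "Yarn Core: %d\n", vc * n
  }'
}

# Values taken from the create-cluster command above.
yarn_totals 52428 13 4
```

With the values above this prints `Yarn Memory: 204.80 GB` and `Yarn Core: 52`, matching the table.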

Connect Master

ssh -ND 8157 -i ~/XiaoxiangYu.pem hadoop@ec2-XXX.cn-northwest-1.compute.amazonaws.com.cn
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --proxy-server="socks5://localhost:8157" --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" --user-data-dir=/tmp/

hit-lacus commented 3 years ago

Prepare Source

hit-lacus commented 3 years ago

Kylin Installation

Download Kylin

...

Modify .bashrc

export HIVE_HOME=/usr/lib/hive
export HADOOP_HOME=/usr/lib/hadoop
export HBASE_HOME=/usr/lib/hbase
export SPARK_HOME=/usr/lib/spark

export KYLIN_HOME=/home/hadoop/apache-kylin-3.1.2-bin-hbase1x
export HCAT_HOME=/usr/lib/hive-hcatalog
export KYLIN_CONF_HOME=$KYLIN_HOME/conf
export tomcat_root=$KYLIN_HOME/tomcat
export hive_dependency=$HIVE_HOME/conf:$HIVE_HOME/lib/:$HIVE_HOME/lib/hive-hcatalog-core.jar:$SPARK_HOME/jars/
export PATH=$KYLIN_HOME/bin:$PATH
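After sourcing the updated `.bashrc`, it may be worth confirming that each exported directory actually exists on the master node before starting Kylin. The following is a minimal sketch; `check_homes` is a hypothetical helper, and the list of variables is simply the set exported above.

```sh
# Hypothetical helper: report any *_HOME variable that does not point at a directory.
check_homes() {
  missing=0
  for name in HIVE_HOME HADOOP_HOME HBASE_HOME SPARK_HOME KYLIN_HOME HCAT_HOME; do
    # Indirectly read the variable whose name is stored in $name.
    eval "dir=\"\${$name:-}\""
    if [ ! -d "$dir" ]; then
      echo "missing: $name=$dir"
      missing=1
    fi
  done
  # Non-zero exit status if at least one directory is missing.
  return "$missing"
}

# Usage: source ~/.bashrc && check_homes
```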

Modify $KYLIN_HOME/bin/kylin.sh

Add the following line:

export HBASE_CLASSPATH_PREFIX=${tomcat_root}/bin/bootstrap.jar:${tomcat_root}/bin/tomcat-juli.jar:${tomcat_root}/lib/*:$hive_dependency:$HBASE_CLASSPATH_PREFIX

Remove the following line:

${KYLIN_HOME}/bin/check-migration-acl.sh || { exit 1; }
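The removal can also be scripted. This is a minimal sketch, assuming a `sed` that accepts `-i` with an attached backup suffix (both GNU and BSD sed do); `remove_acl_check` is a hypothetical helper name. The `HBASE_CLASSPATH_PREFIX` line still has to be added by hand, since its position inside `kylin.sh` is not specified here.

```sh
# Hypothetical helper: delete every line that calls check-migration-acl.sh.
remove_acl_check() {
  # $1 = path to kylin.sh; the original is kept next to it with a .bak suffix
  sed -i.bak '/check-migration-acl\.sh/d' "$1"
}

# Usage: remove_acl_check "$KYLIN_HOME/bin/kylin.sh"
```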

Modify $KYLIN_HOME/conf/kylin.properties

The JDBC URL, username, and password can be found in /etc/hive/conf/hive-site.xml.

## Use JDBC Metadata instead of HBase please ...
kylin.metadata.url=benchmark_kylin312@jdbc,url=jdbc:mysql://ip-172-31-4-51.cn-northwest-1.compute.internal:3306/hive,username=hive,password=5otQCktq8TLaw7T6,maxActive=10,maxIdle=10,driverClassName=org.mariadb.jdbc.Driver

## Query Cache 
kylin.query.cache-enabled=false

## MR related
#kylin.engine.mr.config-override.mapreduce.map.java.opts=-Xmx6g
#kylin.engine.mr.config-override.mapreduce.map.memory.mb=7000
#kylin.engine.mr.config-override.mapreduce.reduce.java.opts=-Xmx10g
#kylin.engine.mr.config-override.mapreduce.reduce.memory.mb=11000

## Hive CLI related
kylin.source.hive.config-override.mapreduce.map.java.opts=-Xmx7g
kylin.source.hive.config-override.mapreduce.map.memory.mb=8192
kylin.source.hive.config-override.mapreduce.reduce.java.opts=-Xmx12g
kylin.source.hive.config-override.mapreduce.reduce.memory.mb=13000
kylin.source.hive.config-override.tez.task.resource.memory.mb=8192
kylin.source.hive.config-override.hive.tez.container.size=8192

## dfs.replication
kylin.engine.cuboid.dfs.replication=1

## HBase related
kylin.storage.hbase.region-cut-gb=3
kylin.storage.hbase.hfile-size-gb=1.5
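The connection values used in `kylin.metadata.url` can be read out of `hive-site.xml` rather than copied by eye. The sketch below assumes the usual Hadoop-style layout, with each `<name>` followed by its `<value>` inside a `<property>` element, and that the metastore settings live under the standard `javax.jdo.option.*` property names; `get_hive_prop` is a hypothetical helper.

```sh
# Hypothetical helper: print the <value> of one property from a Hadoop-style XML file.
get_hive_prop() {
  # $1 = property name, $2 = path to the XML file
  awk -v name="$1" '
    # Remember when the requested <name> element has been seen.
    index($0, "<name>" name "</name>") { found = 1 }
    # The first <value> at or after that point belongs to the property.
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$2"
}

# Usage:
#   get_hive_prop javax.jdo.option.ConnectionURL      /etc/hive/conf/hive-site.xml
#   get_hive_prop javax.jdo.option.ConnectionUserName /etc/hive/conf/hive-site.xml
#   get_hive_prop javax.jdo.option.ConnectionPassword /etc/hive/conf/hive-site.xml
```

This is plain text matching rather than real XML parsing, so it will not handle values that span several lines.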

Modify $KYLIN_HOME/conf/setenv.sh

Give the query server more memory:

export KYLIN_JVM_SETTINGS="-Xms16G -Xmx16G ..."

Copy JDBC Driver

mkdir -p "$KYLIN_HOME/ext"
cp /usr/lib/hive/lib/mariadb-connector-java.jar "$KYLIN_HOME/ext"

Move the following JARs out of /usr/lib/hive/lib (they are kept in /home/hadoop rather than deleted):

sudo su
mv /usr/lib/hive/lib/jackson-datatype-joda-2.4.6.jar /home/hadoop
mv /usr/lib/hive/lib/apache-jsp-8.0.33.jar /home/hadoop
mv /usr/lib/hive/lib/apache-jsp-9.3.27.v20190418.jar /home/hadoop
mv /usr/lib/hive/lib/jetty-runner-9.3.27.v20190418.jar /home/hadoop
mv /usr/lib/hive/lib/websocket-common-9.3.27.v20190418.jar /home/hadoop
mv /usr/lib/hive/lib/websocket-server-9.3.27.v20190418.jar /home/hadoop

hit-lacus commented 3 years ago

Start Kylin Instance

hit-lacus commented 3 years ago

Cubing

hit-lacus commented 3 years ago

Check Storage

Source

[hadoop@ip-172-31-4-51 apache-jmeter-5.4.1]$ hadoop fs -du -h /user/hive/warehouse/tpch_flat_orc_10.db
80.9 M   /user/hive/warehouse/tpch_flat_orc_10.db/customer
1.6 G    /user/hive/warehouse/tpch_flat_orc_10.db/lineitem
2.8 K    /user/hive/warehouse/tpch_flat_orc_10.db/nation
383.1 M  /user/hive/warehouse/tpch_flat_orc_10.db/orders
45.2 M   /user/hive/warehouse/tpch_flat_orc_10.db/part
294.0 M  /user/hive/warehouse/tpch_flat_orc_10.db/partsupp
1.7 K    /user/hive/warehouse/tpch_flat_orc_10.db/region
5.0 M    /user/hive/warehouse/tpch_flat_orc_10.db/supplier
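To turn a listing like the one above into a single figure, for example for the Storage Size row of the report, the per-directory sizes can be summed. This is a minimal sketch, assuming `hadoop fs -du` is run without `-h` so that the first column is a plain byte count; `du_total_mb` is a hypothetical helper name.

```sh
# Hypothetical helper: sum the first column (bytes) of `hadoop fs -du` output
# read from stdin and print the total in MB.
du_total_mb() {
  awk '{ total += $1 } END { printf "%.1f\n", total / 1024 / 1024 }'
}

# Usage on the cluster:
#   hadoop fs -du /user/hive/warehouse/tpch_flat_orc_10.db | du_total_mb
#   hadoop fs -du /kylin | du_total_mb
```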

Build Duration

HDFS Disk Usage

hadoop fs -du -h /user/hbase/data
hadoop fs -du -h /kylin

hit-lacus commented 3 years ago

Response Time and QPS

- Download the JMeter zip with wget
- Download the JMX file with wget
- Start the load test
```sh
bin/jmeter -n -t Kylin-benchmark-tpch.jmx -l result-tpch-10_1.jtl -j result-tpch-10_1.log -e -o kylin4_tpch_thd_1_report/
```
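The Query RT (Total) and QPS rows of the report can be derived from the `.jtl` results file. The sketch below assumes the file is in JMeter's CSV format with a header row and with `timeStamp` and `elapsed` (both in milliseconds) as the first two columns. It also assumes that total RT means the sum of per-sample elapsed times and that QPS means samples divided by wall-clock duration; neither definition is stated in this issue. `jtl_summary` is a hypothetical helper.

```sh
# Hypothetical helper: summarize a JMeter CSV results file.
jtl_summary() {
  # $1 = path to the .jtl file
  awk -F, 'NR > 1 {
    t = $1 + 0          # sample start time, epoch ms
    e = $2 + 0          # sample elapsed time, ms
    n++
    sum += e
    if (n == 1 || t < start) start = t
    if (t + e > end) end = t + e
  }
  END {
    if (n == 0) { print "no samples"; exit 1 }
    wall = (end - start) / 1000   # wall-clock seconds from first start to last finish
    printf "samples=%d total_rt_s=%.3f qps=%.2f\n", n, sum / 1000, (wall > 0 ? n / wall : 0)
  }' "$1"
}

# Usage: jtl_summary result-tpch-10_1.jtl
```

Rows whose fields contain embedded newlines (for instance long failure messages) would be miscounted by this line-based approach.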

hit-lacus commented 3 years ago

todo ...

hit-lacus commented 3 years ago

todo ...

hit-lacus commented 3 years ago

Reference