StarRocks / starrocks


Broker load does not support Alibaba Cloud OSS-HDFS #15684

Closed fushaofeng closed 1 year ago

fushaofeng commented 1 year ago

Steps to reproduce the behavior (Required)

1. StarRocks broker load does not support Alibaba Cloud OSS-HDFS.
2. Create a broker load task:

    LOAD LABEL tmp_fsf_dwd_tch_teacher
    (
        DATA INFILE("oss://bucket_oss/user/hive/warehouse/fdp/finance_dwd.db/tmp_fsf_dwd_tch_teacher/*")
        INTO TABLE tmp_fsf_dwd_tch_teacher
        format as orc
        (_col0, _col1, _col2, _col3, _col4)
        SET (
            id = _col0,
            employee_number = _col1,
            name = _col2,
            english_name = _col3,
            sex = _col4
        )
    )
    WITH BROKER broker1
    (
        "fs.oss.accessKeyId" = "xxx",
        "fs.oss.accessKeySecret" = "xxx",
        "fs.oss.endpoint" = "cn-beijing.oss-dls.aliyuncs.com"
    );

Expected behavior (Required)

Real behavior (Required)

1. Output of show load:

    MySQL [dbadb]> show load \G
    JobId: 14028
    Label: tmp_load
    State: CANCELLED
    Progress: ETL:N/A; LOAD:N/A
    Type: BROKER
    EtlInfo: NULL
    TaskInfo: resource:N/A; timeout(s):14400; max_filter_ratio:0.0
    ErrorMsg: type:ETL_RUN_FAIL; msg:No source file in this table(tmp_fsf_dwd_tch_teacher).

2. Task error message:

    2022-12-22 16:31:02 [ TThreadPoolServer WorkerProcess-%d:3632577521 ] - [ INFO ] received a list path request, request detail: TBrokerListPathRequest(version:VERSION_ONE, path:oss://bucket_oss/user/hive/warehouse/fdp/finance_dwd.db/tmp_fsf_dwd_tch_teacher/, isRecursive:false, properties:{fs.oss.accessKeyId=xxx, fs.oss.accessKeySecret=xxx, fs.oss.endpoint=cn-beijing.oss-dls.aliyuncs.com})
    2022-12-22 16:31:02 [ TThreadPoolServer WorkerProcess-%d:3632577787 ] - [ INFO ] could not find file system for path oss://bucket_oss/user/hive/warehouse/fdp/finance_dwd.db/tmp_fsf_dwd_tch_teacher/ create a new one
    2022-12-22 16:31:02 [ TThreadPoolServer WorkerProcess-%d:3632577960 ] - [ WARN ] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

StarRocks version (Required)

xiaoyong-z commented 1 year ago

The error message "No source file in this table" means StarRocks listed the directory and did not find any file. Since the HDFS list operation succeeded, I think StarRocks broker load can support Alibaba Cloud OSS-HDFS.

xiaoyong-z commented 1 year ago

Can you recheck whether your path is correct? @fushaofeng

xiaoyong-z commented 1 year ago

If you still have the problem after checking the path, please paste your broker's log here; it will help us trace the issue.

fushaofeng commented 1 year ago

1) The file path is correct:

    [hdfs@n152 ~]$ hdfs dfs -du -h oss://bucket_oss.cn-beijing.oss-dls.aliyuncs.com/user/hive/warehouse/fdp/finance_dwd.db/tmp_fsf_dwd_tch_teacher/
    23/01/29 11:07:46 INFO jnative.NativeLogger: CommonJniLogging.cpp:34] Client logger level is info
    23/01/29 11:07:46 INFO jnative.NativeLogger: JindoJniState.cpp:7] JindoJniState::prepareAfterInited() getting called
    23/01/29 11:07:46 INFO jnative.NativeLogger: JcomMainBaseImpl.hpp:130] main doRun() return value True
    23/01/29 11:07:46 INFO jnative.NativeLogger: JindoNative.cpp:42] Successfully initialized jni native
    23/01/29 11:07:46 INFO common.JindoHadoopSystem: Initialized native file system:
    23/01/29 11:07:46 INFO common.FsStats: cmd=getFileStatus, src=oss://bucket_oss.cn-beijing.oss-dls.aliyuncs.com/user/hive/warehouse/fdp/finance_dwd.db/tmp_fsf_dwd_tch_teacher, dst=null, size=0, parameter=null, time-in-ms=22, version=4.6.2
    23/01/29 11:07:46 INFO common.FsStats: cmd=IterativeList, src=oss://bucket_oss.cn-beijing.oss-dls.aliyuncs.com/user/hive/warehouse/fdp/finance_dwd.db/tmp_fsf_dwd_tch_teacher, dst=null, size=0, parameter=null, time-in-ms=5, version=4.6.2
    23/01/29 11:07:46 INFO common.FsStats: cmd=getContentSummary, src=oss://bucket_oss.cn-beijing.oss-dls.aliyuncs.com/user/hive/warehouse/fdp/finance_dwd.db/tmp_fsf_dwd_tch_teacher/000000_0, dst=null, size=0, parameter=null, time-in-ms=3, version=4.6.2
    13.0 M  0  oss://bucket_oss.cn-beijing.oss-dls.aliyuncs.com/user/hive/warehouse/fdp/finance_dwd.db/tmp_fsf_dwd_tch_teacher/000000_0
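Editorial note: the `hdfs dfs` command above puts the endpoint in the URI authority (`bucket_oss.cn-beijing.oss-dls.aliyuncs.com`), while the load statement uses the bare bucket plus the `fs.oss.endpoint` property. A minimal sketch for checking that two such URIs point at the same location follows. The helper name `split_oss_uri` is hypothetical, and the rule "everything before the first dot is the bucket" is an assumption made for this comparison, not documented StarRocks or JindoSDK behavior.

```python
from urllib.parse import urlparse


def split_oss_uri(uri, default_endpoint=None):
    """Split an oss:// URI into (bucket, endpoint, object_path).

    Assumption (not from the thread): when the authority contains a dot,
    the text before the first dot is the bucket and the rest is the endpoint.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "oss":
        raise ValueError(f"not an oss:// URI: {uri}")
    bucket, _, endpoint = parsed.netloc.partition(".")
    # A trailing slash is not significant when comparing directories.
    path = parsed.path.rstrip("/")
    return bucket, endpoint or default_endpoint, path
```

Calling it on the load path with `default_endpoint="cn-beijing.oss-dls.aliyuncs.com"` and on the `hdfs dfs` path yields the same tuple under this assumption, which is consistent with the reporter's claim that the path itself was not the problem.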

2) apache_hdfs_broker.out:

    Exception in thread "TThreadPoolServer WorkerProcess-%d" java.lang.NoClassDefFoundError: com/aliyun/jindodata/api/spec/JdoException
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2625)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2590)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2686)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:537)
        at com.starrocks.broker.hdfs.FileSystemManager.getOSSFileSystem(FileSystemManager.java:652)
        at com.starrocks.broker.hdfs.FileSystemManager.getFileSystem(FileSystemManager.java:202)
        at com.starrocks.broker.hdfs.FileSystemManager.listPath(FileSystemManager.java:717)
        at com.starrocks.broker.hdfs.HDFSBrokerServiceImpl.listPath(HDFSBrokerServiceImpl.java:74)
        at com.starrocks.thrift.TFileBrokerService$Processor$listPath.getResult(TFileBrokerService.java:815)
        at com.starrocks.thrift.TFileBrokerService$Processor$listPath.getResult(TFileBrokerService.java:795)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
    Caused by: java.lang.ClassNotFoundException: com.aliyun.jindodata.api.spec.JdoException
        at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 20 more
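Editorial note: the `NoClassDefFoundError` above says that no jar on the broker's classpath provides `com.aliyun.jindodata.api.spec.JdoException`. A minimal sketch for confirming this is to scan the jars in a lib directory for the class entry, since a jar is a zip archive. The function name `jars_containing` is hypothetical, and the sketch assumes the classpath is made up of the `*.jar` files directly inside the given directory.

```python
import zipfile
from pathlib import Path


def jars_containing(lib_dir, class_name):
    """Return the names of jars in lib_dir that contain class_name.

    class_name is a dotted Java class name; inside a jar the class is
    stored as a '/'-separated path ending in '.class'.
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are not valid archives.
            continue
    return hits
```

For example, `jars_containing("broker/lib", "com.aliyun.jindodata.api.spec.JdoException")` returning an empty list would match the error in the log; the relative path `broker/lib` is taken from the thread and depends on where StarRocks is installed.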

xiaoyong-z commented 1 year ago

@fushaofeng Thanks for your feedback! This problem occurs because the OSS library in the broker is too old. You can try replacing hadoop-aliyun-sdk with the new jindosdk-4.6.2 in the broker, FE, and BE. The library paths are broker/lib, fe/lib/, and be/lib. JindoSDK can be downloaded from https://jindodata-binary.oss-cn-shanghai.aliyuncs.com/release/4.6.2/jindosdk-4.6.2.tar.gz. You can refer to the following PRs to see how to replace the library: https://github.com/StarRocks/starrocks/pull/16935 https://github.com/StarRocks/starrocks/pull/15606

xiaoyong-z commented 1 year ago

If broker load is still not working, let me know.

fushaofeng commented 1 year ago

Thank you for your answer! The problem was resolved after switching to the new version of the Jindo SDK.

The steps are as follows:

1) Put the new jindosdk-4.6.2 and jindo-core-4.6.2.jar into the broker's library path, broker/lib.
2) Restart the broker service.
3) Create the load job:

    LOAD LABEL tmp_fsf_dwd_tch_teacher
    (
        DATA INFILE("oss://bucket_oss/user/hive/warehouse/fdp/finance_dwd.db/tmp_fsf_dwd_tch_teacher/000000_0")
        INTO TABLE tmp_fsf_dwd_tch_teacher
        format as orc
        (id, employee_number, name, english_name, sex)
    )
    WITH BROKER broker1
    (
        "fs.oss.accessKeyId" = "xxx",
        "fs.oss.accessKeySecret" = "xxx",
        "fs.oss.endpoint" = "cn-beijing.oss-dls.aliyuncs.com"
    );

4) Run show load.
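Editorial note: step 1 above can be sketched as a small copy helper. This is an illustration under stated assumptions, not a StarRocks tool: the function name `stage_jars` is hypothetical, the caller supplies the jar paths from the unpacked JindoSDK archive, and the sketch only copies files. It does not remove the older OSS jars or restart any service.

```python
import shutil
from pathlib import Path


def stage_jars(new_jars, lib_dirs, dry_run=True):
    """Plan (and optionally perform) copying each jar into each lib dir.

    Returns a list of (source, destination) pairs. With dry_run=True
    nothing is written, so the plan can be reviewed first.
    """
    jars = [Path(j) for j in new_jars]
    plan = []
    for lib_dir in (Path(d) for d in lib_dirs):
        if not lib_dir.is_dir():
            raise FileNotFoundError(f"missing lib directory: {lib_dir}")
        for jar in jars:
            plan.append((jar, lib_dir / jar.name))
    if not dry_run:
        for src, dst in plan:
            shutil.copy2(src, dst)  # overwrites a same-named jar if present
    return plan
```

Passing `["broker/lib"]` as `lib_dirs` mirrors the steps reported here; the earlier suggestion in the thread to update FE and BE as well would correspond to adding `fe/lib` and `be/lib`, with all three paths resolved against the actual install location.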

xiaoyong-z commented 1 year ago

I also recommend updating the OSS jars (jindosdk-4.6.2 and jindo-core-4.6.2.jar) in FE and BE.

fushaofeng commented 1 year ago

Ok, thank you!