fushaofeng closed this issue 1 year ago.
The error message “No source file in this table” means StarRocks listed the directory but did not find any file in it. Since the HDFS list operation succeeded, I think StarRocks broker load can support Alibaba Cloud OSS-HDFS.
Can you recheck whether your path is correct? @fushaofeng
If the problem persists after you check the path, please paste your broker log here. It will help with tracing the problem.
1) The file path is correct:

[hdfs@n152 ~]$ hdfs dfs -du -h oss://bucket_oss.cn-beijing.oss-dls.aliyuncs.com/user/hive/warehouse/fdp/finance_dwd.db/tmp_fsf_dwd_tch_teacher/
23/01/29 11:07:46 INFO jnative.NativeLogger: CommonJniLogging.cpp:34] Client logger level is info
23/01/29 11:07:46 INFO jnative.NativeLogger: JindoJniState.cpp:7] JindoJniState::prepareAfterInited() getting called
23/01/29 11:07:46 INFO jnative.NativeLogger: JcomMainBaseImpl.hpp:130] main doRun() return value True
23/01/29 11:07:46 INFO jnative.NativeLogger: JindoNative.cpp:42] Successfully initialized jni native
23/01/29 11:07:46 INFO common.JindoHadoopSystem: Initialized native file system:
23/01/29 11:07:46 INFO common.FsStats: cmd=getFileStatus, src=oss://bucket_oss.cn-beijing.oss-dls.aliyuncs.com/user/hive/warehouse/fdp/finance_dwd.db/tmp_fsf_dwd_tch_teacher, dst=null, size=0, parameter=null, time-in-ms=22, version=4.6.2
23/01/29 11:07:46 INFO common.FsStats: cmd=IterativeList, src=oss://bucket_oss.cn-beijing.oss-dls.aliyuncs.com/user/hive/warehouse/fdp/finance_dwd.db/tmp_fsf_dwd_tch_teacher, dst=null, size=0, parameter=null, time-in-ms=5, version=4.6.2
23/01/29 11:07:46 INFO common.FsStats: cmd=getContentSummary, src=oss://bucket_oss.cn-beijing.oss-dls.aliyuncs.com/user/hive/warehouse/fdp/finance_dwd.db/tmp_fsf_dwd_tch_teacher/000000_0, dst=null, size=0, parameter=null, time-in-ms=3, version=4.6.2
13.0 M  0  oss://bucket_oss.cn-beijing.oss-dls.aliyuncs.com/user/hive/warehouse/fdp/finance_dwd.db/tmp_fsf_dwd_tch_teacher/000000_0
2) apache_hdfs_broker.out:

Exception in thread "TThreadPoolServer WorkerProcess-%d" java.lang.NoClassDefFoundError: com/aliyun/jindodata/api/spec/JdoException
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2625)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2590)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2686)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:537)
    at com.starrocks.broker.hdfs.FileSystemManager.getOSSFileSystem(FileSystemManager.java:652)
    at com.starrocks.broker.hdfs.FileSystemManager.getFileSystem(FileSystemManager.java:202)
    at com.starrocks.broker.hdfs.FileSystemManager.listPath(FileSystemManager.java:717)
    at com.starrocks.broker.hdfs.HDFSBrokerServiceImpl.listPath(HDFSBrokerServiceImpl.java:74)
    at com.starrocks.thrift.TFileBrokerService$Processor$listPath.getResult(TFileBrokerService.java:815)
    at com.starrocks.thrift.TFileBrokerService$Processor$listPath.getResult(TFileBrokerService.java:795)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassNotFoundException: com.aliyun.jindodata.api.spec.JdoException
    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    ... 20 more
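The trace shows that the broker could not load com.aliyun.jindodata.api.spec.JdoException while resolving the file system class for the oss:// scheme (FileSystem.getFileSystemClass). A jar is a zip archive, so you can check whether any jar in a lib directory such as broker/lib provides that class by looking for the entry com/aliyun/jindodata/api/spec/JdoException.class. Below is a minimal Python sketch of that check. The script name find_class.py and the function names are hypothetical, and the script only inspects top-level *.jar files in the directory it is given.

```python
"""Report which jars in a directory contain a given Java class (sketch)."""
import sys
import zipfile
from pathlib import Path


def class_entry(class_name):
    """Map 'a.b.C' or 'a/b/C' to the zip entry name 'a/b/C.class'."""
    if class_name.endswith(".class"):
        class_name = class_name[: -len(".class")]
    return class_name.replace(".", "/") + ".class"


def jars_providing(lib_dir, class_name):
    """Return names of top-level *.jar files in lib_dir that contain class_name."""
    entry = class_entry(class_name)
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                # A class is present in the jar iff its entry is listed in the archive.
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Ignore files named *.jar that are not valid zip archives.
            continue
    return hits


# Hypothetical CLI usage:
#   python find_class.py broker/lib com.aliyun.jindodata.api.spec.JdoException
# The CLI only runs when both arguments are supplied.
if __name__ == "__main__" and len(sys.argv) == 3:
    found = jars_providing(sys.argv[1], sys.argv[2])
    print("\n".join(found) if found else "class not found in any jar")
    sys.exit(0 if found else 1)
```

An empty result for broker/lib would be consistent with the "library too old" diagnosis in the reply that follows. The script only reads the jars and changes nothing.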
@fushaofeng Thanks for your feedback! This problem occurs because the OSS library bundled with the broker is too old. You can try replacing hadoop-aliyun-sdk with the new jindosdk-4.6.2 in the broker, FE, and BE. The library paths are broker/lib, fe/lib, and be/lib.
JindoSDK download: https://jindodata-binary.oss-cn-shanghai.aliyuncs.com/release/4.6.2/jindosdk-4.6.2.tar.gz
For how to replace the library, refer to the following PRs:
https://github.com/StarRocks/starrocks/pull/16935
https://github.com/StarRocks/starrocks/pull/15606
If broker load still does not work, let me know.
Thank you for your answer! The problem was resolved after switching to the new version of JindoSDK.
The steps are as follows:
1) Put the new jindosdk-4.6.2 and jindo-core-4.6.2.jar into the broker library path, broker/lib.
2) Restart the broker service.
3) Create the load job:
LOAD LABEL tmp_fsf_dwd_tch_teacher
(
    DATA INFILE("oss://bucket_oss/user/hive/warehouse/fdp/finance_dwd.db/tmp_fsf_dwd_tch_teacher/000000_0")
    INTO TABLE tmp_fsf_dwd_tch_teacher
    format as orc
    (id, employee_number, name, english_name, sex)
)
WITH BROKER broker1
(
    "fs.oss.accessKeyId" = "xxx",
    "fs.oss.accessKeySecret" = "xxx",
    "fs.oss.endpoint" = "cn-beijing.oss-dls.aliyuncs.com"
);
4) Check the job status with SHOW LOAD.
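In step 3 the file is addressed as oss://bucket_oss/... and the region endpoint is passed separately in fs.oss.endpoint. The earlier hdfs dfs check used the host-qualified form oss://bucket_oss.cn-beijing.oss-dls.aliyuncs.com/..., and in this thread both forms name the same file. Below is a minimal Python sketch that derives the first form from the second and renders a statement shaped like step 3. The function names are hypothetical, and the sketch assumes the first dot in the host separates the bucket from the endpoint. It keeps the xxx credential placeholders and does no SQL escaping, so pass only trusted values.

```python
"""Derive broker-load path/endpoint from a host-qualified OSS-HDFS URI (sketch)."""
from urllib.parse import urlsplit


def split_oss_hdfs_uri(uri):
    """oss://<bucket>.<endpoint>/<path> -> ("oss://<bucket>/<path>", "<endpoint>").

    Assumption: the first dot in the host separates bucket from endpoint.
    """
    parts = urlsplit(uri)
    if parts.scheme != "oss":
        raise ValueError(f"not an oss:// URI: {uri}")
    bucket, _, endpoint = parts.netloc.partition(".")
    if not bucket or not endpoint:
        raise ValueError(f"expected <bucket>.<endpoint> host, got: {parts.netloc}")
    return f"oss://{bucket}{parts.path}", endpoint


def build_broker_load(label, table, uri, columns, broker="broker1",
                      access_key_id="xxx", access_key_secret="xxx"):
    """Render a LOAD statement shaped like step 3. No escaping: trusted input only."""
    path, endpoint = split_oss_hdfs_uri(uri)
    return "\n".join([
        f"LOAD LABEL {label}",
        "(",
        f'    DATA INFILE("{path}")',
        f"    INTO TABLE {table}",
        "    format as orc",
        f"    ({', '.join(columns)})",
        ")",
        f"WITH BROKER {broker}",
        "(",
        f'    "fs.oss.accessKeyId" = "{access_key_id}",',
        f'    "fs.oss.accessKeySecret" = "{access_key_secret}",',
        f'    "fs.oss.endpoint" = "{endpoint}"',
        ");",
    ])
```

If you pass it the path from the hdfs dfs check plus the five columns from step 3, it yields the same DATA INFILE path and fs.oss.endpoint value that step 3 uses.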
I also recommend updating the OSS jars (jindosdk-4.6.2 and jindo-core-4.6.2.jar) in FE and BE as well.
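Across the replies above, the replacement comes down to three actions per component: move the outdated OSS jar out of its lib directory, copy in the new JindoSDK jars, and restart the component. Below is a minimal Python sketch of the file moves, assuming a single hypothetical StarRocks home that contains broker/lib, fe/lib, and be/lib. The old-jar pattern hadoop-aliyun*.jar is only a guess based on the name hadoop-aliyun-sdk in the reply, so list each lib directory and adjust the pattern first. The linked PRs remain the reference for the actual procedure. The sketch defaults to a dry run that only reports its plan, and it does not restart anything.

```python
"""Swap old OSS jars for new JindoSDK jars in StarRocks lib dirs (sketch)."""
import shutil
from pathlib import Path

# Lib directories named in the thread, relative to a hypothetical StarRocks home.
LIB_DIRS = ("broker/lib", "fe/lib", "be/lib")


def install_jars(starrocks_home, new_jars, old_glob="hadoop-aliyun*.jar",
                 lib_dirs=LIB_DIRS, dry_run=True):
    """Plan (or, with dry_run=False, perform) the jar swap in each lib dir.

    old_glob is an ASSUMED pattern for the outdated OSS jar; check the real
    file names first. Returns {lib_dir: (old_jar_names, new_jar_names)}.
    """
    home = Path(starrocks_home)
    new = [Path(j) for j in new_jars]
    plan = {}
    for rel in lib_dirs:
        lib = home / rel
        if not lib.is_dir():
            continue  # component not installed under this home
        old = sorted(lib.glob(old_glob))
        if not dry_run:
            # Back up next to lib/ (e.g. broker/lib.bak) so the old jar leaves lib/.
            backup = lib.parent / (lib.name + ".bak")
            if old:
                backup.mkdir(exist_ok=True)
            for jar in old:
                shutil.move(str(jar), str(backup / jar.name))
            for jar in new:
                shutil.copy2(jar, lib / jar.name)
        plan[rel] = ([j.name for j in old], [j.name for j in new])
    return plan
```

Review the dry-run plan, run the function again with dry_run=False, and then restart the broker (and FE/BE if you updated them), as in step 2 above.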
Ok, thank you!
Steps to reproduce the behavior (Required)
1. StarRocks broker load does not support Alibaba Cloud OSS-HDFS.
2. Create a broker load task:
LOAD LABEL tmp_fsf_dwd_tch_teacher
(
    DATA INFILE("oss://bucket_oss/user/hive/warehouse/fdp/finance_dwd.db/tmp_fsf_dwd_tch_teacher/*")
    INTO TABLE tmp_fsf_dwd_tch_teacher
    format as orc
    (_col0, _col1, _col2, _col3, _col4)
    SET
    (
        id = _col0,
        employee_number = _col1,
        name = _col2,
        english_name = _col3,
        sex = _col4
    )
)
WITH BROKER broker1
(
    "fs.oss.accessKeyId" = "xxx",
    "fs.oss.accessKeySecret" = "xxx",
    "fs.oss.endpoint" = "cn-beijing.oss-dls.aliyuncs.com"
);
Expected behavior (Required)
Real behavior (Required)
1. MySQL [dbadb]> show load \G
JobId: 14028
Label: tmp_load
State: CANCELLED
Progress: ETL:N/A; LOAD:N/A
Type: BROKER
EtlInfo: NULL
TaskInfo: resource:N/A; timeout(s):14400; max_filter_ratio:0.0
ErrorMsg: type:ETL_RUN_FAIL; msg:No source file in this table(tmp_fsf_dwd_tch_teacher).
2. Task error message:
2022-12-22 16:31:02 [ TThreadPoolServer WorkerProcess-%d:3632577521 ] - [ INFO ] received a list path request, request detail: TBrokerListPathRequest(version:VERSION_ONE, path:oss://bucket_oss/user/hive/warehouse/fdp/finance_dwd.db/tmp_fsf_dwd_tch_teacher/, isRecursive:false, properties:{fs.oss.accessKeyId=xxx, fs.oss.accessKeySecret=xxx, fs.oss.endpoint=cn-beijing.oss-dls.aliyuncs.com})
2022-12-22 16:31:02 [ TThreadPoolServer WorkerProcess-%d:3632577787 ] - [ INFO ] could not find file system for path oss://bucket_oss/user/hive/warehouse/fdp/finance_dwd.db/tmp_fsf_dwd_tch_teacher/ create a new one
2022-12-22 16:31:02 [ TThreadPoolServer WorkerProcess-%d:3632577960 ] - [ WARN ] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
StarRocks version (Required)
select current_version()
MySQL [dbadb]> select current_version();
+-------------------+
| current_version() |
+-------------------+
| 2.3.3 164799c     |
+-------------------+