Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0
6.84k stars 2.94k forks source link

UnsupportedFileSystemException No FileSystem for scheme "alluxio" #15083

Closed liangrui1988 closed 2 years ago

liangrui1988 commented 2 years ago

Alluxio Version: 2.7.2

Describe the bug

```
User class threw exception: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: org.apache.hadoop.fs.UnsupportedFileSystemException No FileSystem for scheme "alluxio");; line -1 pos -1
```

To Reproduce

Following https://spark.apache.org/docs/2.4.0/configuration.html#custom-hadoophive-configuration:

```scala
val conf = new SparkConf().set("spark.hadoop.abc.def", "xyz")
val sc = new SparkContext(conf)
```

```
set spark.hadoop.fs.alluxio.impl=alluxio.hadoop.FileSystem;
```

Why didn't this work?

```sql
set spark.driver.extraClassPath=/data/sparkserver/spark_jars/sparkjars3/:/data/sparkserver/spark_jars/ranger-lib/:/data/sparkserver/spark_jars/atlas-lib/:/data/sparkserver/spark_jars/udf/hiido-uf-v011.jar:/data/sparkserver/spark_jars/udf/all.jar:/data/sparkserver/spark_jars/udf/hivemall-core-0.4.2-rc.2-with-dependencies.jar:/data/sparkserver/spark_jars/udf/hivemall-nlp-0.4.2-rc.2-with-dependencies.jar:/data/sparkserver/spark_jars/mainjar/hadoop-distcp-2.6.5.jar:/data/alluxio_client/alluxio-2.7.2-client.jar;
set spark.executor.extraClassPath=/data/sparkserver/spark_jars/sparkjars3/:/data/sparkserver/spark_jars/ranger-lib/:/data/sparkserver/spark_jars/atlas-lib/:/data/sparkserver/spark_jars/udf/hiido-uf-v011.jar:/data/sparkserver/spark_jars/udf/all.jar:/data/sparkserver/spark_jars/udf/hivemall-core-0.4.2-rc.2-with-dependencies.jar:/data/sparkserver/spark_jars/udf/hivemall-nlp-0.4.2-rc.2-with-dependencies.jar:/data/sparkserver/spark_jars/mainjar/hadoop-distcp-2.6.5.jar:/data/alluxio_client/alluxio-2.7.2-client.jar;
set spark.sql.hive.metastore.sharedPrefixes=com.mysql.jdbc,org.postgresql,com.microsoft.sqlserver,oracle.jdbc,alluxio;
set spark.hadoop.fs.alluxio.impl=alluxio.hadoop.FileSystem;

CREATE TABLE alluxio_u_user (
  userid INT,
  age INT,
  gender CHAR(1),
  occupation STRING,
  zipcode STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'alluxio://fs-alluxio-master01.hiido.host.yydevops.com:19998,fs-alluxio-master02.hiido.host.yydevops.com:19998,fs-alluxio-master03.hiido.host.yydevops.com:19998/test/ml-100k';
```
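For reference, the same settings could also be made cluster-wide in `spark-defaults.conf` instead of per-session `set` commands. This is only a sketch of that equivalent form: the property names and the Alluxio client jar path are taken from the `set` commands above, while the shortened classpath is an illustrative assumption (a real deployment would keep the full classpath list).

```
# spark-defaults.conf (sketch; property names and jar path from this issue)
spark.hadoop.fs.alluxio.impl    alluxio.hadoop.FileSystem
spark.driver.extraClassPath     /data/alluxio_client/alluxio-2.7.2-client.jar
spark.executor.extraClassPath   /data/alluxio_client/alluxio-2.7.2-client.jar
```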


Logically, changing this one configuration should be enough, but it doesn't seem to work. Because of the large scale of our cluster, changing a single Hadoop configuration means updating core-site.xml through Ambari and restarting the DataNodes, which is very costly and risky. I would like a Spark-only configuration to take effect. Is there any way to achieve this?

liangrui1988 commented 2 years ago

This configuration needs to be set in the Hive Metastore service; the documentation is not clear on this point. I spent a long time configuring Spark and Hadoop without success. What finally worked was configuring core-site.xml for the Metastore.
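For anyone hitting the same error, the Metastore-side change described above would look roughly like the fragment below. The property name and value come from this issue; the comment about the client jar reflects the jar already used in the reproduction (`alluxio-2.7.2-client.jar`), and the exact file location depends on your Hive deployment.

```xml
<!-- core-site.xml on the Hive Metastore host (sketch) -->
<configuration>
  <!-- Map the alluxio:// URI scheme to Alluxio's Hadoop-compatible FileSystem -->
  <property>
    <name>fs.alluxio.impl</name>
    <value>alluxio.hadoop.FileSystem</value>
  </property>
</configuration>
```

The Alluxio client jar (e.g. `alluxio-2.7.2-client.jar`) also needs to be on the Metastore's classpath, since it is the Metastore JVM, not the Spark driver, that resolves the table `LOCATION` here.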

HelloHorizon commented 2 years ago

Thanks @liangrui1988 for reporting! Do you mind helping update the doc based on your finding?

apc999 commented 2 years ago

Let's fix the doc @HelloHorizon and close this issue.

HelloHorizon commented 2 years ago

@liangrui1988 I am a little confused about which step causes this and which document you are referring to. If you are using the Hive Metastore, we have a setup doc here: https://docs.alluxio.io/os/user/stable/en/compute/Hive.html