TresAmigosSD / SMV

Spark Modularized View
Apache License 2.0

Path does not exist after renaming hive table #1488

Closed yw-yang closed 5 years ago

yw-yang commented 5 years ago

I encountered an issue when reading hive input:

  1. Create a Hive table A.
  2. Rename table A to B in Hive: "alter table A rename to B".
  3. Create an SMV Hive input module with table name B.
  4. Create an SMV module whose input is the Hive input module from step 3.
  5. Run this SMV module; it fails with the error "Path does not exist: hdfs:///user/hive/warehouse/db/A".

I checked the CREATE statement of table B and found that the 'path' entry in SERDEPROPERTIES still points to A, while LOCATION points to B.

CREATE TABLE `B`(
PARTITIONED BY (
  `partition_date` date)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH SERDEPROPERTIES (
  'path'='hdfs:///user/hive/warehouse/db/A')
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs:///user/hive/warehouse/db/B'
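The mismatch in the statement above can be checked mechanically. The following is a minimal Python sketch, assuming the text of the CREATE statement is already available as a string; the helper names `serde_path_and_location` and `has_stale_path` are hypothetical and not part of SMV, Spark, or Hive.

```python
import re

def serde_path_and_location(ddl):
    """Return (serde 'path' property, LOCATION) found in CREATE TABLE text.

    Either element is None when the corresponding clause is absent.
    """
    path_match = re.search(r"'path'\s*=\s*'([^']*)'", ddl)
    location_match = re.search(r"LOCATION\s+'([^']*)'", ddl, flags=re.IGNORECASE)
    path = path_match.group(1) if path_match else None
    location = location_match.group(1) if location_match else None
    return path, location

def has_stale_path(ddl):
    """True when both values are present and point to different directories."""
    path, location = serde_path_and_location(ddl)
    if path is None or location is None:
        return False
    return path.rstrip("/") != location.rstrip("/")

# Fragment of the CREATE statement for table B shown in this issue.
ddl = """
WITH SERDEPROPERTIES (
  'path'='hdfs:///user/hive/warehouse/db/A')
LOCATION
  'hdfs:///user/hive/warehouse/db/B'
"""

stale = has_stale_path(ddl)
```

For the fragment above, `stale` is `True` because the 'path' property still names the directory of the old table.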

After I reset the SERDEPROPERTIES 'path' value to the new location, the issue was resolved: ALTER TABLE table_name SET SERDEPROPERTIES ('path' = 'hdfs:///user/hive/warehouse/db/B');
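If several renamed tables need the same repair, the statement can be generated rather than typed by hand. This is only a sketch: `build_path_fix` is a hypothetical helper that returns the statement text, it does not run anything against Hive, and it refuses locations containing a single quote instead of guessing at escaping rules.

```python
def build_path_fix(table, location):
    """Build the ALTER TABLE statement that points the serde 'path' at location."""
    if "'" in location:
        # Quoting rules are not covered here; fail loudly rather than emit bad SQL.
        raise ValueError("location must not contain a single quote")
    return "ALTER TABLE {} SET SERDEPROPERTIES ('path' = '{}');".format(table, location)

# Table name and location taken from the example in this issue.
statement = build_path_fix("B", "hdfs:///user/hive/warehouse/db/B")
```

The resulting string has the same shape as the statement quoted above, with the table name and location filled in.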

Any idea why this happens? Hive version: 1.1.0-cdh5.12.2.

guangningyu commented 5 years ago

It seems like a Hive issue.

jacobdr commented 5 years ago

Definitely a bit confusing and Hive version dependent.

https://issues.apache.org/jira/plugins/servlet/mobile#issue/HIVE-14909

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RenameTable

AliTajeldin commented 5 years ago

@yw-yang can this be closed? Doesn't seem to be an Eddie issue.

yw-yang commented 5 years ago

Yes, it's a Spark/Hive issue. I ran some more tests and found that this only happens when the Hive table is created by Spark. Closing now, as it is not related to SMV.

  1. Create a Hive table in Spark:
    df.write.saveAsTable("test")
  2. Rename the table in Hive:
    alter table test rename to test_2;
  3. Query the new table in Spark; it fails with a "path doesn't exist" error:
    spark.sql("select * from test_2")