cloudera-labs / hive-sre

Apache License 2.0
17 stars 16 forks source link

Create Report that identifies "locations" in table defs that don't match `fs.defaultFS` #15

Closed dstreev closed 2 years ago

dstreev commented 2 years ago

This can happen when users have created tables in a non-ha hdfs environment and then went HA without running the metatool.

Should ignore non-hdfs namespaces.

dstreev commented 2 years ago

Testing with the following query:

SELECT DB_NAME,
       TBL_NAME,
       LOCATION,
       SERDES.SLIB AS SERDE_LIB
FROM (
         SELECT DBS.DB_ID,
                DBS.NAME AS DB_NAME,
                TBL_NAME,
                LOCATION,
                SERDE_ID
         FROM (
                  SELECT TBL_ID,
                         DB_ID,
                         TBL_NAME,
                         NONDEFAULTLOCS.SD_ID,
                         LOCATION,
                         SERDE_ID
                  FROM (
                           SELECT SD_ID,
                                  LOCATION,
                                  SERDE_ID
                           FROM (
                                    select *
                                    from SDS SDS
                                    WHERE SDS.LOCATION LIKE "hdfs://%"
                                ) hdfsLocs
                           WHERE hdfsLocs.LOCATION LIKE '${DEFAULT_NAMESPACE}%') NONDEFAULTLOCS
                           INNER JOIN TBLS TBLS ON NONDEFAULTLOCS.SD_ID = TBLS.SD_ID) NONDEFAULTTBL
                  INNER JOIN DBS DBS ON NONDEFAULTTBL.DB_ID = DBS.DB_ID) NONDEFAULT
         INNER JOIN SERDES SERDES ON NONDEFAULT.SERDE_ID = SERDES.SERDE_ID;
dstreev commented 2 years ago

Closing this issue because this is something that can be done from within Cloudera Manager. In CM, go to the Hive Service. From the "Action" menu select "Update Hive Metastore Namenodes".