databrickslabs / overwatch

Capture deep metrics on one or all assets within a Databricks workspace

UPGRADE FAILED function structToMap on 7.0 #582

Closed alinealvarez0107 closed 1 year ago

alinealvarez0107 commented 2 years ago

When trying to upgrade to version 7.0, step 3 of the command val upgradeReport = Upgrade.upgradeTo0700(prodWorkspace, startStep = 1) failed with the error below.

View the statusMsg:

UPGRADE FAILED function structToMap, columnToConvert must be of type struct but found map instead

java.lang.Exception: function structToMap, columnToConvert must be of type struct but found map instead
    at com.databricks.labs.overwatch.utils.SchemaTools$.structToMap(SchemaTools.scala:253)
    at com.databricks.labs.overwatch.utils.Upgrade$.upgradeTo0700(Upgrade.scala:1251)
    ...
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:747)
    at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1020)
    at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:568)
    at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:36)
    at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:116)
    at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
    at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:567)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:594)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:564)
    at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:221)
    at com.databricks.backend.daemon.driver.ScalaDriverLocal.$anonfun$repl$1(ScalaDriverLocal.scala:225)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:1069)
    at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:1022)
    at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:225)
    at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$21(DriverLocal.scala:689)
    at com.databricks.unity.UCSDriver$Manager$Handle.runWith(UCSDriver.scala:104)
    at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$19(DriverLocal.scala:689)
    at com.databricks.logging.Log4jUsageLoggingShim$.$anonfun$withAttributionContext$1(Log4jUsageLoggingShim.scala:32)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:94)
    at com.databricks.logging.Log4jUsageLoggingShim$.withAttributionContext(Log4jUsageLoggingShim.scala:30)
    at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:283)
    at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:282)
    at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:60)
    at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:318)
    at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:303)
    at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:60)
    at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:666)
    at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:622)
    at scala.util.Try$.apply(Try.scala:213)
    at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:614)
    at com.databricks.backend.daemon.driver.DriverWrapper.executeCommandAndGetError(DriverWrapper.scala:533)
    at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:568)
    at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:438)
    at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:381)
    at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:232)
    at java.lang.Thread.run(Thread.java:748)

What should I do to fix this error and proceed with the upgrade?

[screenshot attached: falha_upgrade7 0]

GeekSheikh commented 2 years ago

What version are you upgrading from? Is this the first time you're running the upgrade, or did you previously run it? It looks like this table has already been upgraded. Please advise.
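One way to confirm whether a table has already been upgraded is to inspect the column type directly in a notebook. The sketch below is an assumption, not part of the official upgrade tooling; the table and column names are placeholders taken from later in this thread, so substitute the ones named in your own error:

```scala
// Run in a Databricks notebook where a `spark` session is available.
// Table/column names are assumptions -- replace with the ones from your error.
import org.apache.spark.sql.types.{MapType, StructType}

val etlDatabaseName = "overwatch_etl"      // your Overwatch ETL database
val tableName = "spark_events_bronze"      // table targeted by the failing step (assumed)
val columnName = "modifiedConfigs"         // column being converted (assumed)

spark.table(s"$etlDatabaseName.$tableName").schema(columnName).dataType match {
  case _: MapType    => println(s"$columnName is already a map; the conversion step was already applied")
  case _: StructType => println(s"$columnName is still a struct; the upgrade step still applies")
  case other         => println(s"$columnName has unexpected type $other")
}
```

If the column is already a MapType, the structToMap step has nothing to do, which matches the error message above.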

alinealvarez0107 commented 2 years ago

I was using version 6.0 and had to upgrade to version 6.1 yesterday to be able to run 7.0.

AnushaSure commented 2 years ago

Hi @alinealvarez0107,

Which cloud are you using (AWS/Azure)? Was the upgrade to version 6.1 successful?

Regards, Anusha

alinealvarez0107 commented 2 years ago

Hi, @AnushaSure

Yes, it was! We're using Azure.

Could you help me please?

I need to finish this upgrade :/

Neha-vs123 commented 2 years ago

Hi @alinealvarez0107,

While moving from 0610 to 0700, which cluster library did you use, the 0700 or the 07001 JAR? Please also let me know your time zone; we are in IST. If possible, we can get on a call to analyze the issue better.

Regards, Neha

alinealvarez0107 commented 2 years ago

I used the 0700 JAR. We're located in Brazil.

Neha-vs123 commented 1 year ago

@alinealvarez0107 , Please let me know when we can meet today. It would be helpful if we could meet sometime between 9 and 11 am BRT (your time). Please send your email address to neha.vs@databricks.com so that I can schedule a meeting.

alinealvarez0107 commented 1 year ago

Hello @Neha-vs123 ,

I just sent the meeting invitation to you from 10:30 am to 11:00 am Brazil time.

Thank you.

Neha-vs123 commented 1 year ago

Hi @alinealvarez0107 ,

Thanks for sharing the notebook. As we discussed in the call, I'll find a solution and get back to you on Monday.

Regards, Neha

GeekSheikh commented 1 year ago

Could you please try resuming the upgrade from step 4? We're investigating why some customers already have the correct type before the upgrade, but since the target type is already a map, there's no need to complete this step.

val upgradeReport = Upgrade.upgradeTo0700(prodWorkspace, startStep = 4)

Neha-vs123 commented 1 year ago

@GeekSheikh , we finished the upgrade by resuming from step 4; steps 4 through 7 completed successfully. In the final pipeline report, there are failures in Bronze_Jobs_Snapshot (the step 3 error) and Bronze_SparkEventLogs.

@alinealvarez0107 , the 7.0.0.2 cluster library JAR has been released to Maven. Please change the JAR to 7002 and run the jobs, then send the pipeline report from the run with the 7002 JAR. Use the command: select * from overwatch_etl.pipeline_report
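To pull just the failures out of that report, a filtered variant of the query can help. This is a sketch only: the column names (moduleName, status, Pipeline_SnapTS) are assumptions about the pipeline_report schema and may differ between Overwatch versions, so verify them first with DESCRIBE overwatch_etl.pipeline_report.

```scala
// Sketch: list the most recent non-successful modules from the pipeline report.
// Column names are assumptions -- check them against your pipeline_report schema.
spark.sql("""
  SELECT moduleName, status, Pipeline_SnapTS
  FROM overwatch_etl.pipeline_report
  WHERE status != 'SUCCESS'
  ORDER BY Pipeline_SnapTS DESC
""").show(50, truncate = false)
```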

Neha-vs123 commented 1 year ago

Hi @alinealvarez0107 , Are you still facing issues after using the 7.0.0.2 JAR?

alinealvarez0107 commented 1 year ago

Hi @Neha-vs123!

I still have the same problem with the data types of some tables.

[screenshot attached: overwatch_jar_7 0 0 2]

Neha-vs123 commented 1 year ago

@alinealvarez0107 , Are there any failed modules other than Bronze_SparkEventLogs? Please send the pipeline_report from the ETL database.

Please check whether the Event Hub streaming is working fine. You can also verify this by running the readiness notebook in the workspace.

alinealvarez0107 commented 1 year ago

Hi @Neha-vs123 I sent you the pipeline_report by email. The event hub streaming is working fine.

GeekSheikh commented 1 year ago

This is because your upgrade to 0610 did not complete successfully. Step 3 of the 0610 upgrade converts this field to a map. Please also ensure that you are using DBR 10.4 LTS on your Overwatch cluster. Once the cluster is on 10.4 LTS, run the commands below in a notebook; they are the commands that did not complete as part of the 0610 upgrade (my guess is that you may have forgotten to switch your cluster to 10.4 LTS).

import org.apache.log4j.{Level, Logger}

val logger: Logger = Logger.getLogger("UPGRADE_RETRY")
val etlDatabaseName = "overwatch_etl" // CHANGE ME IF NECESSARY
val targetName = "spark_events_bronze"

// Write-optimization settings for rewriting the large spark_events_bronze table
spark.conf.set("spark.databricks.delta.optimizeWrite.numShuffleBlocks", "500000")
spark.conf.set("spark.databricks.delta.optimizeWrite.binSize", "2048")
spark.conf.set("spark.sql.files.maxPartitionBytes", (1024 * 1024 * 64).toString)
spark.conf.set("spark.databricks.delta.properties.defaults.autoOptimize.optimizeWrite", "true")

val sparkEventsBronzeDF = spark.table(s"${etlDatabaseName}.${targetName}")
val sparkEventsSchema = sparkEventsBronzeDF.schema
val fieldsRequiringRebuild = Array("modifiedConfigs", "extraTags")

// Upgrade the Delta protocol and enable column mapping so columns can be renamed
def upgradeDeltaTable(qualifiedName: String): Unit = {
  try {
    val tblPropertiesUpgradeStmt =
      s"""ALTER TABLE $qualifiedName SET TBLPROPERTIES (
    'delta.minReaderVersion' = '2',
    'delta.minWriterVersion' = '5',
    'delta.columnMapping.mode' = 'name'
  )
  """
    logger.info(s"UPGRADE STATEMENT for $qualifiedName: $tblPropertiesUpgradeStmt")
    spark.sql(tblPropertiesUpgradeStmt)
  } catch {
    case e: Throwable =>
      logger.error(s"FAILED $qualifiedName ->", e)
      println(s"FAILED UPGRADE FOR $qualifiedName")
  }
}

upgradeDeltaTable(s"${etlDatabaseName}.${targetName}")

// Rename the affected struct columns so the next pipeline run can rebuild them as maps
val fieldsToRename = sparkEventsSchema.fieldNames.filter(f => fieldsRequiringRebuild.contains(f))
fieldsToRename.foreach(f => {
  val modifyColStmt = s"alter table ${etlDatabaseName}.${targetName} rename " +
    s"column $f to ${f}_tobedeleted"
  logger.info(s"Beginning $targetName upgrade\nSTMT1: $modifyColStmt")
  spark.sql(modifyColStmt)
})
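After those commands finish, a quick sanity check can confirm the renames took effect before re-running the Overwatch job. This is a sketch, not part of the official upgrade script; it assumes the same etlDatabaseName and targetName values defined above:

```scala
// Sanity check (assumption, not part of the upgrade script): the columns flagged
// for rebuild should now carry the _tobedeleted suffix in the table schema.
val postSchema = spark.table(s"${etlDatabaseName}.${targetName}").schema
val renamed = postSchema.fieldNames.filter(_.endsWith("_tobedeleted"))
println(s"Renamed columns: ${renamed.mkString(", ")}")
```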

Neha-vs123 commented 1 year ago

Thanks, @GeekSheikh for the detailed explanation.

@alinealvarez0107 , please follow all the steps given above and re-run the job on the same cluster. Make sure the cluster DBR is 10.4 LTS. Let me know if you have any questions.

alinealvarez0107 commented 1 year ago

Hello @GeekSheikh and @Neha-vs123 ,

I was using DBR 11.1. I switched to 10.4 LTS, copied the commands, and the import failed to execute.

I changed the import statement and I'm now trying to run it.

GeekSheikh commented 1 year ago

Ok, sorry, I wasn't able to test this internally; apologies if there's a syntax bug. Please let us know the status and we'll get on a screen share to sort this out if necessary.

alinealvarez0107 commented 1 year ago

Hi @GeekSheikh, don't worry! ;)

I corrected the import, and after executing the commands you sent me, the Overwatch notebook completed successfully and the failing table was updated correctly.

I'm applying the fix to the other workspaces and will let you know if any issues arise.

[screenshot attached: result 11092022]

Neha-vs123 commented 1 year ago

Hi @alinealvarez0107 , Great, thanks for the update! Closing this ticket for now. Feel free to reopen it if you need any assistance.

Regards, Neha