StanfordAHA / garnet

Next generation CGRA generator
BSD 3-Clause "New" or "Revised" License

CAD problems related to recent r7->r8 upgrade. #1085

Open steveri opened 3 months ago

steveri commented 3 months ago

@norabarlow @mbstrange2 (also please share with others who might be interested/impacted)

Many of our CAD machines recently upgraded from Red Hat version 7 (r7) to Rocky Linux version 8 (r8).

This broke some stuff.

In particular, the weekly Amber full-chip builds that run on r8arm-aha (formerly r7arm-aha) needed some tweaking to make them work.

This issue documents the changes I made to get Amber full-chip builds working again; those changes will be included in an upcoming Garnet pull request. Because it happened in the context of the Garnet repo, I am filing the issue here, although the implications clearly reach far beyond...

  1. The Calibre tools (e.g. calibre/2019.1, which I use) install a default tclsh that throws errors if it sees an OS other than "centos", "rhel", or "sles". Once tclsh is poisoned, nothing tcl-related works, including the "module" command:
    % source /cad/modules/tcl/init/sh
    % module load base/1.0
    % type tclsh    # "tclsh is /usr/bin/tclsh"
    % module load calibre/2019.1
    % type tclsh    # "tclsh is /cad/mentor/2019.1/aoi_cal_2019.1_18.11/bin/tclsh"
    % tclsh         # "Invalid operating system environment"
    % module --help
    Invalid operating system environment, VENDOR=unknown OS VERSION=8

    Similarly for newer calibre versions e.g. 2021.2_18:

    % source /cad/modules/tcl/init/sh
    % module load base/1.0
    % type tclsh    # "tclsh is /usr/bin/tclsh"
    % module load calibre/2021.2_18
    % type tclsh    # "tclsh is /cad/mentor/2021.4/aoi_cal_2021.2_18.11/bin/tclsh"
    % tclsh         # "Invalid operating system environment"
    % module --help
    Invalid operating system environment, VENDOR=unknown OS VERSION=8

    At least two possible solutions:

    • reset tclsh to a known good version e.g. something like PATH=/usr/bin:$PATH
    • un-poison the Calibre tclsh by providing it with something that avoids the "Invalid OS" error. One way to do this is by setting an environment variable USE_CALIBRE_VCO e.g.
      test -e /etc/os-release && source /etc/os-release  # Sets os-related vars including ID
      [ "$ID" == "rocky" ] && export USE_CALIBRE_VCO=aoi

I use this second approach in my setup script. It works because the "Invalid OS" error originates in one of the many scripts called by Calibre-tclsh on startup, and that script is satisfied when it sees the preset "USE_CALIBRE_VCO" variable.

  % cat /cad/mentor/2019.1/aoi_cal_2019.1_18.11/bin/calibre_vco    # abridged excerpt
  if test -n "$USE_CALIBRE_VCO"; then VCO=$USE_CALIBRE_VCO
  elif test \( "$OS_VENDOR" = redhat -a "$OS_MAJOR_REV" -lt 5 \)
  then VCO=something useful
  else error_exit 'Invalid Linux operating system'
  fi
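
For comparison, the first option above (resetting tclsh to a known good version) could be sketched roughly as follows. This is only a sketch and not the approach used in the setup script: the function name prefer_system_tclsh is made up for illustration, and it assumes the known-good tclsh lives in /usr/bin, as in the transcripts above.

```shell
# Sketch of option one: make a known-good tclsh win the PATH lookup
# after the Calibre module has been loaded.
# prefer_system_tclsh is a hypothetical helper, not part of any tool.
prefer_system_tclsh() {
    local good_dir="${1:-/usr/bin}"   # directory holding the known-good tclsh

    # Refuse to touch PATH if there is no usable tclsh in that directory.
    if [ ! -x "$good_dir/tclsh" ]; then
        echo "prefer_system_tclsh: no executable tclsh in $good_dir" >&2
        return 1
    fi

    # Prepend, so this directory is searched before the Calibre bin directory.
    PATH="$good_dir:$PATH"
    export PATH

    # Forget any cached command locations so the new PATH takes effect.
    hash -r
}

# Intended use in a setup script, after "module load calibre/...":
#   prefer_system_tclsh /usr/bin && type tclsh
```

One thing to keep in mind: prepending a directory to PATH puts every command in that directory ahead of the Calibre bin directory, not just tclsh, so this could shadow other commands that Calibre provides under the same name.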

  2. Some of the Cadence tools failed after the upgrade because, I'm guessing for reasons similar to those listed above, a crucial OA_HOME variable ends up pointing to the wrong place, resulting in some kind of "wrong OA version" error:

    innovus:    INFO: No proper OA2.2 installation found. The OA2.2 features of innovus will be disabled.
    
    **ERROR: (IMPOAX-142): Could not open shared library libinnovusoax22.so :
    /cad/cadence/INNOVUS19.10.000.lnx86/tools.lnx86/lib/64bit/libddbase_sh.so:
    undefined symbol: _ZN8oaCommon11FactoryBase11getRefCountEv

    I was able to fix this by finding the missing library and pointing to it by way of the "OA_HOME" env var. I also had to unset OA_UNSUPPORTED_PLAT, though I'm not sure exactly why:

    test -e /etc/os-release && source /etc/os-release  # Sets os-related vars including ID
    [ "$ID" == "rocky" ] && unset OA_UNSUPPORTED_PLAT
    [ "$ID" == "rocky" ] && export OA_HOME=/cad/cadence/ICADVM20.10.330/oa_v22.60.090
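
A slightly more defensive version of the same fix might look like the sketch below. It is an illustration under assumptions, not the actual setup script: the helper names os_id and fix_oa_home are hypothetical, and the OA directory is passed in as an argument rather than hard-coded. It reads ID in a subshell so the other os-release variables do not leak into the environment, and it leaves OA_HOME alone if the target directory is missing.

```shell
# Hypothetical helper: print the ID field of an os-release style file.
# Sourcing happens in a subshell, so NAME, VERSION, etc. do not leak out.
os_id() {
    local f="${1:-/etc/os-release}"
    [ -r "$f" ] || return 1
    ( . "$f" && printf '%s\n' "$ID" )
}

# Hypothetical helper: apply the OA_HOME workaround on Rocky only.
#   $1 = OA installation directory to point at
#   $2 = os-release file to inspect (defaults to /etc/os-release)
fix_oa_home() {
    local oa_dir="$1" osrel="${2:-/etc/os-release}"

    # Do nothing on any OS other than Rocky.
    [ "$(os_id "$osrel")" = "rocky" ] || return 0

    # Do not point OA_HOME at a directory that does not exist.
    if [ ! -d "$oa_dir" ]; then
        echo "fix_oa_home: $oa_dir not found, leaving OA_HOME alone" >&2
        return 1
    fi

    unset OA_UNSUPPORTED_PLAT
    export OA_HOME="$oa_dir"
}

# Intended use, with the path from the snippet above:
#   fix_oa_home /cad/cadence/ICADVM20.10.330/oa_v22.60.090
```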