bssw-psip / ptc-catalog

PSIP - Progress Tracking Card (PTC) Catalog
https://bssw-psip.github.io/ptc-catalog

PTC: Continuous Technology Refreshment #8

Closed markcmiller86 closed 4 years ago

markcmiller86 commented 4 years ago

Continuous Technology Refreshment (CTR)

Effective management of technology transitions in software development infrastructure

The practice of Continuous Technology Refreshment (CTR) is defined as the periodic upgrade or replacement of infrastructure to deliver continued reliability, improved speed, capacity, and/or new features. The term is used primarily in the IT world when replacing obsolete hardware. However, long-lived software projects often wind up having to engage in equivalent activity. Examples of CTR in scientific software include...

Software projects often must adopt processes to manage the continually changing technologies upon which the project depends. Sometimes technology choices and transitions are driven by factors outside the control of the project. A project may be required to continue to support outdated technology or provide workarounds for years.

User Stories

As a software project manager, I want technology transitions that are well planned, with understood impacts, that minimize transient productivity degradations, maximize developer retraining efficiency, and ensure long-term, overall improvements in project success.

Cards

  1. Identify and inventory the technologies upon which your project depends
    • Recognize the difference between needs (required) and wants (desired).
    • A needed technology is something the project simply cannot live without...the costs of losing it would be too great. A wanted technology is something the project can get by without (at perhaps some additional cost).
    • Examples: Subversion, FrameMaker (docs), Python, MPI Version 2
  2. Adopt a practice of periodic (perhaps yearly) technology watch: assess and write down the strengths and weaknesses of the technologies the project currently uses, and identify possible new technologies and what they have to offer in supporting your project goals.
  3. When the need for a technology refresh is identified, begin the process by engaging stakeholders to identify impacts and collect feedback
    • This can help inform not only how but also when a refresh needs to take place
    • It might be useful to combine refreshes to minimize periods of instability
  4. For dramatic refreshes, such as a change of revision control system, consider pilot projects involving a subset of the project or its members, including extensive testing of automated conversion processes
    • Sometimes a combination of automated and manual conversion is involved.
  5. Plan the refresh activity by setting a deadline, notifying stakeholders, assigning resources, and identifying intermediate phases of work.
  6. Document the new processes involved in using the new technology
  7. Accept that, no matter how well planned and managed, there may be unforeseen instabilities. Ensure resources are at the ready for the transient period after completing a refresh.
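The inventory in card 1 can be kept as a small machine-readable record so that the periodic technology watch in card 2 has something concrete to review. A minimal sketch (the entries and the need/want classification are illustrative examples, not recommendations):

```python
# Illustrative technology inventory for cards 1-2. The entries and the
# need/want classification are hypothetical examples, not recommendations.
INVENTORY = [
    {"name": "Subversion", "kind": "need", "role": "revision control"},
    {"name": "FrameMaker", "kind": "want", "role": "documentation"},
    {"name": "Python",     "kind": "need", "role": "build/test scripting"},
    {"name": "MPI 2",      "kind": "need", "role": "parallel runtime"},
]

def needs(inventory):
    """Technologies the project cannot live without."""
    return [t["name"] for t in inventory if t["kind"] == "need"]

def wants(inventory):
    """Technologies the project could, at some cost, get by without."""
    return [t["name"] for t in inventory if t["kind"] == "want"]
```

During a technology watch, each entry would also get a written strengths/weaknesses note and a list of candidate replacements.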

Comments

This PTC is still a work in progress. I've listed the questions above to stimulate thinking about the competency sequence of CTR.

BarrySmith commented 4 years ago

Suggest adding after "which the project depends.": In addition, end users may often delay refreshing their own software infrastructure (or do it haphazardly or incorrectly), requiring the software developer to continue supporting outdated technology (over which they have little control) or to provide workarounds for years.

BarrySmith commented 4 years ago

A story.

The additions in the C++14 standard resulted in older system installs (standard libraries and/or includes), which had previously worked for more than a decade, not working with the -std=c++14 compiler option on newer compiler installs. Here is just one example:

$ mpicxx -c -O3 -fPIC    -I/ccc/work/cont003/rndm/rndm/petsc/include -I/ccc/work/cont003/rndm/rndm/petsc/arch-linux2-c-opt-complex-impi/include -I/ccc/products/mkl-19.0.5.075/system/default/19.0.5.075/mkl/include -I/ccc/products/mkl-19.0.5.075/system/default/19.0.5.075/mkl/lib/intel64/../../include    -MMD -MP /ccc/work/cont003/rndm/rndm/petsc/src/ksp/pc/impls/hpddm/hpddm.cxx -o arch-linux2-c-opt-complex-impi/obj/ksp/pc/impls/hpddm/hpddm.o -std=c++14
/ccc/work/cont003/rndm/rndm/petsc/arch-linux2-c-opt-complex-impi/include/HPDDM_wrapper.hpp(112): error: calling the default constructor for "std::complex<double>" does not produce a constant value
      static constexpr K d__1 = std::is_floating_point<K>::value ? K(1.0) : 1;

I've seen a few others. Often the system files are beyond the control of the PETSc user, and even explaining the problem is difficult since people are so used to just installing new compilers without updating the system. Having configure detect all these possible problems is asking too much, so library compiles break and it takes several email iterations to get the user to resolve the problem at their end.

This is one of my current major install problems for packages that use C++.

markcmiller86 commented 4 years ago

Thanks @BarrySmith, examples like this are great to have.

cyrush commented 4 years ago

Here is a broad perspective:

Working on a large project (VisIt) with many core dependencies, weighing the costs of API updates related to dependency updates comes into focus quickly. I think this is because the developer time required for your product becomes very visible.

However, beyond code changes related to API shifts, there are many cases (even without API changes) where updating core dependencies causes issues in your product.

Even for a team with decent automation for both building all of our dependencies and running tests - testing a wide range of combinations is not tractable given the size of the code base.

This is the primary reason why we try to pin versions of dependencies, upgrade intentionally, and vet those upgrades.
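The pin-and-vet strategy described above can start as simply as checking an installed TPL set against a pinned list before building. A minimal sketch (the names and version numbers are hypothetical, not VisIt's actual pins):

```python
# Hypothetical pinned third-party library (TPL) versions; illustrative only.
PINNED = {"vtk": "8.1.0", "hdf5": "1.8.21"}

def pin_mismatches(installed):
    """Return (name, pinned, installed) for every TPL that deviates from its pin."""
    return [(name, ver, installed.get(name))
            for name, ver in PINNED.items()
            if installed.get(name) != ver]
```

An intentional upgrade then amounts to changing one pin, rebuilding, and running the full test suite before the new version is accepted.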

markcmiller86 commented 4 years ago

Thanks @cyrush. So, from your perspective you see most CTR activity as TPL dependency updates?

cyrush commented 4 years ago

@markcmiller86 I don't see it mostly as an endeavor in updating TPLs. This type of intentional approach is just one strategy and one component of CTR, but the strategy isn't as widely understood as I thought.

BarrySmith commented 4 years ago

Another fun one, Jed had to fix a couple of days ago: https://github.com/Parallel-NetCDF/PnetCDF/commit/38d210c006cabff70d78204d2db98a22ab87547c

The PETSc --download-pnetcdf had to be upgraded to the latest version because, a while ago, the OpenMPI project changed how (in a slightly odd way) it represented deprecated MPI 1 functions, which broke PnetCDF configure code that had previously worked for many years.

I am sure the OpenMPI crew never conceived of their change breaking a package in this way, nor did the PnetCDF crew think their configure was not robust enough. The side effect is that any third package (like PETSc) now has to keep in mind the relative releases of OpenMPI and PnetCDF it installs. Outside changes can sometimes force projects to update their TPLs at random, unplanned times.
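Having to keep in mind the relative releases of two TPLs can be encoded as an explicit compatibility constraint in an install script, so the coupling is checked rather than remembered. A sketch with hypothetical version cutoffs (the real Open MPI/PnetCDF thresholds would need to be verified):

```python
# Each constraint says: if the first TPL is at or beyond its given version,
# the second TPL must also be at or beyond its given version.
# The version numbers below are hypothetical placeholders.
CONSTRAINTS = [("openmpi", (4, 0, 0), "pnetcdf", (1, 11, 0))]

def parse(version):
    """Turn a dotted version string into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))

def violations(installed):
    """Return (tpl_a, tpl_b) pairs whose installed versions conflict."""
    bad = []
    for a, min_a, b, min_b in CONSTRAINTS:
        if a in installed and b in installed:
            if parse(installed[a]) >= min_a and parse(installed[b]) < min_b:
                bad.append((a, b))
    return bad
```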

BarrySmith commented 4 years ago

Mark, here's another possible basic bullet:

Changing programming models. For example, from MPI to MPI plus OpenMP or now MPI to MPI plus GPUs.

markcmiller86 commented 4 years ago

Changing programming models. For example, from MPI to MPI plus OpenMP or now MPI to MPI plus GPUs.

@BarrySmith that is a great one. I was trying to capture that in the (original) revision of the intro text where I mentioned performance portability methodologies and APIs. Maybe I should include a bullet on that as well?

markcmiller86 commented 4 years ago

the OpenMPI project changed how (in a slightly odd way) it represented deprecated MPI 1 functions

We ran into the same or a similar thing with HDF5 (@qkoziol, @epourmal, or @brtnfld can maybe correct me if I'm wrong). My recollection is that most MPI implementations have yet to fully remove deprecated MPI 1 functions. OpenMPI version 4 was the first MPI implementation to do this, so they are maybe getting a bit of a black eye over it. NNSA labs wound up working with THG to have the HDF5 code adjusted so it would work with both older and newer MPI specs.

BarrySmith commented 4 years ago

Performance portability has become such a Washington buzzword that it is not clear what it means anymore; that's why I prefer concrete examples (MPI + GPU, for example) over grand abstractions :-)

Irrelevant side question: since Apple is moving completely to Metal for high-performance graphics, will VisIt support/utilize Metal directly, or just no longer support the Mac (or only support the Mac through third-party add-ons that somehow still provide OpenGL)?

BarrySmith commented 4 years ago

the OpenMPI project changed how (in a slightly odd way) it represented deprecated MPI 1 functions

We ran into the same or a similar thing with HDF5 (@qkoziol, @epourmal, or @brtnfld can maybe correct me if I'm wrong). My recollection is that most MPI implementations have yet to fully remove deprecated MPI 1 functions. OpenMPI version 4 was the first MPI implementation to do this, so they are maybe getting a bit of a black eye over it. NNSA labs wound up working with THG to have the HDF5 code adjusted so it would work with both older and newer MPI specs.

Yeah, amazing how bad the community is at handling deprecations that are 10+ years old. The old joke that you can use any compiler so long as it is f77 is not so far off even though we have cranked up our dependencies enormously over the years.

We only recently went through PETSc to get rid of all uses of MPI 1 deprecated functions. Why did we wait so long? Well, because we could :-) Until people really have to, they tend not to change.

In PETSc we now use the nice compiler options to mark deprecated functions, so users can still use them but get messages each time they compile their code about using deprecated PETSc functions. Will this make them update their code to get rid of the annoying messages? I don't know, but it is one way to try to pull people forward. You don't want to force users to change immediately, but somehow you want to provide incentives for them to change so that when you do drop the old support it doesn't break many users' codes.

An example is switching to more modern MPI interfaces for Fortran. A couple of users want to use the MPI 2008 Fortran standard, but it is difficult for us to simultaneously support that and the classic standard. 2008 is only 11 years ago :-) so why doesn't PETSc handle it properly :-(

markcmiller86 commented 4 years ago

nice compiler options to mark deprecated functions

Can you elaborate? Which option(s)? Which compilers? I have had to support this manually in libraries like Silo.

BarrySmith commented 4 years ago
  def configureDeprecated(self):
    '''Check if __attribute((deprecated)) is supported'''
    self.pushLanguage(self.languages.clanguage)
    ## Recent versions of gcc and clang support __attribute((deprecated("string argument"))), which is very useful, but
    ## Intel has conspired to make a supremely environment-sensitive compiler.  The Intel compiler looks at the gcc
    ## executable in the environment to determine the language compatibility that it should attempt to emulate.  Some
    ## important Cray installations have built PETSc using the Intel compiler, but with a newer gcc module loaded (e.g.,
    ## 4.7).  Thus at PETSc configure time, the Intel compiler decides to support the string argument, but the gcc
    ## found in the default user environment is older and does not support the argument.  If GCC and Intel were cool
    ## like Clang and supported __has_attribute, we could avoid configure tests entirely, but they don't.  And that is
    ## why we can't have nice things.
    #
    # if self.checkCompile("""__attribute((deprecated("Why you shouldn't use myfunc"))) static int myfunc(void) { return 1;}""", ''):
    #   self.addDefine('DEPRECATED_FUNCTION(why)', '__attribute((deprecated(why)))')
    #   self.addDefine('DEPRECATED_TYPEDEF(why)', '__attribute((deprecated(why)))')
    if self.checkCompile("""__attribute((deprecated)) static int myfunc(void) { return 1;}""", ''):
      self.addDefine('DEPRECATED_FUNCTION(why)', '__attribute((deprecated))')
      self.addDefine('DEPRECATED_TYPEDEF(why)', '__attribute((deprecated))')
    else:
      self.addDefine('DEPRECATED_FUNCTION(why)', ' ')
      self.addDefine('DEPRECATED_TYPEDEF(why)', ' ')
    if self.checkCompile("""enum E {oldval __attribute((deprecated)), newval };""", ''):
      self.addDefine('DEPRECATED_ENUM(why)', '__attribute((deprecated))')
    else:
      self.addDefine('DEPRECATED_ENUM(why)', ' ')
    # I was unable to make a CPP macro that takes the old and new values as separate arguments and builds the message needed by _Pragma
    # hence the deprecation message is handled as it is
    if self.checkCompile('#define TEST _Pragma("GCC warning \"Testing _Pragma\"") value'):
      self.addDefine('DEPRECATED_MACRO(why)', '_Pragma(why)')
    else:
      self.addDefine('DEPRECATED_MACRO(why)', ' ')
    self.popLanguage()

Example usage

#define   KSPNASH       "nash"
#define   KSPSTCG       "stcg"
#define   KSPGLTR       "gltr"
#define     KSPCGNASH  PETSC_DEPRECATED_MACRO("GCC warning \"KSPCGNASH macro is deprecated use KSPNASH (since version 3.11)\"")  "nash"
PETSC_DEPRECATED_FUNCTION("Use KSPCreateVecs() (since version 3.6)") PETSC_STATIC_INLINE PetscErrorCode KSPGetVecs(KSP ksp,PetscInt n,Vec **x,PetscInt m,Vec **y) {return KSPCreateVecs(ksp,n,x,m,y);}

Of course Jed is the one who suggested this approach.

markcmiller86 commented 4 years ago

And that is why we can't have nice things.

:laughing:

qkoziol commented 4 years ago

Yes, we’ve gone to fairly extraordinary lengths to allow HDF5 applications to continue to use previous versions of API routines and to migrate forward gracefully. Also - thanks for pointing out the ‘deprecated’ attribute! I’ll see if I can add that to the deprecated HDF5 routines and encourage users to move forward. :-)

Quincey


bartlettroscoe commented 4 years ago

@markcmiller86,

I think this is great material and I agree with everything written. Every project needs to do this and take this seriously.

My only major concern is that this is not a Progress Tracking Card (at least as I understand what a PTC is supposed to be or how it is to be used). What is written above is a process that could generate specific PTCs (e.g., to transition to Sphinx or to transition to git), but in itself it is not a PTC. For example, it makes no sense to say your project has gotten to level "2":

If you stop there, there is no increment of value to your project.

I would argue that there is no increment of value to your project until you complete all of the steps for a given technology transition. Indeed, if you start a transition and don't complete it (as we have seen with many projects that started down the road to adopting CMake and then abandoned it), then this process has provided negative value to your project.

Therefore, this is not a PTC, it is a process, or a "How to" document.

markcmiller86 commented 4 years ago

@bartlettroscoe ... agreed. That is the current state of this issue. It is still a work in progress. I am still ruminating on it.

markcmiller86 commented 4 years ago

Irrelevant side question: since Apple is moving completely to Metal for high-performance graphics, will VisIt support/utilize Metal directly, or just no longer support the Mac (or only support the Mac through third-party add-ons that somehow still provide OpenGL)?

@BarrySmith ... in answer to this inquiry, one part of our strategy looks like it will include Vulkan. VisIt depends heavily on VTK, and Kitware's current strategy appears to be to use Vulkan. However, VisIt does have some direct GL coding and some Qt+GL gymnastics that will be impacted as well, and we've yet to consider all those issues.

markcmiller86 commented 4 years ago

I am going to close this for now while I consider how to rewrite it into one or more PTCs.