elastic / ml-cpp

Machine learning C++ code
Other
7 stars 62 forks source link

[ML] Investigate replacing uses of bash and other Unix tools in CMake build system with platform independent code #2311

Open edsavage opened 2 years ago

edsavage commented 2 years ago

The initial approach to migrating to a CMake build system has continued to assume the presence of a Unix like build environment. On Windows this requires the presence of git bash and the minimal GNU toolset - MinGW. It is desirable to lessen the dependency on such tools on Windows as it lowers the bar for developers on that platform and simplifies the setting up of the build environment.

What follows is a list of places in the CMake build system that make calls to bash or other Unix commands.

lib/ver/CMakeLists.txt

lib/ver/CMakeLists.txt:execute_process(COMMAND date "+%Y" OUTPUT_VARIABLE BUILD_YEAR OUTPUT_STRIP_TRAILING_WHITESPACE)
lib/ver/CMakeLists.txt:execute_process(COMMAND id COMMAND awk -F ")" "{ print $1 }" COMMAND awk -F "(" "{ print $2 }" OUTPUT_VARIABLE USER_NAME
lib/ver/CMakeLists.txt:execute_process(COMMAND awk -F= "/elasticsearchVersion/ {gsub(/-.*/,\"\"); print $2}" $ENV{CPP_SRC_HOME}/gradle.properties OUTPUT_VARIABLE PRODUCT_VERSION OUTPUT_STRIP_TRAILING_WHITESPACE)

Suggestions:

3rd_party/CMakeLists.txt:

add_custom_target(
    eigen
    DEPENDS 3rd_party.sh pull-eigen.sh
    COMMAND bash -c "./3rd_party.sh --add ${INSTALL_DIR}"
    COMMAND bash -c "./pull-eigen.sh"
    WORKING_DIRECTORY cmake${CMAKE_CURRENT_SOURCE_DIR}
)

cmake/compiler/vs2019.cmake

execute_process(COMMAND bash -c "cygpath -m -s \"${ROOT}/Program Files (x86)/Microsoft Visual Studio/2019/Professional\"" OUTPUT_VARIABLE VCBASE OUTPUT_STRIP_TRAILING_WHITESPACE)
execute_process(COMMAND bash -c "cygpath -m -s \"${ROOT}/Program Files (x86)/Windows Kits\"" OUTPUT_VARIABLE WINSDKBASE OUTPUT_STRIP_TRAILING_WHITESPACE)
execute_process(COMMAND bash -c "cygpath -m -s \"${ROOT}/Program Files (x86)\"" OUTPUT_VARIABLE PFX86_DIR OUTPUT_STRIP_TRAILING_WHITESPACE)
execute_process(COMMAND bash -c "cd ${PFX86_DIR} && cygpath -m -s \"Microsoft Visual Studio\"" OUTPUT_VARIABLE MSVC_DIR OUTPUT_STRIP_TRAILING_WHITESPACE)
execute_process(COMMAND bash -c "cd ${PFX86_DIR} && cygpath -m -s \"Windows Kits\"" OUTPUT_VARIABLE WIN_KITS_DIR OUTPUT_STRIP_TRAILING_WHITESPACE)
execute_process(COMMAND bash -c "/bin/ls -1 ${PFX86_DIR}/${MSVC_DIR}/2019/Professional/VC/Tools/MSVC" COMMAND bash -c "tail -1" OUTPUT_VARIABLE VCVER OUTPUT_STRIP_TRAILING_WHITESPACE)
execute_process(COMMAND bash -c "/bin/ls -1 ${WINSDKBASE}/10/Include" COMMAND bash -c "tail -1" OUTPUT_VARIABLE UCRTVER OUTPUT_STRIP_TRAILING_WHITESPACE)

cmake/variables.cmake

execute_process(COMMAND awk -F= "/elasticsearchVersion/ {gsub(/-.*/,\"\"); print $2}"
  $ENV{CPP_SRC_HOME}/gradle.properties OUTPUT_VARIABLE ML_VERSION_NUM OUTPUT_STRIP_TRAILING_WHITESPACE)

cmake/functions.cmake

function(ml_generate_resources _target)
  if(NOT WIN32)
    return()
  endif()
  set( ${_target}_LINKFLAGS ${CMAKE_CURRENT_BINARY_DIR}/${_target}.res )
  set_target_properties( ${_target} PROPERTIES LINK_FLAGS ${${_target}_LINKFLAGS} )
  execute_process(COMMAND bash -c "${CMAKE_SOURCE_DIR}/mk/make_rc_defines.sh ${_target}.exe" OUTPUT_VARIABLE
    RC_DEFINES OUTPUT_STRIP_TRAILING_WHITESPACE)
  file(WRITE ${CMAKE_CURRENT_BINARY_DIR}/tmp.sh "rc -nologo ${CPPFLAGS} ${RC_DEFINES} -Fo${_target}.res ${CMAKE_SOURCE_DIR}/mk/ml.rc")
  add_custom_target(
    ${_target}.res
    DEPENDS ${CMAKE_SOURCE_DIR}/mk/ml.rc ${CMAKE_SOURCE_DIR}/gradle.properties ${CMAKE_SOURCE_DIR}/mk/ml.ico ${CMAKE_SOURCE_DIR}/mk/make_rc_defines.sh ${CMAKE_CURRENT_BINARY_DIR}/tmp.sh
    COMMAND bash -c ${CMAKE_CURRENT_BINARY_DIR}/tmp.sh
    )
  add_dependencies(${_target} ${_target}.res)

  set_directory_properties(PROPERTIES ADDITIONAL_MAKE_CLEAN_FILES ${CMAKE_CURRENT_BINARY_DIR}/tmp.sh)
  set_directory_properties(PROPERTIES ADDITIONAL_MAKE_CLEAN_FILES ${CMAKE_CURRENT_BINARY_DIR}/${_target}.res)
endfunction()

build.gradle

grep -n bash build.gradle
83:// Always do the C++ build using bash (Git bash on Windows)
84:project.ext.bash = isWindows ? "C:\\Program Files\\Git\\bin\\bash" : "/bin/bash"
102:      commandLine bash
111:  commandLine bash
118:  commandLine bash
126:  commandLine bash
133:  commandLine bash
192:  commandLine bash
400:  commandLine bash
edsavage commented 2 years ago

Suggestions:

droberts195 commented 2 years ago
  • date: use native “date /T” (requires extensions to be installed and parse out the year

For the date we could use the TIMESTAMP feature of CMake's string function: https://cmake.org/cmake/help/v3.19/command/string.html#timestamp

If we needed accurate timestamps on each sub-compilation step this wouldn't be acceptable, as it gets the date when the build system is generated, not when it's run. However, we're only getting the year to put in the copyright message, and each CI build will generate the build system from scratch, so CMake's string and TIMESTAMP are adequate for our needs and have the advantage of being completely platform-independent.

edsavage commented 2 years ago

For the date we could use the TIMESTAMP feature of CMake's string function:

This looks ideal. Accurate timestamps aren't required, in fact we only require the year.

droberts195 commented 2 years ago
file(WRITE ${CMAKE_CURRENT_BINARY_DIR}/tmp.sh "rc -nologo ${CPPFLAGS} ${RC_DEFINES} -Fo${_target}.res ${CMAKE_SOURCE_DIR}/mk/ml.rc")

I don't think we'd need to use a tmp.sh Bash script to run rc. rc is a native Windows program - the resource compiler for Windows programs. So surely we could just run it directly?

In fact, CMake might even provide a more elegant way to run rc. https://discourse.cmake.org/t/how-to-use-resource-files-rc-in-cmake/2628 alludes to it. This might be something to research for a future enhancement.

droberts195 commented 2 years ago
grep -n bash build.gradle
83:// Always do the C++ build using bash (Git bash on Windows)

The reason we're doing this is so that we can set up an environment using set_env.sh. It's nice that we can have one script for all platforms - if we had a .sh and a separate .bat then it's likely that people would sometimes forget to keep them in sync.

This might be the hardest of all to move away from.

However, I still think it's worth gradually getting rid of the other places where we're unnecessarily using Git Bash functionality from CMake. Once the lower level locations are dealt with we might find we don't need as many environment variables.

Another thing we could look at is setting up some of the paths that are currently in set_env.sh in CMakeLists.txt. It's probably going to be a long-term iterative process but we can keep taking small steps in the right direction.

droberts195 commented 2 years ago
  • 3rd_party.sh and pull-eigen.sh: rewrite using CMake scripting and call using cmake -P

I think this would be good. Certainly 3rd_party.sh seems to be getting run more often than it needs to be for iterative local rebuilds. Presumably we have it as a dependency for everything. This is true, but it still doesn't need to be run after every single minor code edit. It takes quite a long time on Linux where the PyTorch and MKL libraries are big.