Closed oliverholworthy closed 1 year ago
This is the error that is printed out in gpu-ci. It looks like it has something to do with the horovod init. I wonder if we should be avoiding the horovod init side-effect (from the `merlin.models.tf` import).
--------------------------------------------------------------------------
Sorry! You were supposed to get help about:
mpi_init:startup:internal-failure
But I couldn't open the help file:
/build-result/hpcx-v2.13-gcc-inbox-ubuntu20.04-cuda11-gdrcopy2-nccl2.12-x86_64/ompi/share/openmpi/help-mpi-runtime.txt: No such file or directory. Sorry!
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[6bbe81b6bdba:01522] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
Goals :soccer:
Fix use of Categorify to support new version of NVTabular
Implementation Details :construction:
`start_index` was removed from `Categorify` in https://github.com/NVIDIA-Merlin/NVTabular/pull/1692
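
For code that has to run against NVTabular versions both before and after that change, one option is to filter constructor arguments against the signature that is actually installed. The sketch below is only an illustration and not what this PR does: `supported_kwargs`, `OldCategorify`, and `NewCategorify` are hypothetical stand-ins, and the real `Categorify` signature should be checked before relying on this.

```python
import inspect


def supported_kwargs(op_cls, **kwargs):
    """Return only the keyword arguments that op_cls.__init__ accepts by name."""
    params = inspect.signature(op_cls.__init__).parameters
    # If the constructor takes **kwargs we cannot tell what it supports,
    # so pass everything through unchanged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {name: value for name, value in kwargs.items() if name in params}


# Hypothetical stand-ins for the constructor before and after the removal.
class OldCategorify:
    def __init__(self, freq_threshold=0, start_index=0):
        self.freq_threshold = freq_threshold
        self.start_index = start_index


class NewCategorify:
    def __init__(self, freq_threshold=0):
        self.freq_threshold = freq_threshold


requested = {"freq_threshold": 5, "start_index": 1}

# The same call site works with either signature.
old_op = OldCategorify(**supported_kwargs(OldCategorify, **requested))
new_op = NewCategorify(**supported_kwargs(NewCategorify, **requested))
```

Silently dropping `start_index` changes the encoded ids wherever a non-default offset was in use, so any downstream code that depended on the offset would need to be adjusted as well.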