JuliaGPU / AMGX.jl

Warning: `NVBLAS_CONFIG_FILE environment variable is NOT set` #23

Open tomchor opened 1 year ago

tomchor commented 1 year ago

Every time AMGX gets called I see the following errors:

julia> using AMGX
[NVBLAS] NVBLAS_CONFIG_FILE environment variable is NOT set : relying on default config filename 'nvblas.conf'
[NVBLAS] Cannot open default config file 'nvblas.conf'
[NVBLAS] Config parsed
[NVBLAS] CPU Blas library need to be provided

I googled it but couldn't make much sense of what this means. Does this require any action on my end or can I just safely ignore it?

navidcy commented 1 year ago

Perhaps on a similar note, is there a way as a developer to hide this [NVBLAS]-related info from all users?

glwagner commented 1 year ago

Could be relevant: https://forums.developer.nvidia.com/t/having-trouble-with-nvblas/32766

glwagner commented 1 year ago

I suspect the first three errors can be dealt with by

  1. Checking if the environment variable NVBLAS_CONFIG_FILE is set
  2. If not, looking somewhere for nvblas.conf (like the current working directory)
  3. If nvblas.conf is not found, invoking some reasonable default like the one in the docs (pasted below) by copying the default to file and then setting NVBLAS_CONFIG_FILE to point there
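The three steps above can be sketched as a small POSIX shell helper. This is only an illustration: the function name `resolve_nvblas_config`, the fallback directory argument, and the trimmed-down default config are assumptions rather than anything AMGX.jl or NVBLAS provides, and the CPU BLAS library name still has to be supplied by the caller.

```shell
# Hypothetical helper (not part of AMGX.jl or NVBLAS): prints the path of the
# config file NVBLAS should use.
#   $1 = directory in which to generate a fallback config (assumption)
#   $2 = CPU BLAS library name or path (assumption; must match your system)
resolve_nvblas_config() {
    fallback_dir=$1
    cpu_blas=$2

    # Step 1: respect NVBLAS_CONFIG_FILE if it is already set.
    if [ -n "${NVBLAS_CONFIG_FILE:-}" ]; then
        printf '%s\n' "$NVBLAS_CONFIG_FILE"
        return 0
    fi

    # Step 2: look for nvblas.conf in the current working directory.
    if [ -f ./nvblas.conf ]; then
        printf '%s\n' "$PWD/nvblas.conf"
        return 0
    fi

    # Step 3: write a minimal default (a subset of the documented example)
    # and report where it was written.
    mkdir -p "$fallback_dir" || return 1
    cat > "$fallback_dir/nvblas.conf" <<EOF
NVBLAS_CPU_BLAS_LIB $cpu_blas
NVBLAS_GPU_LIST ALL0
NVBLAS_TILE_DIM 2048
NVBLAS_AUTOPIN_MEM_ENABLED
EOF
    printf '%s\n' "$fallback_dir/nvblas.conf"
}

# Usage (commented out so that sourcing this file has no side effects):
# NVBLAS_CONFIG_FILE=$(resolve_nvblas_config "$HOME/.config/nvblas" libopenblas.so)
# export NVBLAS_CONFIG_FILE
```

The variable has to be exported before Julia starts (or at least before the library is loaded) for NVBLAS to see it.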

The final puzzle is how to find the CPU BLAS library and point to it in the default config file. I don't know how to do that.
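One possible way to locate a CPU BLAS library on a glibc-based Linux system is to search the dynamic linker cache. The sketch below assumes that `ldconfig -p` is available and that the library is registered in the cache, which is often not the case on clusters that use environment modules. The helper name `find_blas_in_cache` is hypothetical.

```shell
# Hypothetical helper: reads `ldconfig -p` style text on stdin and prints the
# path of the first library whose file name starts with $1.
# Each cache line looks like:
#   libopenblas.so.0 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libopenblas.so.0
find_blas_in_cache() {
    awk -v name="$1" 'index($1, name) == 1 { print $NF; exit }'
}

# Usage on a glibc-based system (commented out; prints nothing if not found):
# ldconfig -p | find_blas_in_cache libopenblas.so
```

The printed path could then be used as the value of NVBLAS_CPU_BLAS_LIB in the config file. If nothing is printed, the library has to be located some other way.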

The "typical" config file from the docs (seems like it needs interpretation):

#Copyright 2013 NVIDIA Corporation. All rights reserved.
# This is the configuration file to use NVBLAS Library
# Setup the environment variable NVBLAS_CONFIG_FILE to specify your own config file.
# By default, if NVBLAS_CONFIG_FILE is not defined,
# NVBLAS Library will try to open the file "nvblas.conf" in its current directory
# Example : NVBLAS_CONFIG_FILE /home/cuda_user/my_nvblas.conf
# Specify which output log file (default is stderr)
NVBLAS_LOGFILE nvblas.log
#Put here the CPU BLAS fallback Library of your choice
NVBLAS_CPU_BLAS_LIB libopenblas.so
#NVBLAS_CPU_BLAS_LIB libmkl_rt.so
# List of GPU devices Id to participate to the computation
# Use ALL if you want all your GPUs to contribute
# Use ALL0, if you want all your GPUs of the same type as device 0 to contribute
# However, NVBLAS consider that all GPU have the same performance and PCI bandwidth
# By default if no GPU are listed, only device 0 will be used
#NVBLAS_GPU_LIST 0 2 4
#NVBLAS_GPU_LIST ALL
NVBLAS_GPU_LIST ALL0
# Tile Dimension
NVBLAS_TILE_DIM 2048
# Autopin Memory
NVBLAS_AUTOPIN_MEM_ENABLED
#List of BLAS routines that are prevented from running on GPU (use for debugging purpose
# The current list of BLAS routines supported by NVBLAS are
# GEMM, SYRK, HERK, TRSM, SYMM, HEMM, SYR2K, HER2K,
#NVBLAS_GPU_DISABLED_SGEMM
#NVBLAS_GPU_DISABLED_DGEMM
#NVBLAS_GPU_DISABLED_CGEMM
#NVBLAS_GPU_DISABLED_ZGEMM
# Computation can be optionally hybridized between CPU and GPU
# By default, GPU-supported BLAS routines are ran fully on GPU
# The option NVBLAS_CPU_RATIO_<BLAS_ROUTINE> give the ratio [0,1]
# of the amount of computation that should be done on CPU
# CAUTION : this option should be used wisely because it can actually
# significantly reduced the overall performance if too much work is given to CPU
#NVBLAS_CPU_RATIO_CGEMM 0.07

navidcy commented 1 year ago

Indeed, with a default.conf as above, the startup output reduces to a single line:

$ export NVBLAS_CONFIG_FILE=default.conf

$ julia --project
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.4 (2022-12-23)
 _/ |\__'_|_|_|\__'_|  |
|__/                   |

julia> using AMGX
[NVBLAS] NVBLAS_CONFIG_FILE environment variable is set to 'default.conf'

However, when I exit Julia I get flooded with:

julia> exit()

signal (11): Segmentation fault
in expression starting at REPL[2]:1
fflush at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x7f6b811090bb)
unknown function (ip: 0x7f6b8110246e)
unknown function (ip: 0x7f6b8116e3f0)
__run_exit_handlers at /lib64/libc.so.6 (unknown line)
exit at /lib64/libc.so.6 (unknown line)
ijl_exit at /g/data/v45/nc3020/julia-1.8/src/jl_uv.c:641
exit at ./initdefs.jl:28 [inlined]
exit at ./initdefs.jl:29
jfptr_exit_48098 at /g/data/v45/nc3020/julia-1.8/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /g/data/v45/nc3020/julia-1.8/src/gf.c:2377 [inlined]
ijl_apply_generic at /g/data/v45/nc3020/julia-1.8/src/gf.c:2559
jl_apply at /g/data/v45/nc3020/julia-1.8/src/julia.h:1843 [inlined]
do_call at /g/data/v45/nc3020/julia-1.8/src/interpreter.c:126
eval_value at /g/data/v45/nc3020/julia-1.8/src/interpreter.c:215
eval_stmt_value at /g/data/v45/nc3020/julia-1.8/src/interpreter.c:166 [inlined]
eval_body at /g/data/v45/nc3020/julia-1.8/src/interpreter.c:594
jl_interpret_toplevel_thunk at /g/data/v45/nc3020/julia-1.8/src/interpreter.c:750
jl_toplevel_eval_flex at /g/data/v45/nc3020/julia-1.8/src/toplevel.c:906
jl_toplevel_eval_flex at /g/data/v45/nc3020/julia-1.8/src/toplevel.c:850
eval_body at /g/data/v45/nc3020/julia-1.8/src/interpreter.c:556
eval_body at /g/data/v45/nc3020/julia-1.8/src/interpreter.c:522
jl_interpret_toplevel_thunk at /g/data/v45/nc3020/julia-1.8/src/interpreter.c:750
jl_toplevel_eval_flex at /g/data/v45/nc3020/julia-1.8/src/toplevel.c:906
ijl_toplevel_eval_in at /g/data/v45/nc3020/julia-1.8/src/toplevel.c:965
eval at ./boot.jl:368 [inlined]
eval_user_input at /g/data/v45/nc3020/julia-1.8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:151
repl_backend_loop at /g/data/v45/nc3020/julia-1.8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:247
start_repl_backend at /g/data/v45/nc3020/julia-1.8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:232
#run_repl#47 at /g/data/v45/nc3020/julia-1.8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:369
run_repl at /g/data/v45/nc3020/julia-1.8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:355
jfptr_run_repl_64854 at /g/data/v45/nc3020/julia-1.8/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /g/data/v45/nc3020/julia-1.8/src/gf.c:2377 [inlined]
ijl_apply_generic at /g/data/v45/nc3020/julia-1.8/src/gf.c:2559
#967 at ./client.jl:419
jfptr_YY.967_49733 at /g/data/v45/nc3020/julia-1.8/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /g/data/v45/nc3020/julia-1.8/src/gf.c:2377 [inlined]
ijl_apply_generic at /g/data/v45/nc3020/julia-1.8/src/gf.c:2559
jl_apply at /g/data/v45/nc3020/julia-1.8/src/julia.h:1843 [inlined]
jl_f__call_latest at /g/data/v45/nc3020/julia-1.8/src/builtins.c:774
#invokelatest#2 at ./essentials.jl:729 [inlined]
invokelatest at ./essentials.jl:726 [inlined]
run_main_repl at ./client.jl:404
exec_options at ./client.jl:318
_start at ./client.jl:522
jfptr__start_49949 at /g/data/v45/nc3020/julia-1.8/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /g/data/v45/nc3020/julia-1.8/src/gf.c:2377 [inlined]
ijl_apply_generic at /g/data/v45/nc3020/julia-1.8/src/gf.c:2559
jl_apply at /g/data/v45/nc3020/julia-1.8/src/julia.h:1843 [inlined]
true_main at /g/data/v45/nc3020/julia-1.8/src/jlapi.c:575
jl_repl_entrypoint at /g/data/v45/nc3020/julia-1.8/src/jlapi.c:719
main at /g/data/v45/nc3020/julia-1.8/cli/loader_exe.c:59
__libc_start_main at /lib64/libc.so.6 (unknown line)
_start at /g/data/v45/nc3020/julia-1.8/julia (unknown line)
Allocations: 11859354 (Pool: 11854492; Big: 4862); GC: 4
Segmentation fault

navidcy commented 1 year ago

But I am not sure what all these defaults do, so is it safe to enforce them just to avoid seeing the warnings?

glwagner commented 1 year ago

One key line is

NVBLAS_CPU_BLAS_LIB libopenblas.so

That line might need to be changed to match the BLAS library actually installed on your system.

Also this line

NVBLAS_LOGFILE nvblas.log

The docs say the default is stderr (not nvblas.log). Maybe it is better to keep stderr?
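
Both adjustments can be applied to a copy of the documented example with `sed`. This is a rough sketch: the helper name `adapt_nvblas_config` and the file names are made up for illustration, and it relies on the config's own comment that logging falls back to stderr when NVBLAS_LOGFILE is not given.

```shell
# Hypothetical helper: adapt_nvblas_config IN OUT LIB
#   IN  = existing config file (e.g. the documented example)
#   OUT = where to write the adapted copy
#   LIB = CPU BLAS library name or path for this system
#         (assumed not to contain '|' or '&', which sed would interpret)
adapt_nvblas_config() {
    # 1st expression: comment out NVBLAS_LOGFILE so the default (stderr) applies.
    # 2nd expression: replace the active NVBLAS_CPU_BLAS_LIB entry with LIB.
    # Lines that are already commented out do not match and are left alone.
    sed -e 's|^NVBLAS_LOGFILE[[:space:]]|#&|' \
        -e "s|^NVBLAS_CPU_BLAS_LIB[[:space:]].*|NVBLAS_CPU_BLAS_LIB $3|" \
        "$1" > "$2"
}

# Usage (commented out):
# adapt_nvblas_config default.conf my_nvblas.conf /path/to/libopenblas.so
# export NVBLAS_CONFIG_FILE=$PWD/my_nvblas.conf
```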