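Hello, I ran into an environment configuration problem while trying to reproduce your experiments. I modified env.sh to match my own machine; its contents are as follows:
set -x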
# Add the CUDA library paths to LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/extras/CUPTI/lib64:$LD_LIBRARY_PATH
# Add the cuDNN library path to LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/home/work/cudnn/cudnn_v7.4/cuda/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
# NCCL must be downloaded first; then add the NCCL library path to LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/home/work/nccl/nccl2.4.2_cuda10.1/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBARY_PATH
# If FLAGS_sync_nccl_allreduce is 1, cudaStreamSynchronize(nccl_stream) is called in allreduce_op_handle; this mode can give better performance in some cases
export FLAGS_sync_nccl_allreduce=1
# Fraction of the total available GPU memory to allocate as the memory block, in the range [0, 1]
export FLAGS_fraction_of_gpu_memory_to_use=1
# Select the GPU to use
export CUDA_VISIBLE_DEVICES=0
# Whether to use the garbage collection strategy to optimize the network's memory usage; <0 disables it, >=0 enables it
export FLAGS_eager_delete_tensor_gb=1.0
# Whether to use the fast garbage collection strategy
export FLAGS_fast_eager_deletion_mode=1
# Percentage of variable memory released by the garbage collection strategy, in the range [0.0, 1.0]
export FLAGS_memory_fraction_of_eager_deletion=1
# Set the fluid path
export PATH=fluid=/home/work/python/bin:$PATH
export PATH=fluid=/home/edward/anaconda3/envs/Paddle-Pytorch/bin:$PATH
export PATH=fluid=/home/edward/anaconda3/envs/Paddle-Pytorch/lib/python3.7/site-packages/paddle/include/paddle:$PATH
# Set python
alias python=/home/work/python/bin/python
alias python=/home/edward/anaconda3/envs/Paddle-Pytorch/bin/python
set +x
However, when I run a specific task, for example:
sh ./script/run_infer.sh ./config/roberta_skep_large_en.absa_laptops.infer.json
the following error is thrown:
Traceback (most recent call last):
  File "./lanch.py", line 137, in <module>
    main(lanch_args)
  File "./lanch.py", line 130, in main
    start_procs(args)
  File "./lanch.py", line 121, in start_procs
    cmd=cmds[i])
subprocess.CalledProcessError: Command '['/home/edward/anaconda3/envs/Paddle-Pytorch/bin/python', '-u', './train.py', '--param_path', './config/ernie_1.0_skep_large_ch.Chnsenticorp.cls.json', '--log_dir', './log']' died with <Signals.SIGABRT: 6>.
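I then inspected lanch.py and concluded that my environment settings are probably the cause. However, the official description of the environment settings is unclear, so I do not know how to change them. I would appreciate a detailed walkthrough of the environment setup process. #8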
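For reference, here is a minimal sketch of a sanity check for an env.sh like the one quoted above. It assumes part of the difficulty is knowing whether the directories added to LD_LIBRARY_PATH actually exist on the machine. The function name `missing_dirs` is hypothetical and not part of the repository; it only tests for directories and does not verify that the right library versions are inside them.

```sh
#!/bin/sh
# Hypothetical helper (not part of the repository): print every entry of a
# colon-separated path list that is not an existing directory.
# The function body runs in a subshell, so the IFS and globbing changes
# do not leak into the calling shell.
missing_dirs() (
    IFS=:
    set -f  # disable globbing while the list is expanded unquoted
    for entry in $1; do
        # Skip empty entries produced by a leading or doubled colon.
        [ -z "$entry" ] && continue
        [ -d "$entry" ] || printf '%s\n' "$entry"
    done
)
```

For example, after `. ./env.sh`, running `missing_dirs "$LD_LIBRARY_PATH"` lists any library directories that are not present. Sourcing the file in a subshell with `set -u` enabled additionally turns a reference to an unset variable into an error instead of an empty expansion; note that this also applies to `$LD_LIBRARY_PATH` itself if it is not set beforehand.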