baidu / Senta

Baidu's open-source Sentiment Analysis System.
Apache License 2.0
1.89k stars, 370 forks

Environment configuration problem when reproducing the experiments (env.sh) #48

Open Edward-Joker opened 3 years ago

Edward-Joker commented 3 years ago

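Hello, I ran into an environment configuration problem while trying to reproduce your experiments. I modified env.sh to match my own machine; its contents are as follows:

set -x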

# Add the CUDA library paths to LD_LIBRARY_PATH

export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/extras/CUPTI/lib64:$LD_LIBRARY_PATH

# Add the cuDNN library path to LD_LIBRARY_PATH

export LD_LIBRARY_PATH=/home/work/cudnn/cudnn_v7.4/cuda/lib64:$LD_LIBRARY_PATH

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

# NCCL must be downloaded first; then add the NCCL library path to LD_LIBRARY_PATH

export LD_LIBRARY_PATH=/home/work/nccl/nccl2.4.2_cuda10.1/lib:$LD_LIBRARY_PATH

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBARY_PATH

# If FLAGS_sync_nccl_allreduce is 1, cudaStreamSynchronize(nccl_stream) is called in allreduce_op_handle; this mode can give better performance in some cases

export FLAGS_sync_nccl_allreduce=1

# Fraction of the GPU's total available memory taken by the allocated memory block, range [0,1]

export FLAGS_fraction_of_gpu_memory_to_use=1

# Select the GPU to use

export CUDA_VISIBLE_DEVICES=0

# Whether to use a garbage collection strategy to optimize the network's memory usage; <0 disables it, >=0 enables it

export FLAGS_eager_delete_tensor_gb=1.0

# Whether to use the fast garbage collection strategy

export FLAGS_fast_eager_deletion_mode=1

# Percentage of variable memory released by the garbage collection strategy, range [0.0, 1.0]

export FLAGS_memory_fraction_of_eager_deletion=1

# Set the fluid path

export PATH=fluid=/home/work/python/bin:$PATH

export PATH=fluid=/home/edward/anaconda3/envs/Paddle-Pytorch/bin:$PATH

export PATH=fluid=/home/edward/anaconda3/envs/Paddle-Pytorch/lib/python3.7/site-packages/paddle/include/paddle:$PATH

# Set python

alias python=/home/work/python/bin/python

alias python=/home/edward/anaconda3/envs/Paddle-Pytorch/bin/python
set +x

But when I run a specific task, for example:

sh ./script/run_infer.sh ./config/roberta_skep_large_en.absa_laptops.infer.json

the following error is thrown:

Traceback (most recent call last):
  File "./lanch.py", line 137, in <module>
    main(lanch_args)
  File "./lanch.py", line 130, in main
    start_procs(args)
  File "./lanch.py", line 121, in start_procs
    cmd=cmds[i])
subprocess.CalledProcessError: Command '['/home/edward/anaconda3/envs/Paddle-Pytorch/bin/python', '-u', './train.py', '--param_path', './config/ernie_1.0_skep_large_ch.Chnsenticorp.cls.json', '--log_dir', './log']' died with <Signals.SIGABRT: 6>.
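For readers decoding the last line of that traceback: `died with <Signals.SIGABRT: 6>` is how Python's `subprocess` module reports a child process that was terminated by a signal instead of exiting with a status code. The launcher is only relaying the failure, so the underlying cause is more likely to show up in the child's own output or log. A minimal sketch of how such a return code can be interpreted follows; the helper names `describe_exit` and `run_and_report` are hypothetical and not part of Senta.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable description."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as returncode == -N.
        number = -returncode
        try:
            name = signal.Signals(number).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {number} ({name})"
    return f"exited with status {returncode}"


def run_and_report(cmd):
    """Run cmd (a list of arguments) and describe how it ended (hypothetical helper)."""
    completed = subprocess.run(cmd)
    return describe_exit(completed.returncode)
```

Calling `run_and_report` with the exact command list printed in the `CalledProcessError` message runs the child directly, without the launcher in between, so whatever it prints before aborting stays visible in the terminal.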

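I then checked lanch.py and concluded that the problem is probably in my environment settings. However, the official environment configuration instructions are unclear, so I do not know how to adjust the settings. Could you please provide a detailed walkthrough of the environment setup? #8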
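One detail in the env.sh listing is worth checking: the last NCCL-related export refers to `$LD_LIBARY_PATH`, which is spelled differently from `LD_LIBRARY_PATH`. If that name is unset in the shell, it expands to an empty string and the export overwrites the paths added before it. Whether this is related to the SIGABRT is not established in this thread. The sketch below is one rough way to spot such references. It is a simple regex scan and not a shell parser, and the function name `unset_references` is hypothetical.

```python
import re

# Matches "NAME=" or "export NAME=" at the start of a line.
ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?([A-Za-z_]\w*)=", re.MULTILINE)
# Matches "$NAME" and "${NAME".
REFERENCE = re.compile(r"\$\{?([A-Za-z_]\w*)")


def unset_references(script_text, env):
    """Return variable names the script reads but that are neither assigned
    anywhere in the script nor present in env (a mapping such as os.environ)."""
    # Ignore whole-line comments so commented-out exports are not scanned.
    code = "\n".join(
        line for line in script_text.splitlines()
        if not line.lstrip().startswith("#")
    )
    assigned = set(ASSIGNMENT.findall(code))
    referenced = set(REFERENCE.findall(code))
    return sorted(referenced - assigned - set(env))
```

For example, `unset_references(open("env.sh").read(), os.environ)` (after `import os`, and assuming env.sh is in the current directory) lists the names that deserve a second look. A name that the script assigns anywhere is treated as defined, so the common `X=...:$X` pattern is not flagged.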

jcfeng commented 3 years ago

In my experience, you can simply comment out the env.sh line. It is enough to have the dependencies installed.
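If env.sh is skipped as suggested, it can help to confirm that the dependencies are importable from the interpreter that will run the scripts. A minimal sketch, assuming the import name `paddle` (taken from the site-packages path in the env.sh listing); the helper name `missing_modules` is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "paddle" is an assumed import name; extend the list with the
    # project's other requirements as needed.
    print(missing_modules(["paddle"]))
```

An empty list only shows that the modules can be located. It does not verify that the installed build matches the local CUDA and cuDNN versions.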

brickee commented 3 years ago

Has the original poster solved this problem yet?

ppd118 commented 3 years ago

I ran into this problem too. Is there a solution yet?

wyxscir commented 2 years ago

I have this problem too. Has anyone solved it?

allenhung1025 commented 2 years ago

I have this problem too. Has anyone solved it?

zhuliwen commented 2 years ago

I have this problem too. Has anyone solved it?