buxiangzhiren / DDCap

MIT License

Java error after the first epoch finishes (running on five GPUs; after the Java error, watch -n 2 --color gpustat --c shows four GPUs still running) #34

Open Markkk111 opened 1 year ago

Markkk111 commented 1 year ago

Hello, and thank you very much for taking the time to answer this question!

1. The error output is as follows:

Evaling epoch 0
caption_diff_vitb16: 100%|██████████████████████████████████████████████████████████████████████| 16/16 [02:17<00:00, 8.60s/it]
caption_diff_vitb16: 100%|██████████████████████████████████████████████████████████████████████| 16/16 [02:17<00:00, 8.59s/it]
caption_diff_vitb16: 100%|██████████████████████████████████████████████████████████████████████| 16/16 [02:18<00:00, 8.68s/it]
caption_diff_vitb16: 100%|██████████████████████████████████████████████████████████████████████| 16/16 [02:19<00:00, 8.71s/it]
caption_diff_vitb16: 100%|██████████████████████████████████████████████████████████████████████| 16/16 [02:20<00:00, 8.75s/it]
loading annotations into memory...
0:00:00.266195
creating index...
index created!
Loading and preparing results...
DONE (t=0.02s)
creating index...
index created!
tokenization...
PTBTokenizer tokenized 307085 tokens at 1518683.41 tokens per second.
PTBTokenizer tokenized 58441 tokens at 526455.68 tokens per second.
setting up scorers...
computing Bleu score...
{'testlen': 48641, 'reflen': 47900, 'guess': [48641, 43641, 38641, 33641], 'correct': [22410, 5483, 267, 19]}
ratio: 1.0154697286012315
Bleu_1: 0.461
Bleu_2: 0.241
Bleu_3: 0.074
Bleu_4: 0.022
computing METEOR score...
METEOR: 0.104
computing Rouge score...
ROUGE_L: 0.355
computing CIDEr score...
CIDEr: 0.104
computing SPICE score...
Invalid maximum heap size: -Xmx8G
The specified size exceeds the maximum representable size.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
subprocess.CalledProcessError: Command '['/home/anaconda3/envs/DDCap/bin/python', '-u', 'train.py', '--local_rank=4', '--out_dir', '/home/tjut_caixiasong/ddcap/results_diff', '--tag', 'caption_diff_vitb16']' died with <Signals.SIGSEGV: 11>.

2. The Java version is as follows:

~/ddcap$ java -version
java version "1.8.0_361"
Java(TM) SE Runtime Environment (build 1.8.0_361-b09)
Java HotSpot(TM) Server VM (build 25.361-b09, mixed mode)

3. Since I do not have root privileges, I modified my .bashrc as follows:

java profile

export JAVA_HOME=/home/username/java/jdk1.8.0_361
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$SRILM/bin/i686-m64:$SRILM/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_HOME

export LD_LIBRARY_PATH=/home/username/anaconda3/envs/diffusion/lib

export PATH="$PATH:/tmp/bin"

export LD_LIBRARY_PATH=/home/app/anaconda3/lib

export PYTORCH_NVFUSER_DISABLE=fallback
export LESS="-R"
export JAVA_HOME=/home/username/java/jdk1.8.0_361
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

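4. After making the changes, I ran source so that they took effect.

I have also run pkill and restarted several times, but the error persists. Thank you!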

buxiangzhiren commented 1 year ago

The Java version is probably wrong, which is what causes the insufficient-memory error.

Markkk111 commented 1 year ago

Thank you for your answer. May I ask which Java version you are using?

buxiangzhiren commented 1 year ago

Try the command "$ sudo apt-get update", then "$ sudo apt-get install openjdk-8-jdk".

Markkk111 commented 1 year ago

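Hello, I do not have sudo privileges; the Java I installed earlier was installed without root. After running the commands above under the administrator account, my Java version is still openjdk version "1.8.0_361". Next I will try installing the latest version of Java under a non-root account. Thank you for your patience and your reply.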

buxiangzhiren commented 1 year ago

It could also be that you installed a 32-bit build. Please make sure the one you installed is 64-bit.
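One way to check this from Python is sketched below. It is a heuristic, not an official test: it assumes a HotSpot/OpenJDK-style JVM, whose version banner contains the string "64-Bit" on 64-bit builds (other JVMs may word it differently). The helper names is_64bit_jvm and java_version_text are made up for this example.

```python
import shutil
import subprocess


def is_64bit_jvm(version_text: str) -> bool:
    """Heuristic: HotSpot/OpenJDK banners contain "64-Bit" on 64-bit builds."""
    return "64-Bit" in version_text


def java_version_text(java_cmd: str = "java") -> str:
    """Run `java -version` and return its banner (the JVM prints it on stderr)."""
    result = subprocess.run([java_cmd, "-version"], capture_output=True, text=True)
    return result.stderr + result.stdout


if __name__ == "__main__":
    # Show which executable PATH actually resolves to, then inspect its banner.
    java_path = shutil.which("java")
    if java_path is None:
        print("no java executable found on PATH")
    else:
        print("java resolves to:", java_path)
        print("64-bit JVM:", is_64bit_jvm(java_version_text(java_path)))
```

Applied to the banner posted earlier in this thread ("Java HotSpot(TM) Server VM"), the check returns False, which is consistent with a 32-bit JVM rejecting -Xmx8G.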

Markkk111 commented 1 year ago

Thank you! The first epoch has now finished normally! Thank you very much for your reply! (*Java is now the latest version and 64-bit)

~$ java -version
openjdk version "1.8.0_362"
OpenJDK Runtime Environment (build 1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09)
OpenJDK 64-Bit Server VM (build 25.362-b09, mixed mode)

buxiangzhiren commented 1 year ago

You're welcome. Glad the problem is solved.

Markkk111 commented 1 year ago

Hello, my results are about the same as those in the paper. For Table 5 in the paper, which model was the Continuous Diffusion implementation based on?

buxiangzhiren commented 1 year ago

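At the time we used the DDPM code, and then directly used a trained, fixed token embedding layer to project the tokens into a latent space. The diffusion is done in that latent space. Starting from pure noise, we obtain a vector and compute its similarity with the weights of the fixed token embedding layer; the entry with the smallest similarity is the final token.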
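The final lookup step described above might be sketched roughly as follows. This is not the authors' code: the thread does not say which similarity measure was used, so the sketch assumes Euclidean distance and reads "smallest" as "smallest distance", i.e. the nearest embedding row. The function name nearest_token and the toy values are made up for illustration.

```python
import math
from typing import Sequence


def nearest_token(vector: Sequence[float],
                  embedding_weights: Sequence[Sequence[float]]) -> int:
    """Return the index of the embedding row closest to `vector`.

    Assumption: closeness is Euclidean distance; the thread does not specify it.
    """
    best_index = -1
    best_distance = math.inf
    for index, row in enumerate(embedding_weights):
        distance = math.dist(vector, row)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


# Toy 3-token vocabulary with 2-dimensional embeddings (made-up values).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
# Stands in for the vector produced by the reverse diffusion from pure noise.
denoised = [0.1, 0.9]
token_id = nearest_token(denoised, toy_embeddings)
print(token_id)
```

In a real model the embedding matrix would be the frozen layer's weight and the lookup would be batched on the GPU; the loop here only makes the selection rule explicit.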

Markkk111 commented 1 year ago

Got it, thank you for your reply. I am extremely grateful. I wish you success in your studies and a happy life!