chensong1995 / HybridPose

HybridPose: 6D Object Pose Estimation under Hybrid Representation (CVPR 2020)
MIT License
412 stars 64 forks

ValueError: Caught ValueError in DataLoader worker process 0. #51

Closed jinzhiyang1 closed 3 years ago

jinzhiyang1 commented 3 years ago

Hello, I get an error when running train_core.py:

/home/jzy/anaconda3/envs/HybridPose-master/bin/python /home/jzy/桌面/jzy/HybridPose-master/src/train_core.py
number of model parameters: 12959563
Traceback (most recent call last):
  File "/home/jzy/桌面/jzy/HybridPose-master/src/train_core.py", line 125, in <module>
    trainer.train(epoch)
  File "/home/jzy/桌面/jzy/HybridPose-master/trainers/coretrainer.py", line 37, in train
    for i_batch, batch in enumerate(self.train_loader):
  File "/home/jzy/anaconda3/envs/HybridPose-master/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
    return self._process_data(data)
  File "/home/jzy/anaconda3/envs/HybridPose-master/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
    data.reraise()
  File "/home/jzy/anaconda3/envs/HybridPose-master/lib/python3.7/site-packages/torch/_utils.py", line 369, in reraise
    raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.

Do I need to replace the None in parser.add_argument('--load_dir', type=str, default=None) with my own path? If so, what should it be?

chensong1995 commented 3 years ago

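Hello jinzhiyang1,

Thank you for your interest in our work. The error you describe is unrelated to load_dir. Please check that you have set up the dataset as described in the README.

Hope this helps.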
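
The traceback above only shows the DataLoader re-raising an exception from a worker process, so the underlying cause is hidden. A common way to expose it is to index the dataset directly in the main process. The sketch below assumes a map-style dataset (any object with `__len__` and `__getitem__`, as `torch.utils.data.Dataset` subclasses have); `find_failing_samples` and `ToyDataset` are hypothetical names used for illustration and are not part of HybridPose.

```python
import traceback


def find_failing_samples(dataset, max_failures=3):
    """Index a map-style dataset directly and collect the first few failures.

    Returns a list of (index, formatted_traceback) tuples.
    """
    failures = []
    for index in range(len(dataset)):
        try:
            dataset[index]
        except Exception:
            # Keep the full traceback so the real cause is visible.
            failures.append((index, traceback.format_exc()))
            if len(failures) >= max_failures:
                break
    return failures


class ToyDataset:
    """Hypothetical stand-in for a real dataset; sample 2 is broken."""

    def __len__(self):
        return 4

    def __getitem__(self, index):
        if index == 2:
            raise ValueError("could not load sample 2")
        return index


if __name__ == "__main__":
    for index, tb in find_failing_samples(ToyDataset()):
        print(f"sample {index} failed:\n{tb}")
```

Passing `num_workers=0` to `torch.utils.data.DataLoader` has a similar effect, since samples are then loaded in the main process and the original exception is raised there.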

jinzhiyang1 commented 3 years ago

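Thank you for your reply. If possible, could you tell me the Ubuntu kernel version, the driver version, and the CUDA version you used when setting up your environment? This problem has been troubling me for a long time. Many thanks!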

chensong1995 commented 3 years ago

(base) chen@chen-ThinkPad-T470s:~$ ssh song@titan-1.cs.utexas.edu
Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 4.15.0-112-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage
 * UTCS FAQ: http://www.cs.utexas.edu/facilities/faq/
 * Report software problems to help@cs.utexas.edu
 System information disabled due to load higher than 40.0

 * Introducing self-healing high availability clustering for MicroK8s!
   Super simple, hardened and opinionated Kubernetes for production.

     https://microk8s.io/high-availability

96 packages can be updated.
67 updates are security updates.

New release '20.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.

*** System restart required ***

Please note: this system is intended to serve the instructional,
research, and administrative needs of the students, faculty, and
staff of the UT Austin Department of Computer Sciences.  Any other
use of this system, including but not limited to using any method
to circumvent proper authentication or authorization, constitutes
unauthorized access and may subject the user to criminal prosecution
under Texas Computer Crime Statutes and other state or federal laws.

Last login: Wed Nov 11 20:03:14 2020 from 204.137.227.114
song@titan-1:~$ /opt/cuda-10.0/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
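
The same details can be collected from Python. This is a minimal sketch using only the standard library; `parse_nvcc_release` and `describe_environment` are hypothetical helper names, and the code assumes that `nvcc --version` prints a `release X.Y` line as in the output above.

```python
import platform
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract the CUDA release (e.g. '10.0') from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def describe_environment(nvcc_path="nvcc"):
    """Return the kernel release and, if nvcc is found, the CUDA release."""
    info = {"kernel": platform.release(), "cuda": None}
    resolved = shutil.which(nvcc_path)
    if resolved is not None:
        result = subprocess.run(
            [resolved, "--version"], capture_output=True, text=True, check=False
        )
        info["cuda"] = parse_nvcc_release(result.stdout)
    return info


if __name__ == "__main__":
    print(describe_environment())
```

If `nvcc` is not on `PATH`, an explicit location such as `/opt/cuda-10.0/bin/nvcc` (the one used above) can be passed as `nvcc_path`.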

jinzhiyang1 commented 3 years ago

Thank you for your reply!!!

jinzhiyang1 commented 3 years ago

[Screenshot, 2020-11-14 19-35-42]

jinzhiyang1 commented 3 years ago

[Screenshot, 2020-11-14 21-46-54]

jinzhiyang1 commented 3 years ago

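Sorry to bother you again. The previous problem is solved. However, after I run LD_LIBRARY_PATH=lib/regressor:$LD_LIBRARY_PATH python src/train_core.py, the error shown in the screenshots above appears. A post I found, https://blog.csdn.net/rensandao/article/details/83539086, says the cause is an uninitialized variable, but I cannot find where the variable should be initialized. I would appreciate any hints and look forward to your reply.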

chensong1995 commented 3 years ago

Hello jinzhiyang1,

Did you set up Eigen with the command git clone --recurse-submodules git@github.com:chensong1995/HybridPose.git?

jinzhiyang1 commented 3 years ago

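[Screenshot, 2020-11-15 10-10-25] Thank you for the reminder. I had previously installed the Eigen library separately. After your reminder I ran that command, but it reported this error. I would appreciate your advice once more (sorry for bothering you).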

chensong1995 commented 3 years ago

Hello jinzhiyang1,

This problem is probably related to your SSH configuration. You can try cloning our code over HTTPS instead: git clone --recurse-submodules https://github.com/chensong1995/HybridPose.git

jinzhiyang1 commented 3 years ago

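Hello, I have a few questions I would like to confirm:

  1. Is git clone --recurse-submodules git@github.com:chensong1995/HybridPose.git the command to use when downloading your source code?
  2. I downloaded your source code directly from the GitHub page. Is there any difference between the two?
  3. When I run this command, it creates a HybridPose folder in the current directory containing the same source I downloaded from GitHub, but the download is very slow. I tried several times; the furthest it got was 75% before it stopped with an error.

I would appreciate any suggestions.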

chensong1995 commented 3 years ago

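Hello jinzhiyang1,

  1. Yes.
  2. With --recurse-submodules, git follows the settings in .gitmodules and automatically downloads Eigen to the path the makefile expects.
  3. This is probably due to a slow network connection. You could try a different network provider, or retry at a different time of day.

Hope this helps.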
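
A source archive downloaded from the GitHub web page typically does not include submodule contents, which is one way the Eigen directory can end up empty. The sketch below assumes the standard `.gitmodules` layout with `path = ...` entries and lists submodule directories that are missing or empty; `unpopulated_submodules` is a hypothetical helper name, not part of HybridPose or git.

```python
import re
from pathlib import Path


def unpopulated_submodules(repo_root):
    """Return submodule paths from .gitmodules that are missing or empty."""
    root = Path(repo_root)
    gitmodules = root / ".gitmodules"
    if not gitmodules.is_file():
        return []
    # Each submodule section contains a line of the form "path = some/dir".
    paths = re.findall(
        r"^[ \t]*path[ \t]*=[ \t]*(.+?)[ \t]*$",
        gitmodules.read_text(),
        re.MULTILINE,
    )
    missing = []
    for relative in paths:
        directory = root / relative
        if not directory.is_dir() or not any(directory.iterdir()):
            missing.append(relative)
    return missing


if __name__ == "__main__":
    for path in unpopulated_submodules("."):
        print(f"submodule not populated: {path}")
```

If a path is reported inside a git clone, running `git submodule update --init --recursive` in the repository fetches the missing submodules.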

jinzhiyang1 commented 3 years ago

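Hello. After running git clone --recurse-submodules git@github.com:chensong1995/HybridPose.git, I got a fresh copy of the source and ran everything again. In train_core, after train, test, and save_model have run, execution reaches trainer.generate_data(val_loader), and the terminal then reports "Killed" (已杀死). How can I solve this? [Screenshot, 2020-11-17 19-52-45]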

jinzhiyang1 commented 3 years ago

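To avoid running out of memory, I reduced some parameters ([Screenshot, 2020-11-17 19-58-07] [Screenshot, 2020-11-17 19-58-00] [Screenshot, 2020-11-17 19-57-18] [Screenshot, 2020-11-17 19-57-07]) and set some paths. I did not change anything else. Could these changes be causing the error? I would appreciate any suggestions. (I am very sorry for bothering you so much recently.)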

chensong1995 commented 3 years ago

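Hello jinzhiyang1,

My suggestion is to set a few breakpoints in both the Python and the C++ code and locate the line where the problem occurs. That will make the analysis easier.

Hope this helps!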
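
On Linux, a bare "Killed" message usually means the process received SIGKILL, often from the kernel's out-of-memory killer, so no Python traceback is produced. One way to narrow down the failing step without a debugger is to log peak memory between stages. The sketch below uses the Unix-only `resource` module; `peak_memory_mib` and `log_peak_memory` are hypothetical helpers, and the stage names are placeholders.

```python
import resource
import sys


def peak_memory_mib():
    """Peak resident set size of this process in MiB.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_peak_memory(stage):
    """Print and return the peak memory observed up to a named stage."""
    peak = peak_memory_mib()
    # Flush so the line is visible even if the process is killed right after.
    print(f"[{stage}] peak RSS: {peak:.1f} MiB", flush=True)
    return peak


if __name__ == "__main__":
    log_peak_memory("start")
    buffer = [0] * 5_000_000  # placeholder for a memory-heavy step
    log_peak_memory("after allocation")
```

Calling `log_peak_memory` before and after a step such as `trainer.generate_data(val_loader)` would show whether memory grows there; the last line printed before "Killed" indicates which stage to inspect.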

redredraccoon commented 3 years ago

@jinzhiyang1 Hello, I am running into the same problem you described in your original post when running train_core.py:

ValueError: Caught ValueError in DataLoader worker process 0.

How did you solve it?

Zhangwenyao1 commented 1 year ago

May I ask how you solved this problem?

jinzhiyang1 commented 1 year ago

I forget, but it was not a big problem.
