Hi Mona,
Can you check if this link helps?
I think I am confused because I am using the exact same versions of torch and torchvision as the ones in requirements.txt. Doesn't that mean the code should load the .pth model with no further modification?
Thanks for any explanation:
(hp) mona@mona-ThinkStation-P7:~/HP/HybridPose$ python
Python 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.2.0'
>>> import torchvision
>>> torchvision.__version__
'0.4.0a0'
I also changed torch.load to torch.jit.load and still get the same problem:
def load_session(model, optim, args):
    try:
        start_epoch = int(args.load_dir.split('/')[-1]) + 1
        # model.load_state_dict(torch.load(os.path.join(args.load_dir, 'model.pth')))
        # optim.load_state_dict(torch.load(os.path.join(args.load_dir, 'optim.pth')))
        model.load_state_dict(torch.jit.load(os.path.join(args.load_dir, 'model.pth')))
        optim.load_state_dict(torch.jit.load(os.path.join(args.load_dir, 'optim.pth')))
        for param_group in optim.param_groups:
            param_group['lr'] = args.lr
        print('Successfully loaded model from {}'.format(args.load_dir))
    except Exception as e:
        pdb.set_trace()
        print('Could not restore session properly, check the load_dir')
    return model, optim, start_epoch
I have:
(hp) mona@mona-ThinkStation-P7:~/HP/HybridPose$ LD_LIBRARY_PATH=lib/regressor:$LD_LIBRARY_PATH python src/train_core.py --load_dir /home/mona/HP/HybridPose/saved_weights/linemod/ape/checkpoints/0.001/199 --object_name ape
number of model parameters: 12959563
loading checkpoint from /home/mona/HP/HybridPose/saved_weights/linemod/ape/checkpoints/0.001/199
> /home/mona/HP/HybridPose/lib/utils.py(34)load_session()
-> print('Could not restore session properly, check the load_dir')
(Pdb) quit()
Traceback (most recent call last):
File "/home/mona/HP/HybridPose/lib/utils.py", line 27, in load_session
model.load_state_dict(torch.jit.load(os.path.join(args.load_dir, 'model.pth')))
File "/home/mona/anaconda3/envs/hp/lib/python3.7/site-packages/torch/jit/__init__.py", line 162, in load
cpp_module = torch._C.import_ir_module(cu, f, map_location, _extra_files)
RuntimeError: version_number <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED at /tmp/pip-req-build-58y_cjjl/caffe2/serialize/inline_container.cc:131, please report a bug to PyTorch. Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 1. Your PyTorch installation may be too old. (init at /tmp/pip-req-build-58y_cjjl/caffe2/serialize/inline_container.cc:131)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6d (0x7fb30d0091cd in /home/mona/anaconda3/envs/hp/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::init() + 0x246d (0x7fb2defafe9d in /home/mona/anaconda3/envs/hp/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #2: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x69 (0x7fb2defb1359 in /home/mona/anaconda3/envs/hp/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #3: torch::jit::import_ir_module(std::shared_ptr<torch::jit::script::CompilationUnit>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0x4d (0x7fb2e0119ddd in /home/mona/anaconda3/envs/hp/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #4: <unknown function> + 0x51031b (0x7fb2ff51031b in /home/mona/anaconda3/envs/hp/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x1c7126 (0x7fb2ff1c7126 in /home/mona/anaconda3/envs/hp/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #24: <unknown function> + 0x29d90 (0x7fb30f629d90 in /lib/x86_64-linux-gnu/libc.so.6)
frame #25: __libc_start_main + 0x80 (0x7fb30f629e40 in /lib/x86_64-linux-gnu/libc.so.6)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "src/train_core.py", line 102, in <module>
model, optimizer, start_epoch = setup_model(args)
File "src/train_core.py", line 88, in setup_model
model, optimizer, start_epoch = load_session(model, optimizer, args)
File "/home/mona/HP/HybridPose/lib/utils.py", line 34, in load_session
print('Could not restore session properly, check the load_dir')
File "/home/mona/HP/HybridPose/lib/utils.py", line 34, in load_session
print('Could not restore session properly, check the load_dir')
File "/home/mona/anaconda3/envs/hp/lib/python3.7/bdb.py", line 88, in trace_dispatch
return self.dispatch_line(frame)
File "/home/mona/anaconda3/envs/hp/lib/python3.7/bdb.py", line 113, in dispatch_line
if self.quitting: raise BdbQuit
bdb.BdbQuit
(hp) mona@mona-ThinkStation-P7:~/HP/HybridPose$ ls /home/mona/HP/HybridPose/saved_weights/linemod/ape/checkpoints/0.001/199
total 149M
drwx------ 12 mona mona 4.0K Oct 23 15:23 ..
-rw------- 1 mona mona 50M Oct 23 15:23 model.pth
drwx------ 2 mona mona 4.0K Oct 23 15:23 .
-rw------- 1 mona mona 99M Oct 23 15:23 optim.pth
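A quick way to narrow down an error like the one above is to check which container format the checkpoint file uses before choosing a loader. The sketch below uses only the standard library; `checkpoint_format` is a hypothetical helper, not part of HybridPose or PyTorch. It assumes that a zip archive is either a TorchScript file or a checkpoint written by `torch.save` in PyTorch 1.6 or later (which switched to a zipfile-based format), and that anything else is a legacy pickle stream or a damaged download.

```python
import os
import zipfile


def checkpoint_format(path):
    """Classify a checkpoint file by its container format.

    Hypothetical helper for illustration; it only inspects the file
    and never unpickles or loads it.
    """
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    # TorchScript archives and torch.save output from PyTorch >= 1.6
    # are zip files; older torch.save output is a plain pickle stream.
    if zipfile.is_zipfile(path):
        return "zip"
    return "non-zip"


if __name__ == "__main__":
    # Directory layout taken from the thread; adjust to your checkout.
    load_dir = "saved_weights/linemod/ape/checkpoints/0.001/199"
    for name in ("model.pth", "optim.pth"):
        print(name, checkpoint_format(os.path.join(load_dir, name)))
```

If a file reports `zip` under PyTorch 1.2.0, that would be consistent with the "version 3" message in the traceback, although a damaged download can also produce misleading errors, so a checksum comparison remains the more reliable test.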
Hi Mona,
I just tried myself with PyTorch 1.2.0 and was not able to reproduce the issue. Can you calculate the md5 checksum for the two .pth files and verify if your download is good? I have:
song@xiaochong ~/D/H/s/a/c/0/199 (master)> md5sum model.pth (hybridpose)
96ca7f9bae5628fe551434949d5950e1 model.pth
song@xiaochong ~/D/H/s/a/c/0/199 (master)> md5sum optim.pth (hybridpose)
fd9e976740e63a8d2d82ce9570987712 optim.pth
Thanks a lot for checking.
(hp) mona@mona-ThinkStation-P7:~/HP/HybridPose$ cd /home/mona/HP/HybridPose/saved_weights/linemod/ape/checkpoints/0.001/199
(hp) mona@mona-ThinkStation-P7:~/HP/HybridPose/saved_weights/linemod/ape/checkpoints/0.001/199$ md5sum model.pth
b545a494628c9db318bd248cde2823e7 model.pth
(hp) mona@mona-ThinkStation-P7:~/HP/HybridPose/saved_weights/linemod/ape/checkpoints/0.001/199$ md5sum optim.pth
e2ad8564e72cae263a7434a5cb73aa21 optim.pth
My checksums differ from yours. I downloaded the weights from OneDrive. Please let me know if you need more information to look into this issue.
Hi Mona,
Looks like the files you downloaded are corrupted. I just downloaded a fresh copy from the OneDrive link in README and verified the checksums are the same:
song@xiaochong ~/D/mona> ls (base)
ape/ ape.zip
song@xiaochong ~/D/mona> cd ape/checkpoints/0.001/199/ (base)
song@xiaochong ~/D/m/a/c/0/199> ls (base)
model.pth optim.pth
song@xiaochong ~/D/m/a/c/0/199> cd - (base)
song@xiaochong ~/D/mona> ls (base)
ape/ ape.zip
song@xiaochong ~/D/mona> md5sum ape.zip (base)
295e096aee296c3497a53dfff75c22c7 ape.zip
song@xiaochong ~/D/mona> cd ape/checkpoints/0.001/199/ (base)
song@xiaochong ~/D/m/a/c/0/199> md5sum model.pth (base)
96ca7f9bae5628fe551434949d5950e1 model.pth
song@xiaochong ~/D/m/a/c/0/199> md5sum optim.pth (base)
fd9e976740e63a8d2d82ce9570987712 optim.pth
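For anyone who wants to script the same check instead of running md5sum by hand, here is a minimal sketch in Python. The two reference checksums are the ones posted in this thread for the ape checkpoint at epoch 199; the function names `md5_of_file` and `find_mismatches` are hypothetical and not part of HybridPose.

```python
import hashlib
import os

# Reference checksums as posted in this thread (ape, checkpoints/0.001/199).
EXPECTED_MD5 = {
    "model.pth": "96ca7f9bae5628fe551434949d5950e1",
    "optim.pth": "fd9e976740e63a8d2d82ce9570987712",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(directory, expected=EXPECTED_MD5):
    """Return {filename: actual_md5 or None if missing} for files that fail."""
    mismatches = {}
    for name, want in expected.items():
        path = os.path.join(directory, name)
        actual = md5_of_file(path) if os.path.isfile(path) else None
        if actual != want:
            mismatches[name] = actual
    return mismatches


if __name__ == "__main__":
    # Directory layout taken from the thread; adjust to your checkout.
    bad = find_mismatches("saved_weights/linemod/ape/checkpoints/0.001/199")
    print("all checksums match" if not bad else "re-download needed: {}".format(bad))
```

Running this before training lets a corrupted download fail fast with a clear message rather than surfacing later as a deserialization error.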
It's very strange, since I waited for each individual download to finish. I re-downloaded the files just now and get the same checksums as yours, so we may have had network issues. How
model.load_state_dict(torch.load(os.path.join(args.load_dir, 'model.pth')))
optim.load_state_dict(torch.load(os.path.join(args.load_dir, 'optim.pth')))
^^ changing back to original form.
(hp) mona@mona-ThinkStation-P7:~/HP/HybridPose$ cd /home/mona/HP/HybridPose/saved_weights/linemod/ape/checkpoints/0.001/199
(hp) mona@mona-ThinkStation-P7:~/HP/HybridPose/saved_weights/linemod/ape/checkpoints/0.001/199$ md5sum model.pth
96ca7f9bae5628fe551434949d5950e1 model.pth
(hp) mona@mona-ThinkStation-P7:~/HP/HybridPose/saved_weights/linemod/ape/checkpoints/0.001/199$ md5sum
model.pth optim.pth
(hp) mona@mona-ThinkStation-P7:~/HP/HybridPose/saved_weights/linemod/ape/checkpoints/0.001/199$ md5sum optim.pth
fd9e976740e63a8d2d82ce9570987712 optim.pth
I get this error:
I have