QitaoZhao / ContextAware-PoseFormer

The project is an official implementation of our paper "A Single 2D Pose With Context is Worth Hundreds for 3D Human Pose Estimation".

Question about train.py #3

Closed Bzw666 closed 9 months ago

Bzw666 commented 9 months ago

I followed the README to process the data, completed the preparatory work, and ran the training and testing commands. However, both fail with an error saying that human36m is not defined, so I can neither train nor test.

name 'human36m' is not defined
  File "/mnt/newdisk3/bzw/code/ContextAware-PoseFormer/ContextPose/train.py", line 57, in setup_human36m_dataloaders
    train_dataset = eval(config.dataset.train_dataset)(
  File "/mnt/newdisk3/bzw/code/ContextAware-PoseFormer/ContextPose/train.py", line 126, in setup_dataloaders
    train_dataloader, val_dataloader, train_sampler, dist_size = setup_human36m_dataloaders(config, is_train, distributed_train, rank, world_size)
  File "/mnt/newdisk3/bzw/code/ContextAware-PoseFormer/ContextPose/train.py", line 460, in main
    train_dataloader, val_dataloader, train_sampler, whole_val_dataloader, dist_size = setup_dataloaders(config, distributed_train=is_distributed, rank=rank, world_size=world_size)
  File "/mnt/newdisk3/bzw/code/ContextAware-PoseFormer/ContextPose/train.py", line 560, in <module>
    main(args)
NameError: name 'human36m' is not defined

QitaoZhao commented 9 months ago

Hi, thanks for your comments! It seems that an import was accidentally deleted in a previous commit. I have added it back in https://github.com/QitaoZhao/ContextAware-PoseFormer/blob/e5a4180f596008db69afc5cad123b7bfd6cbb2e6/ContextPose/train.py#L18.

QitaoZhao commented 9 months ago

Also, make sure the dataset is loaded correctly if you made any modifications: https://github.com/QitaoZhao/ContextAware-PoseFormer/blob/e5a4180f596008db69afc5cad123b7bfd6cbb2e6/ContextPose/train.py#L57
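
For context, the NameError follows from how train.py builds the dataset: line 57 passes the config string through eval(), which can only resolve names that are already bound in the module's namespace. The sketch below reproduces that behaviour in isolation. The class DummyDataset, the helper resolve_dataset_class, and the config string "human36m.DummyDataset" are hypothetical stand-ins, not names from the repository, and the exact import restored at train.py line 18 is not shown in this thread.

```python
import types


class DummyDataset:
    """Hypothetical stand-in for a dataset class defined in a human36m module."""


def resolve_dataset_class(class_path, namespace):
    """Resolve a dotted name with eval(), using only the names in `namespace`."""
    return eval(class_path, namespace)


# Hypothetical stand-in for the imported module object.
fake_human36m = types.SimpleNamespace(DummyDataset=DummyDataset)

# Case 1: the module name is not bound (the import is missing).
try:
    resolve_dataset_class("human36m.DummyDataset", {})
    error_message = None
except NameError as exc:
    error_message = str(exc)

# Case 2: the module name is bound, as it would be after the import is restored.
resolved = resolve_dataset_class(
    "human36m.DummyDataset", {"human36m": fake_human36m}
)

print(error_message)
print(resolved is DummyDataset)
```

The first case fails with the same kind of message as in the report; the second succeeds only because the name human36m is present in the namespace handed to eval().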

Bzw666 commented 9 months ago

Thank you for your reply. train.py now runs, but a new problem has emerged: KeyError: Caught KeyError in DataLoader worker process 0. I believe this is a data processing issue. My h36m_train/validation.pkl files were generated by the original H36M-Toolbox generate_labels.py script. Could that be the cause? What are the differences between the files generated by generate_labels_h36m.py and those produced by the original script? If it is convenient, could you please send me the h36m_train/validation.pkl files that your code uses?

Loading backbone from /mnt/newdisk3/bzw/code/ContextAware-PoseFormer/ContextPose/data/pretrained/coco/pose_hrnet_w32_256x192.pth
Loading data...
Trainable parameter count: 14094147
Traceback (most recent call last):
  File "train.py", line 561, in <module>
    main(args)
  File "train.py", line 488, in main
    epoch_loss_3d_train = one_epoch_full(model, criterion, optimizer, config, train_dataloader, device, epoch, n_iters_total=n_iters_total_train, is_train=True, lr=lr_dict, master=master, experiment_dir=experiment_dir, writer=writer)
  File "train.py", line 189, in one_epoch_full
    prefetcher = dataset_utils.data_prefetcher(dataloader, device, is_train, config.val.flip_test)
  File "/ContextAware-PoseFormer/ContextPose/mvn/datasets/utils.py", line 26, in __init__
    self.preload()
  File "/ContextAware-PoseFormer/ContextPose/mvn/datasets/utils.py", line 30, in preload
    self.next_batch = next(self.loader)
  File "/anaconda3/envs/diffpose/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/anaconda3/envs/diffpose/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
    return self._process_data(data)
  File "/anaconda3/envs/diffpose/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "/anaconda3/envs/diffpose/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/anaconda3/envs/diffpose/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/anaconda3/envs/diffpose/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/anaconda3/envs/diffpose/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/ContextAware-PoseFormer/ContextPose/mvn/datasets/human36m.py", line 302, in __getitem__
    return image, np.expand_dims(shot['joints_3d'], axis=0), shot['joints_2d_cpn'], shot['joints_2d_cpn_crop']
KeyError: 'joints_2d_cpn'

Traceback (most recent call last):
  File "/anaconda3/envs/diffpose/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/mnt/newdisk3/bzw/anaconda3/envs/diffpose/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/anaconda3/envs/diffpose/lib/python3.8/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/anaconda3/envs/diffpose/lib/python3.8/site-packages/torch/distributed/launch.py", line 255, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/anaconda3/envs/diffpose/bin/python', '-u', 'train.py', '--local_rank=0']' returned non-zero exit status 1.

QitaoZhao commented 9 months ago

Please refer to this issue, where I uploaded my own pre-processed data: https://github.com/QitaoZhao/ContextAware-PoseFormer/issues/1#issuecomment-1806862985. You should use the script provided in this repository to process the raw data, because it pre-defines some attributes that are later used during training and testing.
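
A quick way to tell whether a label file was produced by the expected script is to check each record for the keys that human36m.py reads in the traceback above (joints_3d, joints_2d_cpn, joints_2d_cpn_crop). This is a minimal sketch, assuming the .pkl file holds a list of dict-like records; that layout, the helper names, and the sample records are assumptions for illustration and are not taken from the repository.

```python
import pickle

# Keys accessed in human36m.py, line 302, according to the traceback.
REQUIRED_KEYS = ("joints_3d", "joints_2d_cpn", "joints_2d_cpn_crop")


def find_missing_keys(records, required=REQUIRED_KEYS):
    """Return the sorted list of required keys absent from at least one record."""
    missing = set()
    for shot in records:
        missing.update(key for key in required if key not in shot)
    return sorted(missing)


def check_label_file(path):
    """Load a pickled label file (assumed: list of dicts) and report missing keys."""
    with open(path, "rb") as handle:
        records = pickle.load(handle)
    return find_missing_keys(records)


# Hypothetical records: one without the CPN fields, one with them.
records_without_cpn = [{"joints_3d": [0.0], "joints_2d": [0.0]}]
records_with_cpn = [
    {"joints_3d": [0.0], "joints_2d_cpn": [0.0], "joints_2d_cpn_crop": [0.0]}
]

print(find_missing_keys(records_without_cpn))
print(find_missing_keys(records_with_cpn))
```

If check_label_file reports missing keys for a real label file, the file was most likely generated without the extra attributes and needs to be regenerated or replaced with the pre-processed data linked above.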