Dingpx / EAI

Official code of [AAAI2024] Expressive Forecasting of 3D Whole-body Human Motions

FileNotFoundError: [Errno 2] No such file or directory: './checkpoint/test/ckpt_eai_dct_n30_out30_dctn60_best.pth.tar' #2

Open johndpope opened 1 week ago

johndpope commented 1 week ago

Running test.py fails with the FileNotFoundError in the title.

johndpope commented 1 week ago

Does it need training first? There's an error in the training code: https://github.com/Dingpx/EAI/blob/main/train.py#L168

UPDATE - how do I define these? `rank = int(os.environ["RANK"])`


  File "<frozen os>", line 679, in __getitem__
KeyError: 'RANK'

UPDATE - found these
```shell
export RANK=0
export PORT=12345
export LOCAL_RANK=0
export MASTER_ADDR=127.0.0.1
```
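The `KeyError` comes from indexing `os.environ` directly, so any unset variable aborts the run. A minimal sketch of a more forgiving lookup, assuming single-process defaults are acceptable: `get_dist_env` and `SINGLE_PROCESS_DEFAULTS` are hypothetical names, not part of the EAI repo, and the variable names are only the four exported above.

```python
import os

# Hypothetical defaults mirroring the exports above; they only make
# sense for a single process on one machine.
SINGLE_PROCESS_DEFAULTS = {
    "RANK": "0",
    "LOCAL_RANK": "0",
    "PORT": "12345",
    "MASTER_ADDR": "127.0.0.1",
}


def get_dist_env(environ=None):
    """Read distributed settings, falling back to single-process defaults."""
    env = os.environ if environ is None else environ
    # Values from the environment override the defaults key by key.
    merged = dict(SINGLE_PROCESS_DEFAULTS)
    for key in SINGLE_PROCESS_DEFAULTS:
        if key in env:
            merged[key] = env[key]
    return {
        "rank": int(merged["RANK"]),
        "local_rank": int(merged["LOCAL_RANK"]),
        "port": int(merged["PORT"]),
        "master_addr": merged["MASTER_ADDR"],
    }


if __name__ == "__main__":
    print(get_dist_env())
```

With this kind of helper, `int(os.environ["RANK"])` could be swapped for `get_dist_env()["rank"]`, though a real multi-GPU launch would still need the launcher to set these variables per process.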

UPDATE

I replaced the distributed training with **accelerate**:  
https://github.com/johndpope/EAI/blob/main/train2.py

it's training...
![Screenshot from 2024-09-10 05-37-28](https://github.com/user-attachments/assets/2121df0c-bf74-4bdf-902b-5a3898ac379c)

How long does training take, and on how many GPUs?

```shell
usage: train.py [-h] [--device DEVICE] [--grab_data_dict GRAB_DATA_DICT] [--exp EXP] [--ckpt CKPT]
                [--model_type MODEL_TYPE] [--max_norm] [--linear_size LINEAR_SIZE] [--num_stage NUM_STAGE]
                [--num_body NUM_BODY] [--num_lh NUM_LH] [--num_rh NUM_RH] [--lr LR] [--lr_decay LR_DECAY]
                [--lr_gamma LR_GAMMA] [--input_n INPUT_N] [--output_n OUTPUT_N] [--all_n ALL_N] [--actions ACTIONS]
                [--epochs EPOCHS] [--dropout DROPOUT] [--train_batch TRAIN_BATCH] [--val_batch VAL_BATCH]
                [--test_batch TEST_BATCH] [--job JOB] [--seed SEED] [--local_rank LOCAL_RANK] [--W_pg W_PG]
                [--W_p W_P] [--is_load] [--is_debug] [--is_exp] [--sample_rate SAMPLE_RATE] [--is_norm_dct]
                [--is_norm] [--is_using_saved_file] [--is_hand_norm] [--is_hand_norm_split] [--is_part]
                [--part_type PART_TYPE] [--is_boneloss] [--is_weighted_jointloss] [--is_using_noTpose2]
                [--is_using_raw] [--J J]
train.py: error: unrecognized arguments: --local-rank=0

```
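The usage text lists `--local_rank`, while the launcher passed `--local-rank=0`; PyTorch launchers from 2.0 onward use the dashed spelling. One possible workaround, sketched here as an assumption rather than taken from the repo, is to register both spellings on the same argparse option and fall back to the `LOCAL_RANK` environment variable. `build_parser` is a hypothetical stand-in for the parser in `train.py`.

```python
import argparse
import os


def build_parser():
    """Parser that accepts both --local_rank and --local-rank."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--local_rank",
        "--local-rank",  # dashed alias passed by newer launchers
        dest="local_rank",
        type=int,
        # Fall back to the environment variable when no flag is given.
        default=int(os.environ.get("LOCAL_RANK", 0)),
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--local-rank=0"])
    print(args.local_rank)
```

Both spellings then land in `args.local_rank`, so the rest of the script would not need to change.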
Dingpx commented 3 days ago

Sorry, it's been a long time since I last ran this code. It seems that I used 8 A100/V100 GPUs to train this project. As for the checkpoint, please bear with me: I am busy with my current project, and I will release it in a few weeks.

johndpope commented 3 days ago

my accelerate code got me by - thanks

I'm interested in taking two body poses - https://github.com/johndpope/EAI/blob/main/pose_vis.py

and, using coco-wholebody - https://github.com/johndpope/EAI/blob/main/test.png - interpolating between them (with correct, human-like joint movement).

My reading is that this codebase could be suitable - did you do any work along these lines? I found more repos doing gesture / fusion, but I just want a sequence of poses to throw into stable diffusion.
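For comparison, the simplest way to get a pose sequence between two keyframes is per-joint linear interpolation. This is a naive baseline and not what EAI does: it ignores bone lengths and joint limits, so the in-between frames are not guaranteed to look human-like. `interpolate_poses` is a hypothetical helper, and poses are assumed to be equal-length lists of coordinate tuples (COCO-WholeBody defines 133 keypoints per person).

```python
def interpolate_poses(pose_a, pose_b, num_frames):
    """Linearly blend two poses into num_frames poses, endpoints included.

    pose_a, pose_b: equal-length lists of (x, y) or (x, y, z) tuples.
    """
    if len(pose_a) != len(pose_b):
        raise ValueError("poses must have the same number of keypoints")
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")

    frames = []
    for i in range(num_frames):
        t = i / (num_frames - 1)  # 0.0 at pose_a, 1.0 at pose_b
        frame = [
            tuple((1 - t) * a + t * b for a, b in zip(joint_a, joint_b))
            for joint_a, joint_b in zip(pose_a, pose_b)
        ]
        frames.append(frame)
    return frames


if __name__ == "__main__":
    start = [(0.0, 0.0), (1.0, 1.0)]
    end = [(10.0, 0.0), (1.0, 5.0)]
    for frame in interpolate_poses(start, end, 3):
        print(frame)
```

A learned motion model would replace the straight-line blend with predicted in-between frames; whether EAI can be conditioned on a target pose this way is the open question here.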