danbider / lightning-pose

Accelerated pose estimation and tracking using semi-supervised convolutional networks.
MIT License

error in prediction #139

Closed Wulin-Tan closed 5 months ago

Wulin-Tan commented 5 months ago

Hi lightning-pose team, following your tutorial I can run training with 'train_hydra.py' without any errors:

python train_hydra.py --config-path=/root/autodl-tmp/DLC_LP --config-name=config_LP.yaml

and get a new directory: outputs/2024-04-07/11-48-45/

Now I want to run 'predict_new_vids.py':

python predict_new_vids.py --config-path=/root/autodl-tmp/DLC_LP --config-name=config_LP.yaml 

but it gives the following error:

[2024-04-07 23:33:49,971][HYDRA] /root/miniconda3/lib/python3.8/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Error executing job with overrides: []
Traceback (most recent call last):
  File "predict_new_vids.py", line 116, in predict_videos_in_dir
    absolute_cfg_path = return_absolute_path(hydra_relative_path, n_dirs_back=2)
  File "/root/miniconda3/lib/python3.8/site-packages/lightning_pose/utils/io.py", line 153, in return_absolute_path
    raise IOError("%s is not a valid path" % abs_path)
OSError: /root/autodl-tmp/DLC_LP/outputs/outputs/2024-04-07/11-48-45/ is not a valid path

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

And here is my config file:

data:
  image_orig_dims:
    height: 2160
    width: 2160
  image_resize_dims:
    height: 512
    width: 512
  data_dir: /root/autodl-tmp/DLC_LP
  video_dir: /root/autodl-tmp/DLC_LP/videos
  csv_file: CollectedData.csv
  downsample_factor: 2
  num_keypoints: 6
  keypoint_names:
  - snout
  - forepaw_L
  - forefaw_R
  - hindpaw_L
  - hindpaw_R
  - base
  mirrored_column_matches: null
  columns_for_singleview_pca: null
training:
  imgaug: dlc
  train_batch_size: 8
  val_batch_size: 32
  test_batch_size: 32
  train_prob: 0.95
  val_prob: 0.05
  train_frames: 1
  num_gpus: 1
  num_workers: 4
  early_stop_patience: 3
  unfreezing_epoch: 20
  min_epochs: 5
  max_epochs: 10
  log_every_n_steps: 10
  check_val_every_n_epoch: 5
  gpu_id: 0
  rng_seed_data_pt: 0
  rng_seed_model_pt: 0
  lr_scheduler: multisteplr
  lr_scheduler_params:
    multisteplr:
      milestones:
      - 150
      - 200
      - 250
      gamma: 0.5
model:
  losses_to_use:
  - pca_singleview
  - temporal
  backbone: resnet50_animal_ap10k
  model_type: heatmap_mhcrnn
  heatmap_loss_type: mse
  model_name: DLC_LP
dali:
  general:
    seed: 123456
  base:
    train:
      sequence_length: 32
    predict:
      sequence_length: 96
  context:
    train:
      batch_size: 16
    predict:
      sequence_length: 96
losses:
  pca_multiview:
    log_weight: 5.0
    components_to_keep: 3
    epsilon: null
  pca_singleview:
    log_weight: 5.0
    components_to_keep: 0.99
    epsilon: null
  temporal:
    log_weight: 5.0
    epsilon: 20.0
    prob_threshold: 0.05
eval:
  hydra_paths: ["outputs/2024-04-07/11-48-45/"] 
  predict_vids_after_training: true
  save_vids_after_training: false
  fiftyone:
    dataset_name: test
    model_display_names:
    - test_model
    launch_app_from_script: false
    remote: true
    address: 127.0.0.1
    port: 5151
  test_videos_directory: /root/autodl-tmp/DLC_LP/videos
  saved_vid_preds_dir: null
  confidence_thresh_for_vid: 0.9
  video_file_to_plot: null
  pred_csv_files_to_plot:
  - ' '
callbacks:
  anneal_weight:
    attr_name: total_unsupervised_importance
    init_val: 0.0
    increase_factor: 0.01
    final_val: 1.0
    freeze_until_epoch: 0
hydra:
  run:
    dir: outputs/${now:%Y-%m-%d}/${now:%H-%M-%S}
  sweep:
    dir: multirun/${now:%Y-%m-%d}/${now:%H-%M-%S}
    subdir: ${hydra.job.num}

Any suggestions? Thank you.

themattinthehatt commented 5 months ago

Hi @Wulin-Tan, thanks for sharing your config file. The issue here is that the entry eval.hydra_paths should not include "outputs"; it should just be ["2024-04-07/11-48-45"]. Alternatively, you can put in the absolute path (starting from /root).
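
For reference, a minimal sketch of the corrected eval block, assuming the training run directory created by train_hydra.py sits under /root/autodl-tmp/DLC_LP/outputs (as the error message above suggests) and reusing the timestamped folder name from that run:

eval:
  # relative to the outputs/ directory, so no leading "outputs/"
  hydra_paths: ["2024-04-07/11-48-45"]
  # or, alternatively, an absolute path:
  # hydra_paths: ["/root/autodl-tmp/DLC_LP/outputs/2024-04-07/11-48-45"]

With either form, the script should resolve to the existing run directory instead of the doubled outputs/outputs/... path seen in the traceback.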

Wulin-Tan commented 5 months ago

> Hi @Wulin-Tan, thanks for sharing your config file, the issue here is that the entry eval.hydra_paths should not include "outputs"; rather it should just be ["2024-04-07/11-48-45"]; alternatively you can put in the absolute path (starting from /root)

Hi @themattinthehatt, it works perfectly now following your suggestion. Thank you.