Closed · tinybitdesu closed this issue 5 months ago
Hi, and thank you for your interest in our work!
We do not provide checkpoints because our model is adapted at test time on each video sample, so there is no final trained model. This is true both for the training-free baseline (`tf_<dataset_name>.yaml`) and for the main method (`tf_<dataset_name>.yaml`). When you run it, it will load pre-trained `coca_ViT-L-14` weights from `open_clip`.
Are you following all the steps in the README? Could you please include the command that causes this error?
Hello, following your kind instructions I got past the previous bug. However, I have run into a new problem: there seem to be no predicted labels in my DataFrame. As you can see below, my DataFrame is empty (no rows and no columns), which causes an error downstream. I would like to know how to debug this further. Thank you again for any help! I have listed the specific parameters below, which may be helpful.
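For context, the `KeyError: 'label'` further down in the log is exactly what pandas raises when a column is looked up on a completely empty DataFrame. A minimal, self-contained reproduction (variable names are hypothetical, not taken from the repo):

```python
import pandas as pd

# An empty DataFrame has a default RangeIndex and no columns at all,
# matching the "Empty DataFrame / Columns: [] / Index: []" print in the log.
predicted = pd.DataFrame()
print(predicted.columns.tolist())  # []

try:
    predicted["label"].unique()
except KeyError as e:
    print("KeyError:", e)  # KeyError: 'label'
```

So the crash in `evaluate.py` is a symptom: the prediction DataFrame was never populated during testing.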
[2024-05-15 14:48:45,715][src.utils.utils][INFO] - Enforcing tags!
│ batch_size: 1
│ num_workers: 1
│ pin_memory: false
│ nsplit: 0
│ config: ./config/thumos.yaml
│
├── model
│ └── _target_: src.models.tf_method_module.T3AL0Module
│ split: 0
│ dataset: thumos
│ setting: 50
│ video_path: .../T3AL/thumos_features
│ net:
│ _target_: src.models.components.tf_method.T3AL0Net
│ dataset: thumos
│ split: 0
│ setting: 50
│ kernel_size: 20
│ stride: 20
│ visualize: false
│ normalize: true
│ remove_background: true
│ video_path: .../T3AL/thumos_features
│
├── callbacks
│ └── model_checkpoint:
│ _target_: lightning.pytorch.callbacks.ModelCheckpoint
│ dirpath: .../T3AL/logs/train/runs/2024-05-15_14-48-45/checkpoints
│ filename: epoch{epoch:03d}
│ monitor: null
│ verbose: false
│ save_last: true
│ save_top_k: 1
│ mode: max
│ auto_insert_metric_name: false
│ save_weights_only: false
│ every_n_train_steps: null
│ train_time_interval: null
│ every_n_epochs: null
│ save_on_train_epoch_end: null
│ model_summary:
│ _target_: lightning.pytorch.callbacks.RichModelSummary
│ max_depth: -1
│ rich_progress_bar:
│ _target_: lightning.pytorch.callbacks.RichProgressBar
│
├── logger
│ └── wandb:
│ _target_: lightning.pytorch.loggers.wandb.WandbLogger
│ save_dir: .../T3AL/logs/train/runs/2024-05-15_14-48-45
│ offline: false
│ id: null
│ anonymous: null
│ project: TAD
│ log_model: false
│ prefix: ''
│ group: ''
│ tags: []
│ job_type: ''
│ name: thumos
│
├── trainer
│ └── _target_: lightning.pytorch.trainer.Trainer
│ default_root_dir: .../T3AL/logs/train/runs/2024-05-15_14-48-45
│ min_epochs: 1
│ max_epochs: 0
│ accelerator: cpu
│ devices: 1
│ check_val_every_n_epoch: 1
│ deterministic: false
│
├── paths
│ └── root_dir: .../T3AL
│ data_dir: .../T3AL/data/
│ log_dir: .../T3AL/logs/
│ output_dir: .../T3AL/logs/train/runs/2024-05-15_14-48-45
│ work_dir: .../T3AL
│
├── extras
│ └── ignore_warnings: false
│ enforce_tags: true
│ print_config: true
│
├── task_name
│ └── train
├── tags
│ └── ['dev']
├── train
│ └── False
├── test
│ └── True
├── compile
│ └── False
├── ckpt_path
│ └── ''
├── seed
│ └── 12345
└── exp_name
└── thumos
Seed set to 12345
[2024-05-15 14:48:45,836][__main__][INFO] - Instantiating datamodule
The `srun` command is available on your system but is not used. HINT: If your intention is to run Lightning on SLURM, prepend your python command with `srun`, like so: srun python src/train.py experiment=tf_thumos data=thumos model. ...
Trainer already configured with model summary callbacks: [<class 'lightning.pytorch.callbacks.rich_model_summary.RichModelSummary'>]. Skipping setting a default ModelSummary
callback.
GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
.../miniconda3/lib/python3.8/site-packages/lightning/pytorch/trainer/setup.py:187: GPU available but not used. You can set it by doing `Trainer(accelerator='gpu')`.
[2024-05-15 14:48:56,140][__main__][INFO] - Logging hyperparameters!
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 3
wandb: You chose "Don't visualize my results"
wandb: WARNING `resume` will be ignored since W&B syncing is set to `offline`. Starting a new run with run id dyxora2w.
wandb: Tracking run with wandb version 0.17.0
wandb: W&B syncing is set to `offline` in this directory.
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
[2024-05-15 14:48:59,596][__main__][INFO] - Starting testing!
[2024-05-15 14:48:59,597][__main__][WARNING] - Best ckpt not found! Using current weights for testing...
No of videos in train is 214
Loading train Video Information ...
No of class 10
No of videos in validation is 203
Loading validation Video Information ...
No of class 10
.../miniconda3/lib/python3.8/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument to `num_workers=19` in the `DataLoader` to improve performance.
Start testing...
video-id t-start t-end label
0 video_validation_0000365 18.1 24.3 HighJump
1 video_validation_0000365 29.6 33.3 HighJump
2 video_validation_0000365 69.7 77.3 HighJump
3 video_validation_0000365 80.8 84.3 HighJump
4 video_validation_0000365 110.4 116.2 HighJump
Empty DataFrame
Columns: []
Index: []
Ground truth labels: ['HighJump' 'PoleVault' 'TennisSwing' 'GolfSwing' 'HammerThrow'
'Billiards' 'BaseballPitch' 'CleanAndJerk' 'ThrowDiscus' 'SoccerPenalty']
Testing ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 203/203 0:00:10 • 0:00:00 19.94it/s
[2024-05-15 14:49:10,282][src.utils.utils][ERROR] -
Traceback (most recent call last):
File ".../T3AL/src/utils/utils.py", line 65, in wrap
metric_dict, object_dict = task_func(cfg=cfg)
File "src/train.py", line 103, in train
trainer.test(model=model, datamodule=datamodule, ckpt_path=ckpt_path)
File ".../miniconda3/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 754, in test
return call._call_and_handle_interrupt(
File ".../miniconda3/lib/python3.8/site-packages/lightning/pytorch/trainer/call.py", line 44, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File ".../miniconda3/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 794, in _test_impl
results = self._run(model, ckpt_path=ckpt_path)
File ".../miniconda3/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 987, in _run
results = self._run_stage()
File ".../miniconda3/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 1026, in _run_stage
return self._evaluation_loop.run()
File ".../miniconda3/lib/python3.8/site-packages/lightning/pytorch/loops/utilities.py", line 182, in _decorator
return loop_run(self, *args, **kwargs)
File ".../miniconda3/lib/python3.8/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 142, in run
return self.on_run_end()
File ".../miniconda3/lib/python3.8/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 254, in on_run_end
self._on_evaluation_epoch_end()
File ".../miniconda3/lib/python3.8/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 334, in _on_evaluation_epoch_end
call._call_lightning_module_hook(trainer, hook_name)
File ".../miniconda3/lib/python3.8/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File ".../T3AL/src/models/tf_method_module.py", line 117, in on_test_epoch_end
aps, tious = evaluate(self.dataset, self.predictions, self.split, self.setting, self.video_path)
File ".../T3AL/src/evaluate.py", line 226, in evaluate
print("Predicted labels: ", predicted["label"].unique())
File ".../miniconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 3761, in __getitem__
indexer = self.columns.get_loc(key)
File ".../miniconda3/lib/python3.8/site-packages/pandas/core/indexes/range.py", line 349, in get_loc
raise KeyError(key)
KeyError: 'label'
[2024-05-15 14:49:10,290][src.utils.utils][INFO] - Output dir: .../T3AL/logs/train/runs/2024-05-15_14-48-45
[2024-05-15 14:49:10,290][src.utils.utils][INFO] - Closing wandb!
wandb:
wandb: You can sync this run to the cloud by running:
wandb: wandb sync .../T3AL/logs/train/runs/2024-05-15_14-48-45/wandb/offline-run-20240515_144858-dyxora2w
wandb: Find logs at: ./logs/train/runs/2024-05-15_14-48-45/wandb/offline-run-20240515_144858-dyxora2w/logs
Error executing job with overrides: ['experiment=tf_thumos', 'data=thumos', 'model.video_path=.../T3AL/thumos_features']
Traceback (most recent call last):
File "src/train.py", line 121, in main
metric_dict, _ = train(cfg)
File ".../T3AL/src/utils/utils.py", line 75, in wrap
raise ex
File ".../T3AL/src/utils/utils.py", line 65, in wrap
metric_dict, object_dict = task_func(cfg=cfg)
File "src/train.py", line 103, in train
trainer.test(model=model, datamodule=datamodule, ckpt_path=ckpt_path)
File ".../miniconda3/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 754, in test
return call._call_and_handle_interrupt(
File ".../miniconda3/lib/python3.8/site-packages/lightning/pytorch/trainer/call.py", line 44, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File ".../miniconda3/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 794, in _test_impl
results = self._run(model, ckpt_path=ckpt_path)
File ".../miniconda3/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 987, in _run
results = self._run_stage()
File ".../miniconda3/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 1026, in _run_stage
return self._evaluation_loop.run()
File ".../miniconda3/lib/python3.8/site-packages/lightning/pytorch/loops/utilities.py", line 182, in _decorator
return loop_run(self, *args, **kwargs)
File ".../miniconda3/lib/python3.8/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 142, in run
return self.on_run_end()
File ".../miniconda3/lib/python3.8/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 254, in on_run_end
self._on_evaluation_epoch_end()
File ".../miniconda3/lib/python3.8/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 334, in _on_evaluation_epoch_end
call._call_lightning_module_hook(trainer, hook_name)
File ".../miniconda3/lib/python3.8/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File ".../T3AL/src/models/tf_method_module.py", line 117, in on_test_epoch_end
aps, tious = evaluate(self.dataset, self.predictions, self.split, self.setting, self.video_path)
File ".../T3AL/src/evaluate.py", line 226, in evaluate
print("Predicted labels: ", predicted["label"].unique())
File ".../miniconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 3761, in __getitem__
indexer = self.columns.get_loc(key)
File ".../miniconda3/lib/python3.8/site-packages/pandas/core/indexes/range.py", line 349, in get_loc
raise KeyError(key)
KeyError: 'label'
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
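In case it helps with debugging: the crash happens because `evaluate` indexes `predicted["label"]` on an empty frame. Until the real cause (no predictions being accumulated during testing) is found, a defensive guard would turn the crash into a readable message. This is only a sketch with hypothetical names, not the repo's actual code:

```python
import pandas as pd

def print_predicted_labels(predicted: pd.DataFrame) -> None:
    # Guard against an empty prediction frame (no rows, no columns),
    # which would otherwise raise KeyError: 'label' on column access.
    if "label" not in predicted.columns:
        print("Predicted labels: none (prediction DataFrame is empty)")
        return
    print("Predicted labels:", predicted["label"].unique())

print_predicted_labels(pd.DataFrame())                         # empty case: no crash
print_predicted_labels(pd.DataFrame({"label": ["HighJump"]}))  # normal case
```

With the guard in place, the run would report an empty prediction set instead of aborting, which makes it easier to check whether the test step is producing any predictions at all.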
Hello, I am currently attempting to reproduce your work. However, I ran into a problem when running train.py following your README and got the traceback above. I used the training-free model config tf_thumos.yaml, so I believe I do not need to train it, but where can I find checkpoints to start the training process? Thanks for your help and your diligent work!