I am delighted to see this inspiring project, but I have a few questions.

flashmoment commented 1 month ago

1. Why is there only one weight file in the data cloud drive? Is this weight file specific to a particular object?
2. What does the file "Linemod updated" refer to?

flashmoment commented 1 month ago

OK, I'm glad I got it to start, but the console shows:

Creating occluders from VOC: ../data/VOCdevkit/VOC2012
train: Scanning '..\data\LINEMOD\benchvise\train' for images, labels and masks... 183 found, 0 missing, 0 empty, 0 corrupted, 0 found, 183 missing, 0 empty, 0 corrupted: 100%|██████████| 183/183 [00:00<00:00, 207.14it/s]
train: WARNING: No masks found in ..\data\LINEMOD\benchvise\train.cache.
train: New cache created: ..\data\LINEMOD\benchvise\train.cache
train: Caching images (0.2GB): 100%|██████████| 183/183 [00:00<00:00, 1633.99it/s]
[ WARN:0@5.214] global loadsave.cpp:248 cv::findDecoder imread(''): can't open/read file: check file path/integrity
Scanning images: 0%| | 0/1032 [00:00<?, ?it/s]
[ WARN:0@5.233] global loadsave.cpp:248 cv::findDecoder imread(''): can't open/read file: check file path/integrity
[ WARN:0@5.244] global loadsave.cpp:248 cv::findDecoder imread(''): can't open/read file: check file path/integrity
[ WARN:0@1.305] global loadsave.cpp:248 cv::findDecoder imread(''): can't open/read file: check file path/integrity
[ WARN:0@5.248] global loadsave.cpp:248 cv::findDecoder imread(''): can't open/read file: check file path/integrity
[ WARN:0@1.311] global loadsave.cpp:248 cv::findDecoder imread(''): can't open/read file: check file path/integrity
[ WARN:0@1.318] global loadsave.cpp:248 cv::findDecoder imread(''): can't open/read file: check file path/integrity
[ WARN:0@1.322] global loadsave.cpp:248 cv::findDecoder imread(''): can't open/read file: check file path/integrity
val: Scanning '..\data\LINEMOD\benchvise\test' for images, labels and masks... 1032 found, 0 missing, 0 empty, 0 corrupted, 0 found, 1032 missing, 0 empty, 0 corrupted: 100%|██████████| 1032/1032 [00:09<00:00, 107.42it/s]
val: WARNING: No masks found in ..\data\LINEMOD\benchvise\test.cache.
val: New cache created: ..\data\LINEMOD\benchvise\test.cache
val: Caching images (1.0GB): 100%|██████████| 1032/1032 [00:00<00:00, 1578.87it/s]
autoanchor: Analyzing anchors... anchors/target = 0.00, Best Possible Recall (BPR) = 0.0000. Attempting to improve anchors, please wait...
<utils.datasets.LoadImagesAndLabelsPose object at 0x000001E91FAF6340>
autoanchor: Running kmeans for 9 anchors on 183 points...
autoanchor: thr=0.25: 1.0000 best possible recall, 9.00 anchors past thr
autoanchor: n=9, img_size=640, metric_all=0.731/0.925-mean/best, past_thr=0.731-mean: 112,183, 143,149, 151,180, 176,170, 172,202, 149,256, 214,213, 208,266, 256,252
autoanchor: Evolving anchors with Genetic Algorithm: fitness = 0.9264: 100%|██████████| 1000/1000 [00:00<00:00, 4511.37it/s]
Image sizes 640 train, 640 test
Using 2 dataloader workers
Logging results to runs\train\exp19
Starting training for 5000 epochs...
autoanchor: thr=0.25: 1.0000 best possible recall, 9.00 anchors past thr
autoanchor: n=9, img_size=640, metric_all=0.734/0.926-mean/best, past_thr=0.734-mean: 113,180, 146,152, 151,182, 175,169, 169,197, 151,257, 212,213, 208,261, 255,252
autoanchor: New anchors saved to model. Update model *.yaml to use these anchors in the future.
Epoch gpu_mem l_obj l_box l_cls n_targets imgsize
0%| | 0/92 [00:00<?, ?it/s]
[ WARN:0@26.867] global loadsave.cpp:248 cv::findDecoder imread(''): can't open/read file: check file path/integrity
[ WARN:0@26.874] global loadsave.cpp:248 cv::findDecoder imread(''): can't open/read file: check file path/integrity
0/4999 0G 2.174 0.637 0 2 640: 1%| | 1/92 [00:10<16:14, 10.71s/it]
[ WARN:0@33.638] global loadsave.cpp:248 cv::findDecoder imread(''): can't open/read file: check file path/integrity
[ WARN:0@33.644] global loadsave.cpp:248 cv::findDecoder imread(''): can't open/read file: check file path/integrity
0/4999 0G 2.174 0.5903 0 2 640: 2%|▏ | 2/92 [00:18<13:47, 9.19s/it]
[ WARN:0@45.708] global loadsave.cpp:248 cv::findDecoder imread(''): can't open/read file: check file path/integrity
[ WARN:0@45.716] global loadsave.cpp:248 cv::findDecoder imread(''): can't open/read file: check file path/integrity
0/4999 0G 2.174 0.5717 0 2 640: 3%|▎ | 3/92 [00:26<12:34, 8.47s/it]
[ WARN:0@49.384] global loadsave.cpp:248 cv::findDecoder imread(''): can't open/read file: check file path/integrity
[ WARN:0@49.392] global loadsave.cpp:248 cv::findDecoder imread(''): can't open/read file: check file path/integrity
0/4999 0G 2.174 0.5573 0 2 640: 4%|▍ | 4/92 [00:33<11:37, 7.93s/it]
[ WARN:0@60.411] global loadsave.cpp:248 cv::findDecoder imread(''): can't open/read file: check file path/integrity
[ WARN:0@60.419] global loadsave.cpp:248 cv::findDecoder imread(''): can't open/read file: check file path/integrity
0/4999 0G 2.174 0.4981 0 2 640: 5%|▌ | 5/92 [00:40<11:02, 7.62s/it]
[ WARN:0@63.533] global loadsave.cpp:248 cv::findDecoder imread(''): can't open/read file: check file path/integrity
[ WARN:0@63.542] global loadsave.cpp:248 cv::findDecoder imread(''): can't open/read file: check file path/integrity
0/4999 0G 2.173 0.464 0 2 640: 7%|▋ | 6/92 [00:47<10:35, 7.39s/it]
[ WARN:0@74.415] global loadsave.cpp:248 cv::findDecoder imread(''): can't open/read file: check file path/integrity
[ WARN:0@74.422] global loadsave.cpp:248 cv::findDecoder imread(''): can't open/read file: check file path/integrity
0/4999 0G 2.173 0.4414 0 2 640: 8%|▊ | 7/92 [00:54<10:25, 7.36s/it]
[ WARN:0@77.774] global loadsave.cpp:248 cv::findDecoder imread(''): can't open/read file: check file path/integrity
[ WARN:0@77.786] global loadsave.cpp:248 cv::findDecoder imread(''): can't open/read file: check file path/integrity
0/4999 0G 2.172 0.4247 0 2 640: 9%|▊ | 8/92 [01:01<10:11, 7.28s/it]
[ WARN:0@88.819] global loadsave.cpp:248 cv::findDecoder imread(''): can't open/read file: check file path/integrity
[ WARN:0@88.830] global loadsave.cpp:248 cv::findDecoder imread_(''): can't open/read file: check file path/integrity

Do you know what happened?

cviviers commented 1 month ago

Hi @flashmoment,

Thanks for the compliments. I will try to make the repo easier to use and maybe add some training tips later on. As for your questions:

Yes, the first Linemod benchmark uses one model per object. However, the current code, with a few minor modifications, supports multi-object pose estimation. I will add more on this later.
As described in the paper, we use the camera parameters at validation/test time and do not assume they are fixed (you might zoom or change something else about your camera). The easiest way to make Linemod compatible with our X-ray experiments was to add the camera parameters to the label files, so that all experiments can run with one codebase. As such, "Linemod updated" is the same as Linemod, with the (static) camera parameters appended to the labels.
Your issue has something to do with the filenames. Here are a few suggestions.
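
The repeated `imread('')` warnings in the log above suggest that OpenCV is being handed an empty path. A minimal sketch of a sanity check for this, assuming the image paths used by the dataset loader can be collected as a list of strings. The helper name `find_bad_image_paths` and the way the paths are gathered are hypothetical and not part of the repo:

```python
from pathlib import Path


def find_bad_image_paths(paths, base_dir="."):
    """Return the entries that are empty or do not point to an existing file."""
    base = Path(base_dir)
    bad = []
    for raw in paths:
        entry = raw.strip()
        # An empty entry is the kind of value that would reach imread as ''.
        if not entry or not (base / entry).is_file():
            bad.append(raw)
    return bad


if __name__ == "__main__":
    # Hypothetical input; in practice, pass the paths your own dataset files list.
    for bad_path in find_bad_image_paths(["", "does/not/exist.jpg"]):
        print(f"unreadable entry: {bad_path!r}")
```

Running such a check on the train and test splits before training would show whether any entries are empty or misnamed.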

Hope this helps.

flashmoment commented 1 month ago

Thank you, I can now run the program smoothly. I would also like to know some experimental details, such as how many epochs and what batch size you used for each object to obtain the best.pt file.

cviviers commented 1 month ago

For consistency and reproducibility, the released weights were all produced with the same hyperparameter config file rather than object-specific configs. They vary slightly from the results in the paper (mostly better).

I trained using: train.py --batch 58 --epochs 7000 --cfg yolov5xv6_pose_bifpn.yaml --hyp hyp.single.yaml --weights yolov5x.pt --data object.yaml --rect --cache --optimizer Adam

flashmoment commented 1 month ago

Thank you very much. I'm currently training a model, but I still have some questions. If I want to resume training from where it left off last time, should I use the 'resume' parameter? If so, should I pass the path of 'last.pt' to it? If I continue the previous training, do I still need to specify the epochs or the cfg file on the command line? Could you provide an example of the command I should enter in the terminal to resume training from the last checkpoint?

cviviers commented 1 month ago

Hi @flashmoment,

To resume training, simply add the --resume flag at the end of the training command. It will find the latest training run, load the config from that run and continue training from wherever the last checkpoint was saved. You can also pass a specific checkpoint from an earlier run with --resume "path/to/checkpoint.pt" and it will continue from there. You do not need to pass any other parameters when resuming; they are loaded from the save_dir of that training run.
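
A minimal sketch of what "find the latest training run" could look like, assuming checkpoints are named `last*.pt` somewhere under a `runs` directory (as the `runs\train\exp19` log path and the `last.pt` file mentioned earlier suggest). The function name and the search logic are illustrative assumptions; the repository's actual lookup may differ:

```python
from pathlib import Path


def latest_checkpoint(search_dir="runs"):
    """Return the most recently modified last*.pt under search_dir, or None."""
    candidates = list(Path(search_dir).rglob("last*.pt"))
    if not candidates:
        return None
    # Pick the checkpoint whose file was modified most recently.
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(latest_checkpoint("runs"))
```

Passing an explicit path to --resume sidesteps this search, which is useful when the run you want to continue is not the newest one.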

I saw there was a minor issue with the save_dir when resuming on Windows; I just fixed it.

flashmoment commented 1 month ago

I ran python train.py --resume, but it reports:

  File "/poseproject/yolov56d/YOLOv5-6D-Pose/train.py", line 506, in
    opt = argparse.Namespace(**yaml.load(f, Loader=yaml.SafeLoader)) # replace
  File "/poseproject/yolov56denv/lib/python3.9/site-packages/yaml/__init__.py", line 81, in load
    return loader.get_single_data()
  File "/poseproject/yolov56denv/lib/python3.9/site-packages/yaml/constructor.py", line 51, in get_single_data
    return self.construct_document(node)
  File "/poseproject/yolov56denv/lib/python3.9/site-packages/yaml/constructor.py", line 60, in construct_document
    for dummy in generator:
  File "/poseproject/yolov56denv/lib/python3.9/site-packages/yaml/constructor.py", line 413, in construct_yaml_map
    value = self.construct_mapping(node)
  File "/poseproject/yolov56denv/lib/python3.9/site-packages/yaml/constructor.py", line 218, in construct_mapping
    return super().construct_mapping(node, deep=deep)
  File "/poseproject/yolov56denv/lib/python3.9/site-packages/yaml/constructor.py", line 143, in construct_mapping
    value = self.construct_object(value_node, deep=deep)
  File "/poseproject/yolov56denv/lib/python3.9/site-packages/yaml/constructor.py", line 100, in construct_object
    data = constructor(self, node)
  File "/poseproject/yolov56denv/lib/python3.9/site-packages/yaml/constructor.py", line 427, in construct_undefined
    raise ConstructorError(None, None,
yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object/apply:pathlib.PosixPath'
  in "runs/train/exp9/opt.yaml", line 37, column 11

After looking into the opt.yaml file, I found that line 37 reads "save_dir: !!python/object/apply:pathlib.PosixPath". What should I do next?

cviviers commented 1 month ago

I just pushed an update on this. Just change the output here from a Path to a string. The save_dir was not being saved correctly. You can also manually fix the save_dir path in the opt.yaml file of your last run, then it will work.
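
The traceback above shows the safe YAML loader refusing to rebuild a `pathlib.PosixPath` that was written into opt.yaml. A sketch of the "Path to string" idea, assuming the options live in an `argparse.Namespace` as the traceback suggests. `yaml_safe_options` is a hypothetical helper for illustration and not the actual commit:

```python
from argparse import Namespace
from pathlib import PurePath, PurePosixPath


def yaml_safe_options(opt):
    """Return the options as a dict with every path object converted to str."""
    return {
        key: str(value) if isinstance(value, PurePath) else value
        for key, value in vars(opt).items()
    }


if __name__ == "__main__":
    # Example values only; save_dir mirrors the run directory from the traceback.
    opt = Namespace(epochs=7000, save_dir=PurePosixPath("runs/train/exp9"))
    print(yaml_safe_options(opt))
```

Dumping the converted dict instead of the raw options would keep `save_dir` as a plain string, which a safe loader can read back without a Python-specific tag.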

flashmoment commented 1 month ago

Thank you very much for your patient explanation. I can now smoothly resume training from any checkpoint.

cviviers commented 1 week ago

Windows key?

cviviers / YOLOv5-6D-Pose

I am delighted to see this inspiring project, but I have a few questions. #7