aau-cns / poet

PoET: Pose Estimation Transformer for Single-View, Multi-Object 6D Pose Estimation

Demo script for inference mode #8

Closed ttsesm closed 8 months ago

ttsesm commented 1 year ago

Dear @tgjantos,

Congratulations on your work, and thank you for releasing the code. I am trying to understand how I could use your work for my project, and from the documentation it is not clear to me how to run PoET in inference mode with the pre-trained models on images from my custom dataset.

For example, I have a set of images like the ones below: [image]. For each of them I would be interested in extracting the camera pose using PoET. Could you please elaborate a bit on how this could be achieved? As far as I understand, the commands you provide here are for training, if I am not wrong.

Thanks.

tgjantos commented 1 year ago

Dear @ttsesm,

thank you for trying out PoET. We have released our inference tools, and we hope this allows you to use PoET in inference mode on your custom dataset. The README now contains a description of how to run PoET in inference mode.

You just have to set the --inference flag and provide with --inference_path a path to a single directory containing all the images. PoET processes all the images and stores the predictions in a single JSON file, which is written to --inference_output. The image ID, defined by the number in the image file name, serves as the identification key in the JSON file.
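
For example (the paths and checkpoint name are placeholders for your own setup):

```
python main.py --inference \
    --resume /path/to/model/checkpoint0049.pth \
    --inference_path /path/to/images/ \
    --inference_output /path/to/output/
```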

This should allow you to process your images with PoET. Let me know if it works!

Best, Thomas

ttsesm commented 1 year ago

Thanks @tgjantos for the swift response.

One clarification question: which of the pre-trained models should I use from here?

From the command in the inference part you use the parameter --resume /path/to/model/checkpoint0049.pth, so I am a bit puzzled whether I should use one of the pre-trained models from above or whether you mean something else.

Also, I am running the code outside of the Docker environment, in case that makes any difference.

tgjantos commented 1 year ago

The provided models are pre-trained on the YCB-V or LM-O dataset. Hence, they only work for the objects contained in these datasets. From your example images I see that you have a novel object for which you wish to estimate the relative 6D Pose. Unfortunately, none of the provided pre-trained models will give you accurate estimates. Therefore, you would have to first train PoET on your own custom dataset.

The command I provided assumes that a model trained on your data already exists and can be loaded for inference.

Using the code outside the Docker environment should work just fine.

Hope this helps! Please do not hesitate to reach out if you have any further questions!

Best, Thomas

ttsesm commented 1 year ago

I see. Well, I guess that if I already have a YOLO model pre-trained on the pieces I am looking for, I could use it in the --resume /path/to/model/checkpoint0049.pth command, right? Or would I still need to train PoET using it as a backbone?

Thanks.

tgjantos commented 1 year ago

If you have a pre-trained YOLO model, you need to train PoET with this model as your backbone. In this case the command is a bit different: --resume loads the whole model (PoET/Transformer + Backbone), whereas --backbone_weights lets you specifically load weights for your backbone. Furthermore, by setting --lr_backbone to 0.0 you freeze the backbone weights so they are not trained, which will drastically speed up your training process. After training PoET you will have a checkpoint of the whole model (Transformer + Backbone), which can then be loaded with --resume.
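
For example, a training invocation along those lines might look like this (the weight path is a placeholder, and the remaining dataset and model flags depend on your setup; see the README for the full list):

```
# Freeze the pre-trained YOLO backbone while training PoET's transformer;
# add your dataset and model flags per the README.
python main.py --backbone_weights /path/to/pretrained_yolo.pth --lr_backbone 0.0
```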

ttsesm commented 1 year ago

Ok, I got it. Thank you for your time.

One more question though, because it is not clear to me: can you please elaborate a bit on how to format the data for a custom dataset? For example, let's say I have a bunch of images like the ones shown in the first message; how should I format the training data for these images so that I can pass them to PoET for training? I've tried to have a look at the YCB-V dataset but got a bit confused, since I haven't used these datasets before. My guess is that it should be something like what you describe in this section of the README file. However, is there any tool I could use to automate the creation of this annotation?

tgjantos commented 1 year ago

The linked section of the README file describes the general structure of the JSON file containing the dataset. You can take a look at this script, which loads the YCB-V dataset and transforms it into the desired structure. It might therefore be a good starting point for you; you just need to adapt it so that your data is loaded correctly.
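
As a rough illustration of the kind of conversion involved, a minimal sketch might look like the following. The field names below are placeholders for illustration only; match them to the structure described in the README and to the linked YCB-V script:

```python
import json
from pathlib import Path

# Hypothetical sketch of a dataset-conversion script; field names are
# illustrative and must be matched to the annotation structure that the
# PoET README and the YCB-V loader script define.
images, annotations = [], []
for img_id, img_path in enumerate(sorted(Path("data/rgb").glob("*.png"))):
    images.append({"id": img_id, "file_name": img_path.name,
                   "width": 640, "height": 480})
    annotations.append({
        "image_id": img_id,
        "category_id": 1,                      # your object's class id
        "bbox": [100.0, 120.0, 200.0, 150.0],  # replace with the real 2D box
        # replace with the real camera-relative 6D pose of the object
        "rotation": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
        "translation": [0.0, 0.0, 0.5],
    })

with open("train.json", "w") as f:
    json.dump({"images": images, "annotations": annotations}, f)
```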

tgjantos commented 1 year ago

Closed due to inactivity.

3bsamad commented 1 year ago

> If you have a pre-trained YOLO model, you need to train PoET with this model as your backbone. [...]

So if I have, say, a YOLOv9 model pre-trained on my specific custom object, do I also have to implement said model in the models/ directory from scratch and adapt the code to it? Or can I just load the weights and set --lr_backbone to zero?

ttsesm commented 1 year ago

Hi @tgjantos,

Sorry for my delayed response, but it has been a busy period. I have been checking the script you pointed me to, and I am preparing one for my data.

One question though: do I need to resize my images to 640x480, or can I use images of any size?

Another question: how does the model handle views/angles that were not introduced during training? For example, if I train the model on views taken as follows: [image], how will the trained model perform on views from the top or, in general, from other angles?

tgjantos commented 1 year ago

Hi @3bsamad,

as you stated, you would need to implement the YOLOv9 model in the /models directory and adapt it to return the intermediate feature maps as well as the object detections. We provide an adapted Scaled-YOLOv4 model that is compatible with PoET and returns the necessary outputs; check out the PoET README to see how to integrate it. Once you have the model implemented, you can load the weights with --backbone_weights and set --lr_backbone to 0.
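
For orientation, the interface PoET expects from the backbone is roughly of the following shape. This is a hypothetical skeleton, not the actual cns_yolo.py code; check the adapted Scaled-YOLOv4 wrapper for the real tensor formats:

```python
import torch
import torch.nn as nn

# Hypothetical skeleton only: PoET needs (a) intermediate feature maps for
# the transformer and (b) object detections to build queries from. Mirror
# the adapted Scaled-YOLOv4 wrapper (cns_yolo.py) for the exact formats.
class YOLOv9Backbone(nn.Module):
    def __init__(self, yolo: nn.Module):
        super().__init__()
        self.yolo = yolo  # your pre-trained YOLOv9 network

    def forward(self, images: torch.Tensor):
        # In a real adaptation these would come from hooks into the YOLO
        # neck and from the detection head; dummy shapes for illustration.
        feature_maps = [torch.zeros(images.size(0), 256, 40, 40)]
        detections = [{"boxes": torch.zeros(0, 4),
                       "labels": torch.zeros(0, dtype=torch.long),
                       "scores": torch.zeros(0)}]
        return feature_maps, detections
```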

If you have any questions do not hesitate to open a new issue or contact me.

Best, Thomas

tgjantos commented 1 year ago

Hi @ttsesm,

you can use any image size you want; it is only limited by your object detector backbone. For example, YOLOv4 requires an image size that is divisible by 32. I have definitely run PoET with larger images, e.g. 1280x960. Please note that larger image sizes will lead to longer inference times. Furthermore, it has to be an image size that you have trained with: PoET learns the camera intrinsics during training. Therefore, it is also important that you use the same camera, i.e. with the same camera intrinsics and distortion coefficients, to generate the training and validation images.
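
As a quick sanity check for such a backbone constraint, you can round each image dimension up to the nearest multiple of 32 (a generic helper, not part of the PoET codebase):

```python
import math

def round_up(dim: int, multiple: int = 32) -> int:
    """Round an image dimension up to the nearest multiple (32 for YOLOv4)."""
    return math.ceil(dim / multiple) * multiple

print(round_up(1280), round_up(960))  # 1280 960 -> already valid
print(round_up(1000))                 # 1024 -> pad or resize to this
```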

To be honest, I have never evaluated the performance of PoET on unseen views. I can imagine that PoET will not be able to infer the 6D pose correctly. I would suggest covering the viewpoint sphere around the object as well as you can during training to get satisfying performance.

Best, Thomas

ttsesm commented 1 year ago

> you can use any image size that you want. It is just limited by your object detector backbone. [...]

I see, thanks for the feedback ;-).

3bsamad commented 1 year ago

> as you stated, you would need to implement the YOLOv9 model in the /models directory and adapt it to return the intermediate feature maps as well as the object detections. [...]

Hi @tgjantos, many thanks for your reply. I am looking into the adapted Scaled-YOLOv4 right now. Should I just copy yolo/backbone_models/yolo.py into /models and adjust backbone.py to load/build YOLO?

tgjantos commented 1 year ago

Hi @3bsamad,

you have to copy the whole repository into the /models directory, so that you end up with, e.g., /models/cns_yolo.py and /models/yolo/. Afterwards you have to adapt backbone.py to correctly load the model; you can take a look at build_cns_yolo in line 138 of cns_yolo.py. Make sure that you also load the correct config file. We provide an example file.
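
The adaptation in backbone.py could then look roughly like this. This is a hypothetical sketch, not the actual code; the real build_cns_yolo signature is in /models/cns_yolo.py, and the argument names here are illustrative:

```python
# Hypothetical sketch of the dispatch in backbone.py; the real
# build_cns_yolo lives in /models/cns_yolo.py (around line 138).
from models.cns_yolo import build_cns_yolo

def build_backbone(args):
    if args.backbone == 'yolo':
        # builds the adapted YOLO model, loading its config file and,
        # optionally, the weights passed via --backbone_weights
        return build_cns_yolo(args)
    raise ValueError(f'Backbone {args.backbone} not supported.')
```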

Hope this helps you! Let me know if you encounter any problems.

Best, Thomas

Laidawang commented 1 year ago

@tgjantos hi, when I try to use YOLOv4, I get the error: "size mismatch for translation_head.0.layers.2.weight: copying a param with shape torch.Size([66, 256]) from checkpoint, the shape in current model is torch.Size([3, 256])." Can you help me identify what is wrong? I think I have used the weights and config file correctly.

tgjantos commented 1 year ago

Dear @Laidawang,

this is a problem of the PoET configs not being set correctly. Please check out the hyperparameter file, which lists the hyperparameters used for this PoET model; you need to set the corresponding runtime arguments. The error you are currently facing can be solved by setting --class_mode to "specific" when executing the code.

Best, Thomas

Laidawang commented 1 year ago

@tgjantos, I appreciate your help.

Here is my config: [image]

I should have set the class_mode correctly. Can you tell me where to specify this hyperparameter file? I don't seem to be loading it correctly.

Never mind, I understand now: I need to pass the parameters from the YAML file as the corresponding launch arguments.

Laidawang commented 1 year ago

Hi @tgjantos, when I try to use a picture I took myself, I get errors:

```
Traceback (most recent call last):
  File "main.py", line 391, in <module>
    inference(args)
  File "/data2/home/srchen/project/github/in_work/poet/inference_tools/inference_engine.py", line 54, in inference
    outputs, n_boxes_per_sample = model(samples, targets)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data2/home/srchen/project/github/in_work/poet/models/pose_estimation_transformer.py", line 296, in forward
    raise NotImplementedError("PoET Bounding Box Mode not implemented!")
NotImplementedError: PoET Bounding Box Mode not implemented!
```

This is my launch command: python main.py --enc_layers 5 --dec_layers 5 --nheads 16 --inference --grayscale --rgb_augmentation --lr_backbone 0.0

I think it has loaded the data correctly: [image]

tgjantos commented 1 year ago

Dear @Laidawang,

the YAML file serves as a quick overview of which parameters were used for this specific model. You need to extend the run command with the corresponding runtime arguments, e.g. python main.py --enc_layers 5 --dec_layers 5 --nheads 16 --inference --grayscale --rgb_augmentation --lr_backbone 0.0 --class_mode specific.

For your second issue: it seems that the bounding box mode is not loaded correctly from the default parameters. Could you try adding the runtime argument --bbox_mode backbone?
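
Putting both arguments together, the full launch command would then be:

```
python main.py --enc_layers 5 --dec_layers 5 --nheads 16 --inference \
    --grayscale --rgb_augmentation --lr_backbone 0.0 \
    --class_mode specific --bbox_mode backbone
```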

Best, Thomas

Laidawang commented 1 year ago

Yes, that works, but I got no result. This is the image I'm using: [image]

Can you provide a picture for inference, so that I can make sure I get the correct result? Thank you for your help!

tgjantos commented 1 year ago

You are trying to use PoET on a custom dataset; therefore you will need to train PoET on your own data before doing inference. The model that we provide online is pre-trained on the YCB-V dataset and thus only works for the objects contained in this dataset. Example images of the YCB-V dataset can be downloaded here: https://bop.felk.cvut.cz/datasets/

Best, Thomas

tgjantos commented 8 months ago

Closed due to inactivity. Feel free to open the issue again.