Xilinx / Vitis-AI

Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
https://www.xilinx.com/ai
Apache License 2.0

[PointPillars] Missing .prototxt file after compilation #300

Closed rafaelnevoa closed 3 years ago

rafaelnevoa commented 3 years ago

I'm trying to run test_bin_pointpillars on the ZCU102 but it aborts because of a missing .prototxt file.

$ ./test_bin_pointpillars test_model0 test_model1 007463.bin 007463.png

WARNING: Logging before InitGoogleLogging() is written to STDERR
F1221 04:59:20.597560 1742 configurable_dpu_task_imp.cpp:115] cannot find /usr/share/vitis_ai_library/models/test_model0/test_model0.prototxt
Check failure stack trace:
Aborted

The models you provide (pointpillars_kitti_12000_0_pt and pointpillars_kitti_12000_1_pt) each contain 4 files: an md5sum.txt, the .xmodel, a pointpillars_kitti_12000_0_pt_officialcfg.prototxt (which, as far as I can tell, is the same as the .proto and config files used for training), and a pointpillars_kitti_12000_0_pt.prototxt. test_bin_pointpillars runs fine with these models, but compiling the model I trained produces only 3 output files: the .xmodel, a meta.json, and the md5sum.txt.

I presume that for the configuration .prototxt I can just copy a .config file and change the extension, but I can't figure out how to generate the other .prototxt file. I can't find anything useful about this in the Vitis-AI user guide, as the only references to .prototxt files are for Caffe implementations, not for PyTorch.

Thanks in advance

lishixlnx commented 3 years ago

There are 2 .prototxt config files. For the one without "_officialcfg", you can just copy the original pointpillars_kitti_12000_0_pt.prototxt and rename it to "test_model0.prototxt", and change the name and kernel name in it. For the one with "_officialcfg", use your meta.json and rename it to "test_model0_officialcfg.prototxt".
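
The copy-and-rename step for the first file (the one without "_officialcfg") could be scripted roughly as follows. This is only a sketch: the function name clone_model_prototxt is hypothetical, and it assumes the model name and kernel name appear in the reference file as the literal string pointpillars_kitti_12000_0_pt, so that a plain text substitution is enough.

```python
from pathlib import Path


def clone_model_prototxt(src: Path, old_name: str, new_name: str, dst_dir: Path) -> Path:
    """Copy a reference .prototxt to <dst_dir>/<new_name>.prototxt.

    Assumption: the model name and kernel name are stored as the literal
    string `old_name`, so every occurrence is replaced with `new_name`.
    """
    text = src.read_text()
    if old_name not in text:
        raise ValueError(f"{old_name!r} not found in {src}")
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / f"{new_name}.prototxt"
    dst.write_text(text.replace(old_name, new_name))
    return dst
```

For example, clone_model_prototxt(Path("pointpillars_kitti_12000_0_pt.prototxt"), "pointpillars_kitti_12000_0_pt", "test_model0", Path("test_model0")) would write test_model0/test_model0.prototxt. Check the result by hand afterwards, since the substitution touches every occurrence of the old name.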

rafaelnevoa commented 3 years ago

Thank you for your answer, but it's only partially working. With the changes you suggested, running test_bin_pointpillars fails with the following output:

parse error for tensorflow offical config file: /usr/share/vitis_ai_library/models/test_model_0/test_model_0_officialcfg.prototxt

I also tried using the pipeline.config from the training stage as the "_officialcfg.prototxt". It works for the model trained with the same configuration as the one provided by Xilinx, but for models with different configurations I get a "segmentation fault". I tried running one trained instance with only very slight changes and I still get a "segmentation fault". Is this a problem related to using the same prototxt? Inside the original prototxt file there are 3 mean values; aren't these specific to the quantized model's configuration?

lishixlnx commented 3 years ago
  1. Please attach your "_officialcfg.prototxt" file here.
  2. The 3 mean values in the ".prototxt" file are not used here (all config items are in "_officialcfg.prototxt").

rafaelnevoa commented 3 years ago

Here is the "_officialcfg.prototxt" file saved as a .txt.

test_model_0_officialcfg.txt

lishixlnx commented 3 years ago

Compare your cfg file with the original one and you will find the differences:

yours: anchor_generator_range
original: anchor_generator_stride

yours: anchor_ranges
original: strides, offsets

Please carefully change your config file to use the correct cfg item names and values, then try again.
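
A comparison like the one above can be automated with a rough diff of the field names in the two config files. The sketch below assumes both files are in protobuf text format; field_names and diff_fields are hypothetical helpers, and the regex is only an approximation of the format, not a real parser.

```python
import re

# A field or message name is an identifier followed by ':' or '{'.
_FIELD = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*[:{]")
_STRING = re.compile(r'"[^"]*"')


def field_names(config_text: str) -> set:
    """Collect the field and message names used in a text-format config."""
    names = set()
    for line in config_text.splitlines():
        line = _STRING.sub('""', line)   # ignore the contents of quoted values
        line = line.split("#", 1)[0]     # drop trailing comments
        names.update(_FIELD.findall(line))
    return names


def diff_fields(mine: str, reference: str) -> tuple:
    """Return (names only in mine, names only in reference)."""
    a, b = field_names(mine), field_names(reference)
    return a - b, b - a
```

Names that show up on only one side, such as anchor_generator_range versus anchor_generator_stride, are the first places to look.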

rafaelnevoa commented 3 years ago

Those changes are intentional; the compiled model is based on those training configurations. Quantization and compilation had no problems. If I use the default configurations, they will not match the model being used. Using the default configurations with their respective models works fine, but since this custom-trained model was successfully quantized and compiled (evaluation in the docker container had no problems), it should also work on the ZCU102, or am I missing something? Are there any specific changes I can make to run this model? Are there any steps in the quantization or compilation process that are based on the default configuration and won't work otherwise?

lishixlnx commented 3 years ago

The deploy code is tightly bound to the model. Since your model was trained with a different configuration, you need to make the corresponding changes in the deploy code. I think they are mainly in the anchor generator part: ./pointpillars/src/postprocess/anchor.cpp

Please also compare ./pointpillars/include/second/protos/anchors.proto with your own files; it needs to be replaced with yours as well.
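
To illustrate why the two anchor generator types are not interchangeable, here is a one-axis sketch of how each could place anchor centers. It assumes the SECOND-style behaviour as I understand it (stride: index * stride + offset; range: evenly spaced points between the range bounds). The function names are hypothetical and this is not the code in anchor.cpp.

```python
def stride_centers(num_cells: int, stride: float, offset: float) -> list:
    """Centers for a stride-based generator: i * stride + offset."""
    return [i * stride + offset for i in range(num_cells)]


def range_centers(num_cells: int, range_min: float, range_max: float) -> list:
    """Centers for a range-based generator: evenly spaced, both ends included."""
    if num_cells == 1:
        return [range_min]
    step = (range_max - range_min) / (num_cells - 1)
    return [range_min + i * step for i in range(num_cells)]
```

The two only coincide when offset equals range_min and stride equals (range_max - range_min) / (num_cells - 1). Deploy code that implements only the stride form will therefore produce different anchors for a model trained with the range form.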

rafaelnevoa commented 3 years ago

That seems to be the problem. Thank you very much for taking the time to help. I hadn't noticed that the anchor range wasn't implemented in the C++ code, though it is defined in anchors.proto. After adding the code for the anchor range, what would be the steps to run it on the FPGA?

lishixlnx commented 3 years ago

I'm not sure whether anchors.proto is the only file that differs among the ".proto" files. You'd better:

  1. Replace all the .proto files with yours. If there are new files, you need to update the CMakeLists.txt.
  2. Check that the anchors generated by the C++ code are the same as those generated by Python.
  3. Then test your model with the C++ deploy code.
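
Step 2 could be checked with a small script like the one below. It is a sketch that assumes the anchors from both sides have been dumped to text files with one anchor per line as whitespace-separated floats. That dump format, and the names load_anchors and first_mismatch, are assumptions made for illustration.

```python
import math


def load_anchors(path: str) -> list:
    """Read one anchor per line, as whitespace-separated floats."""
    with open(path) as f:
        return [[float(v) for v in line.split()] for line in f if line.strip()]


def first_mismatch(a: list, b: list, abs_tol: float = 1e-4):
    """Return the index of the first differing anchor, or None if all match."""
    for i, (row_a, row_b) in enumerate(zip(a, b)):
        if len(row_a) != len(row_b):
            return i
        if any(not math.isclose(x, y, abs_tol=abs_tol) for x, y in zip(row_a, row_b)):
            return i
    if len(a) != len(b):
        return min(len(a), len(b))  # one side has extra anchors
    return None
```

A result of None from first_mismatch(load_anchors("anchors_cpp.txt"), load_anchors("anchors_py.txt")) means the two dumps agree within the tolerance; the file names here are placeholders.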

rafaelnevoa commented 3 years ago

That solved it. Thank you very much for the help. I just have one last question. When I run the test_performance code on the ZCU102 with the xmodels you provide, the model runs at 18 FPS, but when using a model I trained with the exact same config file and source code the performance is only 10 FPS. My question is, do you perform any additional optimization on the model?

lishixlnx commented 3 years ago

Test both models on the same input data with test_bin_pointpillars, setting the environment variable as below:

env DEEPHI_PROFILING=1 ./test_bin_pointpillars ....(other parameters)

You will see the profiling of each step.
Then check the items below to see which one causes the biggest difference:

pp_pre :
pp_dpu0 :
pp_middle :
pp_dpu1 :
pp_post :
pp_total :
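
Once the profiling output of both runs has been saved, the stage with the biggest slowdown can be picked out with a few lines of Python. This sketch assumes each stage is printed as "name : number" (for example "pp_pre : 1234us"); the exact log format is an assumption, and parse_profile and largest_gap are hypothetical helpers.

```python
import re

STAGES = ("pp_pre", "pp_dpu0", "pp_middle", "pp_dpu1", "pp_post", "pp_total")

# Assumed line shape: "<stage> : <number>", optionally followed by a unit.
_LINE = re.compile(r"\b(pp_\w+)\s*:\s*([0-9]+(?:\.[0-9]+)?)")


def parse_profile(log_text: str) -> dict:
    """Map each known stage name to its reported time."""
    timings = {}
    for name, value in _LINE.findall(log_text):
        if name in STAGES:
            timings[name] = float(value)
    return timings


def largest_gap(reference: dict, custom: dict):
    """Return (stage, extra time) for the stage that slowed down the most."""
    shared = [s for s in STAGES
              if s != "pp_total" and s in reference and s in custom]
    if not shared:
        return None
    worst = max(shared, key=lambda s: custom[s] - reference[s])
    return worst, custom[worst] - reference[worst]
```

pp_total is left out of the comparison because it is the sum being explained rather than a stage of its own.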

rafaelnevoa commented 3 years ago

I was able to fix the issue with this. Thank you for the help.

littlemww commented 3 years ago

There are 2 .prototxt config files. For the one without "_officialcfg", you can just copy the original pointpillars_kitti_12000_0_pt.prototxt and rename it to "test_model0.prototxt", and change the name and kernel name in it. For the one with "_officialcfg", use your meta.json and rename it to "test_model0_officialcfg.prototxt".

How do I use the meta.json and rename it to the ".prototxt" file? The meta.json file is generated at compile time, and I find that it doesn't contain the config file information. So I have the same problem:

cannot parse config file. config_file=/usr/share/vitis_ai_library/models/yolov4_person/yolov4_person.prototxt

How can I generate the correct meta.json files?

Thanks in advance