OpenGVLab / UniFormerV2

[ICCV2023] UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
https://arxiv.org/abs/2211.09552
Apache License 2.0

Can I get an example config (.yaml) file to run the local demo? #10

Closed · JeonDF closed 1 year ago

JeonDF commented 1 year ago

I want to run the demo of this project.

Installation and generating the pretrained ViT weights both completed successfully. I then downloaded the model file (ActivityNet: anet_uniformerv2_l14_32x224.pyth) from the model zoo and made a config file based on the ActivityNet anet_uniformerv2_l14_32x224 config file in the model zoo.

I only added or changed the following options; the rest are the same as the config (.yaml) of anet_uniformerv2_l14_32x224. (Of course, I removed the comment markers (#) to run the demo.)

1) ENABLE of TRAIN / TEST => False
2) UNIFORMERV2.PRETRAIN: 'k400+k710_uniformerv2_l14_32x224.pyth' => 'anet_uniformerv2_l14_32x224.pyth'
3) Added DEMO options:

       DEMO:
         ENABLE: True
         THREAD_ENABLE: True
         LABEL_FILE_PATH: "demo/kinetics_classnames.json"
         INPUT_VIDEO: "demo/videos/test.mp4"

4) Set NUM_GPU to 1 (my device has just one GPU)

(A code sketch of the same overrides follows this list.)
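For reference, here is a minimal sketch of those four overrides expressed through the yacs-style config API that PySlowFast-based repos expose. The base config path is a placeholder, and key names (e.g. NUM_GPUS) should be verified against slowfast/config/defaults.py:

    # Minimal sketch of the four config overrides above, assuming the
    # PySlowFast-style yacs config API that UniFormerV2 builds on.
    # The base config path and media paths are placeholders.
    from slowfast.config.defaults import get_cfg

    cfg = get_cfg()
    cfg.merge_from_file("exp/anet/anet_uniformerv2_l14_32x224.yaml")  # placeholder path

    cfg.TRAIN.ENABLE = False                                         # 1) no training
    cfg.TEST.ENABLE = False                                          #    no testing
    cfg.UNIFORMERV2.PRETRAIN = "anet_uniformerv2_l14_32x224.pyth"    # 2) local weights

    cfg.DEMO.ENABLE = True                                           # 3) demo options
    cfg.DEMO.THREAD_ENABLE = True
    cfg.DEMO.LABEL_FILE_PATH = "demo/kinetics_classnames.json"
    cfg.DEMO.INPUT_VIDEO = "demo/videos/test.mp4"

    cfg.NUM_GPUS = 1                                                 # 4) single GPU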

I then executed my run_net.py. However, I got the following error.

===========================================================================
[12/08 15:59:10][INFO] uniformerv2_model.py: 141: Drop path rate: 0.13333334028720856
[12/08 15:59:10][INFO] uniformerv2_model.py: 141: Drop path rate: 0.2666666507720947
[12/08 15:59:11][INFO] uniformerv2_model.py: 141: Drop path rate: 0.4000000059604645
[12/08 15:59:11][INFO] uniformerv2_model.py: 445: load pretrained weights
[12/08 15:59:11][INFO] uniformerv2_model.py: 355: Inflate: conv1.weight, torch.Size([1024, 3, 14, 14]) => torch.Size([1024, 3, 1, 14, 14])
[12/08 15:59:11][INFO] uniformerv2_model.py: 336: Init center: True
[12/08 15:59:12][INFO] uniformerv2.py: 68: load model from __demo/models/anet_uniformerv2_l14_32x224.pyth
[12/08 15:59:15][INFO] predictor.py: 46: Start loading model weights.
[12/08 15:59:15][INFO] checkpoint.py: 520: Unknown way of loading checkpoint. Using with random initialization, only for debugging.
[12/08 15:59:15][INFO] predictor.py: 48: Finish loading model weights

Traceback (most recent call last):
  File "C:\.conda\envs\UniFormerV2\lib\site-packages\tqdm\std.py", line 1195, in __iter__
    for obj in iterable:
  File "D:\UniFormerV2\tools\demo_net_mine.py", line 78, in run_demo
    model.put(task)
  File "D:\UniFormerV2\slowfast\visualization\predictor.py", line 143, in put
    task = self.predictor(task)
  File "D:\UniFormerV2\slowfast\visualization\predictor.py", line 105, in __call__
    preds = self.model(inputs, bboxes)
  File "C:\.conda\envs\UniFormerV2\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: forward() takes 2 positional arguments but 3 were given

===========================================================================

[Error point: UniFormerV2\slowfast\visualization\predictor.py, line 105]

        if self.cfg.DETECTION.ENABLE and not bboxes.shape[0]:
            preds = torch.tensor([])
        else:
            preds = self.model(inputs, bboxes)  # <-- this is the point of the error
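For context, the traceback points at a signature mismatch rather than a weights problem: the predictor always passes bboxes, while UniFormerV2's classification forward() appears to accept only the input clip (2 positional arguments counting self, but 3 were given). Under that assumption, one possible workaround is to pass bboxes only when detection is enabled:

    # Sketch of a possible patch around predictor.py line 105, assuming the
    # non-detection UniFormerV2 model defines forward(self, x) with no bboxes:
    if self.cfg.DETECTION.ENABLE and not bboxes.shape[0]:
        preds = torch.tensor([])
    elif self.cfg.DETECTION.ENABLE:
        preds = self.model(inputs, bboxes)  # detection models expect boxes
    else:
        preds = self.model(inputs)          # classification models take only the clip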

I guessed that the error occurred because I had changed the UNIFORMERV2.PRETRAIN value. So I downloaded the model matching the file name written in the original config file (k400+k710_uniformerv2_l14_32x224.pyth). The downloaded model file was named "k400_k710_uniformerv2_l14_32x224.pyth", and I set UNIFORMERV2.PRETRAIN to the path of "k400_k710_uniformerv2_l14_32x224.pyth".

I then executed my run_net.py again. However, I got the same error (TypeError: forward() takes 2 positional arguments but 3 were given).

How can I run the demo without this error? Can anybody share an example config (.yaml) file for the local demo?

Andy1621 commented 1 year ago

Thanks for your question. I'm sorry, but I have never used the DEMO mode of PySlowFast, so I cannot answer your question.

For a local demo, I suggest using HuggingFace, which provides an easy web UI. You can follow my demo to build it yourself! All you need to do is clone the code and build a Space. If you want to use the ANet model weights, you need to upload them yourself. Recently, I have also been trying to merge the code into HuggingFace, but it will take some time...
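For readers who want to reproduce such a demo locally, a HuggingFace Space of this kind is typically a small Gradio app. The sketch below only illustrates that structure; it is not the author's actual Space code, and the predict body and labels are placeholders:

    # Minimal Gradio sketch of a video-classification web demo.
    # Model loading and preprocessing are placeholders; see the author's
    # HuggingFace Space for the real implementation.
    import gradio as gr

    def predict(video_path: str) -> dict:
        # Load the clip, preprocess it, and run UniFormerV2 here (placeholder).
        # Return a {class_name: probability} mapping for gr.Label.
        return {"example_action": 1.0}

    demo = gr.Interface(
        fn=predict,
        inputs=gr.Video(),
        outputs=gr.Label(num_top_classes=5),
        title="UniFormerV2 demo",
    )

    if __name__ == "__main__":
        demo.launch()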

JeonDF commented 1 year ago

@Andy1621 Thanks for your answer. I referenced the HuggingFace demo project. The demo code covers loading the model, loading the input video, and running prediction. The whole process works, and the prediction result is printed correctly.

I can now test predictions from models with higher mAP.

Thanks again for your answer.