dannecrot / LearnableOSG

Implementation of our paper Learnable Optimal Sequential Grouping for Video Scene Detection
BSD 3-Clause "New" or "Revised" License

Plenty of errors while trying to make it work #1

Open netcorefan1 opened 1 year ago

netcorefan1 commented 1 year ago

Hello, I would like to see how this project compares with TransNetV2, but I had to give up because there are so many errors that I can no longer fix them. I tried with Python 3.10 (and Anaconda 3.9.2).

Running python osg_vsd_train.py fails with: ImportError: cannot import name 'container_abcs' from 'torch._six'. I fixed this by replacing from torch._six import container_abcs, string_classes, int_classes with

import collections.abc as container_abcs
from torch._six import string_classes

and by replacing elif isinstance(elem, int_classes): with elif isinstance(elem, int):
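The import fix described above could also be written as a fallback that works whether or not torch._six still provides these names. This is a minimal sketch, not code from the repo. Mapping string_classes to str and int_classes to int is an assumption that mirrors the replacement described in this comment.

```python
# Compatibility shim (a sketch): prefer the names from torch._six when the
# installed PyTorch still ships all three, otherwise fall back to the
# standard library equivalents.
try:
    from torch._six import container_abcs, string_classes, int_classes
except ImportError:
    # Raised when torch._six (or any one of the names) is missing,
    # and also when PyTorch itself is not installed.
    import collections.abc as container_abcs

    string_classes = str  # assumption: only str needs to be recognised
    int_classes = int     # assumption: Python 3 has a single int type


def describe(elem):
    """Classify a value the way a collate function typically would."""
    if isinstance(elem, string_classes):
        return "string"
    if isinstance(elem, int_classes):
        return "int"
    if isinstance(elem, container_abcs.Mapping):
        return "mapping"
    if isinstance(elem, container_abcs.Sequence):
        return "sequence"
    return "other"
```

The string check comes before the sequence check because a str is itself a Sequence.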

Then I got: RuntimeError: Attempted to set the storage of a tensor on device "cuda:0" to a storage on different device "cpu". This is no longer allowed; the devices must match. I'm not sure I made the right changes, but I found a workaround by replacing storage = elem.storage()._new_shared(numel) with storage = elem.storage()._new_shared(numel, device=torch.device("cuda"))

Finally, I was able to start the training, but (probably before it completed) I got:

Triggered internally at ..\torch\csrc\utils\tensor_new.cpp:233.)
  return torch.tensor(the_file['x'], dtype=torch.float, device=self.device), torch.tensor(the_file['t'], dtype=torch.float, device=self.device)
Traceback (most recent call last):
  File "C:\Users\user\Downloads\LearnableOSG\osg_vsd_train.py", line 89, in <module>
    CLossTest(num_iters=5)
  File "C:\Users\user\Downloads\LearnableOSG\osg_vsd_train.py", line 75, in CLossTest
    D_temp = OSG_model.module.DIST_FUNC(x_orig.unsqueeze(0))
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1269, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'OSG_C' object has no attribute 'module'. Did you mean: 'modules'?

I have no idea how to fix this. The error appears after around 30 minutes of training, and progress is not saved, so I can't even experiment with the code: I would have to wait that long after every minor change.

Any help would be appreciated. Many thanks
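The AttributeError in the traceback above suggests that osg_vsd_train.py expects OSG_model to be wrapped in something that exposes a .module attribute (torch.nn.DataParallel is the usual case, though that is an assumption here), while the model was actually a bare OSG_C. One possible workaround is to unwrap the model only when a wrapper is present. The sketch below uses stand-in classes so it runs without PyTorch. PlainModel, Wrapper and unwrap_model are hypothetical names, not part of the repo.

```python
def unwrap_model(model):
    """Return model.module if the attribute exists, else the model itself."""
    # getattr with a default swallows the AttributeError that a bare
    # model raises when it has no `.module` attribute.
    return getattr(model, "module", model)


class PlainModel:
    """Stand-in for a bare model such as OSG_C (illustration only)."""

    def DIST_FUNC(self, x):
        return x


class Wrapper:
    """Stand-in for a wrapper that stores the real model in `.module`."""

    def __init__(self, module):
        self.module = module


plain = PlainModel()
wrapped = Wrapper(plain)

# Both calls reach the same underlying object.
same_from_plain = unwrap_model(plain)
same_from_wrapped = unwrap_model(wrapped)
```

Under that assumption, line 75 of the script would become D_temp = unwrap_model(OSG_model).DIST_FUNC(x_orig.unsqueeze(0)). Note that this helper would pick the wrong object if the model defined a submodule that is itself named module.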

dannecrot commented 1 year ago

Hi @netcorefan1 ,

Thanks for taking an interest in our work!

I'll point out first of all that TransNetV2 performs Shot Boundary Detection (grouping together frames which were taken from the same camera at the same time), while this repo is intended for Video Scene Detection (grouping together shots which depict a particular story element or high-level concept). VSD assumes that the video is already divided into shots.

If you're still interested in this repo:

The repo is bare-bones and was intended to be integrated into your own work (importing OSG_C as a model in a pipeline). If you're planning to run it as-is, then adding periodic torch.save calls for the model and optimizer states, adding logging, and setting up train vs. eval data would definitely be useful.
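A periodic checkpoint along the lines suggested above might look like the following. It is a rough sketch under stated assumptions rather than code from the repo: save_checkpoint, should_save and the save_fn parameter are hypothetical, and the model and optimizer only need to provide state_dict(), as PyTorch modules and optimizers do.

```python
import os


def save_checkpoint(path, model, optimizer, step, save_fn=None):
    """Write model and optimizer state to `path` (hypothetical helper)."""
    if save_fn is None:
        # Imported lazily so the helper can be exercised without PyTorch.
        import torch

        save_fn = torch.save

    state = {
        "step": step,
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
    }
    # Write to a temporary file first, then rename it, so an interrupted
    # save does not leave a truncated checkpoint behind.
    tmp_path = path + ".tmp"
    save_fn(state, tmp_path)
    os.replace(tmp_path, path)


def should_save(step, every):
    """True whenever `step` is a multiple of `every` (and `every` > 0)."""
    return every > 0 and step % every == 0
```

Inside the training loop this could be called as if should_save(step, 100): save_checkpoint("osg_checkpoint.pt", OSG_model, optimizer, step), where the file name, the interval and the optimizer variable are placeholders. A later run could then restore the state with torch.load followed by load_state_dict on the model and the optimizer, instead of repeating the training that preceded the crash.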

netcorefan1 commented 1 year ago

Many thanks for your response, and sorry for my delay. Thanks for the detailed explanation. If I have understood correctly, the two could complement each other, which sounds very interesting. I followed your suggestion and installed Anaconda, which provides the same Python version. However, before doing this I performed some system upgrades, including my Nvidia card drivers and the related SDK toolsets, and during the compilation of your project I got a PyTorch error related to my card. I'm on CUDA 12 and the latest CUDA version supported by PyTorch is 11.7. I suppose this is the source of the problem. I will have to compile PyTorch myself, which seems pretty problematic. As soon as I manage to complete this task I will post my results.

I plan to integrate this into my own project by removing the Python parts and loading the model directly in OpenCV. However, I'm afraid I still need a working LearnableOSG running in Python for prototyping and for translating the relevant code. If you have any suggestions, please let me know.