facebookresearch / vissl

VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.
https://vissl.ai
MIT License

No module named 'vissl.optimizers.lars' #324

Closed FloCF closed 3 years ago

FloCF commented 3 years ago

Congrats and many thanks for this awesome repo!

I am trying to load the Barlow Twins weights, which fails with a No module named 'vissl.optimizers.lars' error.

########################################
## code from tutorial: Feature Extraction.ipynb ###
########################################
# Install: PyTorch (we assume 1.5.1 but VISSL works with all PyTorch versions >=1.4)
!pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html

# install opencv
!pip install opencv-python

# install apex by checking system settings: cuda version, pytorch version, python version
import sys
import torch
version_str="".join([
    f"py3{sys.version_info.minor}_cu",
    torch.version.cuda.replace(".",""),
    f"_pyt{torch.__version__[0:5:2]}"
])
print(version_str)

# install apex (pre-compiled with optimizer C++ extensions and CUDA kernels)
!pip install -f https://dl.fbaipublicfiles.com/vissl/packaging/apexwheels/{version_str}/download.html apex

# install VISSL
!pip install vissl
########################################

# Download model
!wget "https://dl.fbaipublicfiles.com/vissl/model_zoo/barlow_twins/barlow_twins_32gpus_4node_imagenet1k_1000ep_resnet50.torch"

import vissl
import apex
import torch

# Load Barlow Twins weights
barlow_twins = torch.load('barlow_twins_32gpus_4node_imagenet1k_1000ep_resnet50.torch')

Yields the following error:

/usr/local/lib/python3.7/dist-packages/torch/serialization.py in _load(zip_file, map_location, pickle_module, **pickle_load_args)
    850     unpickler.persistent_load = persistent_load
    851     result = unpickler.load()
--> 852 
    853     torch._utils._validate_loaded_sparse_tensors()
    854 

ModuleNotFoundError: No module named 'vissl.optimizers.lars'

QuentinDuval commented 3 years ago

Hi @FloCF,

This is probably because we have not yet released a version of VISSL that contains the Barlow Twins commit (f63a0cee9e3b0c3ed826356210415b6db0be833c), so pip install vissl does not install vissl.optimizers.lars.

However, I am surprised that we would be pickling some actual modules and not just weights.

Could you please try loading the checkpoint on your side with the installation from source (https://github.com/facebookresearch/vissl/blob/master/INSTALL.md#Install-from-source-in-PIP-environment)?

On our side, we need to verify why we are pickling modules in the checkpoints as this is not a good pattern.
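
To illustrate why a pickled function reference is a problem, here is a minimal standard-library sketch. It is not VISSL code: the module name demo_lars and its exclude function are hypothetical stand-ins for code that exists when the checkpoint is saved but not when it is loaded. pickle stores functions by module and name only, so loading needs that module to be importable.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path


def pickle_with_function_reference():
    """Pickle a dict holding a function from a throwaway module, then forget the module."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical module that is only available at "save time".
        Path(tmp, "demo_lars.py").write_text(
            "def exclude(p):\n    return p.ndim == 1\n"
        )
        sys.path.insert(0, tmp)
        try:
            module = importlib.import_module("demo_lars")
            # The function is stored as a (module, name) reference, not as code.
            payload = pickle.dumps({"lr": 0.2, "exclude": module.exclude})
        finally:
            # Make the module unavailable again, as on a machine without it.
            sys.path.remove(tmp)
            sys.modules.pop("demo_lars", None)
    importlib.invalidate_caches()
    return payload


payload = pickle_with_function_reference()
try:
    pickle.loads(payload)
    load_error = None
except ModuleNotFoundError as err:
    load_error = str(err)
print(load_error)
```

The same mechanism applies to any object graph handed to pickle, which is why a checkpoint that holds only tensors and plain Python values loads without the training code installed.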

CC: @prigoyal

prigoyal commented 3 years ago

Hi @FloCF , thank you so much for reaching out about this.

It does seem that this model checkpoint somehow requires vissl.optimizers.lars, which shouldn't be the case. I can repro the issue as well. Assigning to @jingli9111 to help look into this :)

FloCF commented 3 years ago

Hi @QuentinDuval ,

I tried installation from source in colab with the following code:

!pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 -f https://download.pytorch.org/whl/torch_stable.html

!pip install -f https://dl.fbaipublicfiles.com/vissl/packaging/apexwheels/py37_cu101_pyt171/download.html apex

# clone vissl repository
!git clone --recursive https://github.com/facebookresearch/vissl.git

# install vissl dependencies
!pip install --progress-bar off -r vissl/requirements.txt
!pip install opencv-python
# update classy vision install to current master
!pip uninstall -y classy_vision
!pip install classy-vision@https://github.com/facebookresearch/ClassyVision/tarball/master
# install vissl dev mode (e stands for editable)
!cd vissl && pip install -e ".[dev]"

# Download model
!wget "https://dl.fbaipublicfiles.com/vissl/model_zoo/barlow_twins/barlow_twins_32gpus_4node_imagenet1k_1000ep_resnet50.torch"

import vissl
import apex
import torch

# Load Barlow Twins weights
barlow_twins = torch.load('barlow_twins_32gpus_4node_imagenet1k_1000ep_resnet50.torch')

This time I got the following error:

/usr/local/lib/python3.7/dist-packages/torch/serialization.py in _load(zip_file, map_location, pickle_module, pickle_file, **pickle_load_args)
    851     unpickler = pickle_module.Unpickler(data_file, **pickle_load_args)
    852     unpickler.persistent_load = persistent_load
--> 853     result = unpickler.load()
    854 
    855     torch._utils._validate_loaded_sparse_tensors()

ModuleNotFoundError: No module named 'vissl.optimizers'

I guess this is just a minor issue with the way you saved the Barlow Twins checkpoint. Nevertheless, vissl is great and I am amazed by all the incredible work in SSL coming from FAIR!

jingli9111 commented 3 years ago

It looks like the bug arises because the Barlow Twins checkpoint contains the optimizer state, and barlow_twins['classy_state_dict']['optimizer']['optim']['param_groups'] has an entry 'exclude': <function _LARS._exclude_bias_and_norm at 0x7f98dd4b4f80>

This is an optional function that lets LARS exclude biases and BN (norm) parameters. This part of the code simply follows the Barlow Twins reference implementation: https://github.com/facebookresearch/barlowtwins/blob/e6f34a01c0cde6f05da6f431ef8a577b42e94e71/main.py#L228

The solution should be to store this attribute as a boolean instead of passing a function.
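
A rough sketch of that idea, in plain Python and not taken from the actual VISSL change: the param group carries only a boolean, and the rule itself lives in the optimizer code. The names make_param_group, should_skip_adaptation and is_bias_or_norm are hypothetical, as is the "1-D parameter" rule used here.

```python
import pickle


def is_bias_or_norm(shape):
    """Hypothetical rule: treat 1-D parameters (biases, norm scales) as excluded."""
    return len(shape) == 1


def make_param_group(lr, exclude_bias_and_norm):
    # Only plain data goes into the group, so saved state holds no code references.
    return {"lr": lr, "exclude_bias_and_norm": bool(exclude_bias_and_norm)}


def should_skip_adaptation(group, shape):
    # The flag is interpreted at step time instead of storing the callable itself.
    return group["exclude_bias_and_norm"] and is_bias_or_norm(shape)


group = make_param_group(lr=0.2, exclude_bias_and_norm=True)
serialized = pickle.dumps(group)
restored = pickle.loads(serialized)

print(restored == group)
print(should_skip_adaptation(restored, (64,)))
print(should_skip_adaptation(restored, (64, 3, 3, 3)))
```

Because the serialized group contains only a float and a bool, loading it does not depend on any optimizer module being importable.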

iseessel commented 3 years ago

This should be closed by: https://github.com/facebookresearch/vissl/commit/43f230cd05a700426e21b7b79cb018d97198f370
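
If a checkpoint saved before that change still needs to be read, one possible workaround is to register a placeholder under the missing name before loading. The sketch below is an assumption, not something confirmed in this thread, and it has not been tested against the actual checkpoint. install_stub is a hypothetical helper, and the demo uses a made-up package demo_pkg so that it runs with the standard library alone.

```python
import importlib
import pickle
import sys
import types


def install_stub(module_name, class_name, func_name):
    """Register placeholders so a pickled reference to
    module_name.class_name.func_name can be resolved at load time."""
    parts = module_name.split(".")
    parent = None
    for i in range(1, len(parts) + 1):
        name = ".".join(parts[:i])
        try:
            module = importlib.import_module(name)
        except ModuleNotFoundError:
            # Missing package or module: put an empty placeholder in its place.
            module = types.ModuleType(name)
            sys.modules[name] = module
        if parent is not None:
            setattr(parent, parts[i - 1], module)
        parent = module

    if not hasattr(parent, class_name):
        def placeholder(*args, **kwargs):
            raise RuntimeError("placeholder for a function missing at load time")

        # pickle looks functions up by __module__ and __qualname__.
        placeholder.__module__ = module_name
        placeholder.__name__ = func_name
        placeholder.__qualname__ = f"{class_name}.{func_name}"
        holder = type(
            class_name,
            (),
            {func_name: staticmethod(placeholder), "__module__": module_name},
        )
        setattr(parent, class_name, holder)
    return getattr(getattr(parent, class_name), func_name)


MODULE = "demo_pkg.optimizers.lars"  # hypothetical package for the demo
fn = install_stub(MODULE, "_LARS", "_exclude_bias_and_norm")
payload = pickle.dumps({"exclude": fn})

# Simulate an environment where the module does not exist.
for name in ("demo_pkg", "demo_pkg.optimizers", MODULE):
    sys.modules.pop(name, None)
try:
    pickle.loads(payload)
    failed_without_stub = False
except ModuleNotFoundError:
    failed_without_stub = True

# With the stub registered again, the same bytes load.
install_stub(MODULE, "_LARS", "_exclude_bias_and_norm")
restored = pickle.loads(payload)
print(failed_without_stub, restored["exclude"].__qualname__)
```

For the checkpoint discussed here, the corresponding call would presumably be install_stub("vissl.optimizers.lars", "_LARS", "_exclude_bias_and_norm") before torch.load, using the names from the error message and the function repr quoted above. The restored optimizer state would then hold a placeholder, which is acceptable when only the model weights are needed.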