albumentations-team / autoalbument

AutoML for image augmentation. AutoAlbument uses the Faster AutoAugment algorithm to find optimal augmentation policies. Documentation - https://albumentations.ai/docs/autoalbument/
MIT License
203 stars 20 forks

Error calling Module: dataset.SearchDataset #44

Open TEnsorTHiru opened 2 years ago

TEnsorTHiru commented 2 years ago

I get the following error when running autoalbument-search: Error calling 'dataset.SearchDataset' : Error loading module 'dataset.SearchDataset'. The full stack trace, the search.yaml config, and the dataset.py script are included below.

Stack Trace

Traceback (most recent call last):
  File "/opt/conda/envs/auto/lib/python3.8/site-packages/hydra/_internal/utils.py", line 529, in _locate
    module = import_module(mod)
  File "/opt/conda/envs/auto/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1011, in _gcd_import
  File "<frozen importlib._bootstrap>", line 950, in _sanity_check
ValueError: Empty module name

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/envs/auto/lib/python3.8/site-packages/hydra/utils.py", line 61, in call
    type_or_callable = _locate(cls)
  File "/opt/conda/envs/auto/lib/python3.8/site-packages/hydra/_internal/utils.py", line 532, in _locate
    raise ImportError(f"Error loading module '{path}'") from e
ImportError: Error loading module 'dataset.SearchDataset'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/envs/auto/bin/autoalbument-search", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/envs/auto/lib/python3.8/site-packages/hydra/main.py", line 32, in decorated_main
    _run_hydra(
  File "/opt/conda/envs/auto/lib/python3.8/site-packages/hydra/_internal/utils.py", line 346, in _run_hydra
    run_and_report(
  File "/opt/conda/envs/auto/lib/python3.8/site-packages/hydra/_internal/utils.py", line 201, in run_and_report
    raise ex
  File "/opt/conda/envs/auto/lib/python3.8/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/opt/conda/envs/auto/lib/python3.8/site-packages/hydra/_internal/utils.py", line 347, in <lambda>
    lambda: hydra.run(
  File "/opt/conda/envs/auto/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 107, in run
    return run_job(
  File "/opt/conda/envs/auto/lib/python3.8/site-packages/hydra/core/utils.py", line 127, in run_job
    ret.return_value = task_function(task_cfg)
  File "/opt/conda/envs/auto/lib/python3.8/site-packages/autoalbument/cli/search.py", line 55, in main
    searcher.search()
  File "/opt/conda/envs/auto/lib/python3.8/site-packages/autoalbument/faster_autoaugment/search.py", line 65, in search
    self.trainer.fit(self.model, datamodule=self.datamodule)
  File "/opt/conda/envs/auto/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 488, in fit
    self.data_connector.prepare_data(model)
  File "/opt/conda/envs/auto/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py", line 59, in prepare_data
    self.trainer.datamodule.prepare_data()
  File "/opt/conda/envs/auto/lib/python3.8/site-packages/pytorch_lightning/core/datamodule.py", line 92, in wrapped_fn
    return fn(*args, **kwargs)
  File "/opt/conda/envs/auto/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py", line 40, in wrapped_fn
    return fn(*args, **kwargs)
  File "/opt/conda/envs/auto/lib/python3.8/site-packages/autoalbument/faster_autoaugment/datamodule.py", line 23, in prepare_data
    self._instantiate_dataset()
  File "/opt/conda/envs/auto/lib/python3.8/site-packages/autoalbument/faster_autoaugment/datamodule.py", line 73, in _instantiate_dataset
    dataset = instantiate(data_cfg.dataset, transform=transform)
  File "/opt/conda/envs/auto/lib/python3.8/site-packages/hydra/utils.py", line 70, in call
    raise HydraException(f"Error calling '{cls}' : {e}") from e
hydra.errors.HydraException: Error calling 'dataset.SearchDataset' : Error loading module 'dataset.SearchDataset'
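
A note on reading this traceback: the `ValueError: Empty module name` at the start of the chain is probably not the real failure. The `_locate` frames (lines 529 and 532) are consistent with a resolver that tries to import progressively shorter prefixes of the dotted path and only raises on the last attempt, the empty prefix, so whatever went wrong while importing `dataset` itself is discarded. The sketch below is a simplified imitation of that pattern, not Hydra's actual source, and the name `locate_sketch` is hypothetical. It shows how a missing module, a missing dependency, or a syntax error in `dataset.py` would all end up reported the same way.

```python
from importlib import import_module


def locate_sketch(path):
    """Resolve a dotted path such as 'dataset.SearchDataset' (simplified imitation)."""
    parts = [part for part in path.split(".") if part]
    module = None
    # For 'dataset.SearchDataset' this tries 'dataset' first, then the empty prefix ''.
    for n in reversed(range(len(parts))):
        try:
            module = import_module(".".join(parts[:n]))
        except Exception as exc:
            if n == 0:
                # Only the final failure is reported; earlier exceptions are lost.
                raise ImportError(f"Error loading module '{path}'") from exc
            continue
        break
    # Walk the remaining attribute names on the imported module.
    obj = module
    for part in parts[n:]:
        obj = getattr(obj, part)
    return obj


if __name__ == "__main__":
    try:
        locate_sketch("module_that_does_not_exist.SearchDataset")
    except ImportError as err:
        # The attached cause is the empty-name failure, not the original import error.
        print(type(err.__cause__).__name__, "-", err.__cause__)
```

If this reading is right, the useful question is why `import dataset` fails in the search environment, not why the module name is empty.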

Search.yaml

# @package _global_

_version: 2  # An internal value that indicates a version of the config schema. This value is used by
# `autoalbument-search` and `autoalbument-migrate` to upgrade the config to the latest version if necessary.
# Please do not change it manually.

seed: 42 # Random seed. If the value is not null, it will be passed to `seed_everything` -
# https://pytorch-lightning.readthedocs.io/en/stable/api/pytorch_lightning.utilities.seed.html?highlight=seed_everything

task: classification # Deep learning task. Should either be `classification` or `semantic_segmentation`.

policy_model:
# Configuration for Policy model which is used to augment input images.

  task_factor: 0.1
# Multiplier for classification loss of a model. Faster AutoAugment uses classification loss to prevent augmentations
# from transforming images of a particular class to another class. The authors of Faster AutoAugment use 0.1 as
# default value.

  gp_factor: 10
# Multiplier for the gradient penalty for WGAN-GP training. 10 is the default value that was proposed in
# `Improved Training of Wasserstein GANs`.

  temperature: 0.05
# Temperature for Relaxed Bernoulli distribution. The probability of applying a certain augmentation is sampled from
# Relaxed Bernoulli distribution (because Bernoulli distribution is not differentiable). With lower values of
# `temperature` Relaxed Bernoulli distribution behaves like Bernoulli distribution. In the paper, the authors
# of Faster AutoAugment used 0.05 as a default value for `temperature`.

  num_sub_policies: 40
# Number of augmentation sub-policies. When an image passes through an augmentation pipeline, Faster AutoAugment
# randomly chooses one sub-policy and uses augmentations from that sub-policy to transform an input image. A larger
# number of sub-policies leads to a more diverse set of augmentations and better performance of a model trained on
# augmented images. However, an increase in the number of sub-policies leads to the exponential growth of a search
# space of augmentations, so you need more training data for Policy Model to find good augmentation policies.

  num_chunks: 4
# Number of chunks in a batch. Faster AutoAugment splits each batch of images into `num_chunks` chunks. Then it
# applies the same sub-policy with the same parameters to each image in a chunk. This parameter controls the tradeoff
# between the speed of augmentation search and diversity of augmentations. Larger `num_chunks` values will lead to
# faster searching but less diverse set of augmentations. Note that this parameter is used only in the searching
# phase. When you train a model with found sub-policies, Albumentations will apply a distinct set of transformations
# to each image separately.

  operation_count: 4
# Number of consecutive augmentations in each sub-policy. Faster AutoAugment will sequentially apply `operation_count`
# augmentations from a sub-policy to an image. Larger values of `operation_count` lead to better performance of
# a model trained on augmented images. Simultaneously, larger values of `operation_count` affect the speed of search
# and increase the searching time.

#   operations:
#   - _target_: autoalbument.faster_autoaugment.models.policy_operations.ShiftRGB
#     shift_r: true
#   - _target_: autoalbument.faster_autoaugment.models.policy_operations.ShiftRGB
#     shift_g: true
#   - _target_: autoalbument.faster_autoaugment.models.policy_operations.ShiftRGB
#     shift_b: true
#   - _target_: autoalbument.faster_autoaugment.models.policy_operations.RandomBrightness
#   - _target_: autoalbument.faster_autoaugment.models.policy_operations.RandomContrast
#   - _target_: autoalbument.faster_autoaugment.models.policy_operations.Solarize
#   - _target_: autoalbument.faster_autoaugment.models.policy_operations.HorizontalFlip
#   - _target_: autoalbument.faster_autoaugment.models.policy_operations.VerticalFlip
#   - _target_: autoalbument.faster_autoaugment.models.policy_operations.Rotate
#   - _target_: autoalbument.faster_autoaugment.models.policy_operations.ShiftX
#   - _target_: autoalbument.faster_autoaugment.models.policy_operations.ShiftY
#   - _target_: autoalbument.faster_autoaugment.models.policy_operations.Scale
#   - _target_: autoalbument.faster_autoaugment.models.policy_operations.CutoutFixedNumberOfHoles
#   - _target_: autoalbument.faster_autoaugment.models.policy_operations.CutoutFixedSize
#   # A list of augmentation operations that will be applied to input data.

classification_model:
# Settings for Classification Model that is used for two purposes:
# 1. As a model that performs classification of input images.
# 2. As a Discriminator for Policy Model.

  _target_: autoalbument.faster_autoaugment.models.ClassificationModel
# Python class for instantiating Classification Model. You can read more about overriding this value
# to use a custom model at https://albumentations.ai/docs/autoalbument/custom_model/

  num_classes: 27
# Number of classes in the dataset. The dataset implementation should return an integer in the range
# [0, num_classes - 1] as a class label of an image.

  architecture: tf_efficientnetv2_s
# Architecture of Classification Model. The default implementation of Classification model in AutoAlbument uses
# models from https://github.com/rwightman/pytorch-image-models/. Please refer to its documentation to get a list of
# available models - https://rwightman.github.io/pytorch-image-models/#list-models-with-pretrained-weights.

  pretrained: true
# Boolean flag that indicates whether the selected model architecture should load pretrained weights or use randomly
# initialized weights.

data:
  #dataset:
  #  _target_: dataset.SearchDataset
  dataset_file: /home/jupyter/thiru/auto/dataset.py
  # Class for instantiating a PyTorch dataset.

  input_dtype: uint8
# The data type of input images. Two values are supported:
# - uint8. In that case, all input images should be NumPy arrays with the np.uint8 data type and values in the range
#   [0, 255].
# - float32. In that case, all input images should be NumPy arrays with the np.float32 data type and values in the
#   range [0.0, 1.0].

  preprocessing: null
# A list of preprocessing augmentations that will be applied to each image before applying augmentations from
# a policy. A preprocessing augmentation should be defined as `key`: `value`, where `key` is the name of augmentation
# from Albumentations, and `value` is a dictionary with augmentation parameters. The found policy will also apply
# those preprocessing augmentations before applying the main augmentations.
#
# Here is an example of an augmentation pipeline that first pads an image to the size 512x512 pixels, then resizes
# the resulting image to the size 256x256 pixels and finally crops a random patch with the size 224x224 pixels.
#
#  preprocessing:
#    - PadIfNeeded:
#        min_height: 512
#        min_width: 512
#    - Resize:
#        height: 256
#        width: 256
#    - RandomCrop:
#        height: 224
#        width: 224
#

  normalization:
    mean: [0.485, 0.456, 0.406]
    std: [0.229, 0.224, 0.225]
# Normalization values for images. For each image, the search pipeline will subtract `mean` and divide by `std`.
# Normalization is applied after transforms defined in `preprocessing`. Note that regardless of `input_dtype`,
# the normalization function will always receive a `float32` input with values in the range [0.0, 1.0], so you should
# define `mean` and `std` values accordingly. ImageNet normalization is used by default.

  dataloader:
    _target_: torch.utils.data.DataLoader
    batch_size: 16
    shuffle: true
    num_workers: 8
    pin_memory: true
    drop_last: true
# Parameters for the PyTorch DataLoader. Please refer to the PyTorch documentation for the description of parameters -
# https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader.

searcher:
  _target_: autoalbument.faster_autoaugment.search.FasterAutoAugmentSearcher
# Class for Searcher that is used to discover augmentation policies. You can create your own Searcher to alter
# the behavior of AutoAlbument.

trainer:
  _target_: pytorch_lightning.Trainer
# Configuration for PyTorch Lightning Trainer. You can read more about Trainer and its arguments at
# https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html.

  gpus: 1
# Number of GPUs to train on. Set to `0` or `None` to use CPU for training.
# More detailed description - https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html#gpus

  benchmark: true
# If true, enables cudnn.benchmark.
# More detailed description - https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html#benchmark

  max_epochs: 20
# Number of epochs to search for augmentation parameters.
# More detailed description - https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html#max-epochs

  resume_from_checkpoint: null
# Path to a checkpoint to resume training from it. More detailed description -
# https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html#resume-from-checkpoint

optim:
  main:
  # Optimizer configuration for the main (either Classification or Semantic Segmentation) Model
    _target_: torch.optim.Adam
    lr: 1e-3
    betas: [0, 0.999]

  policy:
  # Optimizer configuration for Policy Model
    _target_: torch.optim.Adam
    lr: 1e-3
    betas: [0, 0.999]

callbacks:
# A list of PyTorch Lightning callbacks. Documentation on callbacks is available at
# https://pytorch-lightning.readthedocs.io/en/stable/extensions/callbacks.html

- _target_: autoalbument.callbacks.MonitorAverageParameterChange
# Prints the "Average Parameter Change" metric at the end of each epoch.
# Read more about this metric at https://albumentations.ai/docs/autoalbument/metrics/#average-parameter-change

- _target_: autoalbument.callbacks.SavePolicy
# Saves augmentation policies at the end of each epoch. You can load saved policies with Albumentations to create
# an augmentation pipeline.

- _target_: pytorch_lightning.callbacks.ModelCheckpoint
  save_last: true
  dirpath: checkpoints
# Saves a checkpoint at the end of each epoch. The checkpoint will contain all the necessary data to resume training.
# More information about this checkpoint -
# https://pytorch-lightning.readthedocs.io/en/latest/extensions/generated/pytorch_lightning.callbacks.ModelCheckpoint.html

logger:
# Configuration for a PyTorch Lightning logger.
# You can read more about loggers at https://pytorch-lightning.readthedocs.io/en/stable/extensions/logging.html
# By default, TensorBoardLogger is used.

  _target_: pytorch_lightning.loggers.TensorBoardLogger
  save_dir: ${config_dir:}/outputs/${now:%Y-%m-%d}/${now:%H-%M-%S}/tensorboard_logs

hydra:
  run:
    dir: ${config_dir:}/outputs/${now:%Y-%m-%d}/${now:%H-%M-%S}
  # Path to the directory that will contain all outputs produced by the search algorithm. `${config_dir:}` contains
  # path to the directory with the `search.yaml` config file. Please refer to the Hydra documentation for more
  # information - https://hydra.cc/docs/configure_hydra/workdir.

Dataset.py

import torch.utils.data
import pandas as pd
import cv2

class SearchDataset(torch.utils.data.Dataset):

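    def __init__(self, transform=None):
        # NOTE: "folds.csv" is a relative path. Hydra may change the working directory to
        # `hydra.run.dir` before this runs, so an absolute path is likely safer here.
        self.csv = pd.read_csv("folds.csv")
        self.transform = transform
        # Implement additional initialization logic if needed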

    def __len__(self):
        # The dataset size is the number of rows in the CSV file.
        return len(self.csv)

    def __getitem__(self, index):
        # Implement logic to get an image and its label using the received index.
        #
        # `image` should be a NumPy array with the shape [height, width, num_channels].
        # If an image contains three color channels, it should use an RGB color scheme.
        #
        # `label` should be an integer in the range [0, model.num_classes - 1] where `model.num_classes`
        # is a value set in the `search.yaml` file.

        image = cv2.imread(self.csv.loc[index, "paths"])
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        label = self.csv.loc[index, "cind"]

        if self.transform is not None:
            transformed = self.transform(image=image)
            image = transformed["image"]

        return image, label
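
The comments in `__getitem__` describe a contract: the image is an array shaped [height, width, num_channels] and the label is an integer in [0, num_classes - 1]. A small check like the one below could be run on a few samples outside AutoAlbument before starting a search. `check_sample` is a hypothetical helper, not part of AutoAlbument, and it assumes only that the image exposes an `ndim` attribute, as NumPy arrays do.

```python
import numbers


def check_sample(image, label, num_classes):
    """Raise ValueError if an (image, label) pair breaks the documented contract."""
    # The image should be an array shaped [height, width, num_channels].
    if getattr(image, "ndim", None) != 3:
        raise ValueError("image must be a 3-D array [height, width, num_channels]")
    # The label should be an integer (bool is rejected explicitly).
    if isinstance(label, bool) or not isinstance(label, numbers.Integral):
        raise ValueError(f"label must be an integer, got {type(label).__name__}")
    # The label should fall in [0, num_classes - 1].
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside [0, {num_classes - 1}]")
    return True
```

With the config above this would be called as `check_sample(*SearchDataset()[0], num_classes=27)`, where 27 is the `num_classes` value from search.yaml.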

danielmatwicki commented 1 year ago

Hi, did you figure out how to fix the error? I'm facing the same problem.

saigontrade88 commented 9 months ago

There might be some syntax errors in your dataset file. To see the complete stack trace, set the HYDRA_FULL_ERROR environment variable to 1 with the command 'export HYDRA_FULL_ERROR=1' before running the search.
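
Another way to surface a hidden syntax or import error is to load the dataset file directly in the same environment, bypassing Hydra entirely. The following is a minimal sketch using only the standard library; `import_dataset_file` is a hypothetical helper name, and the path passed to it is whatever your `dataset.py` location is.

```python
import importlib.util
from pathlib import Path


def import_dataset_file(dataset_path):
    """Import a dataset file directly so that any import-time error is shown in full."""
    path = Path(dataset_path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot build an import spec for {path}")
    module = importlib.util.module_from_spec(spec)
    # A SyntaxError or a missing dependency inside the file is raised here, unmasked.
    spec.loader.exec_module(module)
    return module
```

For the config in this issue, that would be `import_dataset_file("/home/jupyter/thiru/auto/dataset.py")`, followed by a check that the returned module has a `SearchDataset` attribute. If the import succeeds here but still fails under `autoalbument-search`, the file is likely fine and the problem is more likely that the `dataset` module is not importable from where the search runs.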