manan-cashfree commented 1 year ago

🐛 Bug

Description

Search path (import) issues when instantiating a class with hydra.utils.instantiate.

To reproduce

Minimal code/config snippet to reproduce. Directory structure:

├── src
│   ├── data
│   │   ├── components
│   │   │   ├── transforms.py
│   │   │   └── __init__.py
│   │   ├── __init__.py
│   │   ├── documents_datamodule.py
├── configs
│   ├── __init__.py
│   ├── paths
│   │   └── default.yaml
│   ├── data
│   │   ├── document.yaml

# The __init__.py file of components imports both train and val transforms

...
from components import train_transforms, val_transforms

class DocumentsDataModule(LightningDataModule):
    def __init__(
            self,
            data_dir: str = "data/",
            train_val_test_split_ratio: tuple = (0.8, 0.2),
            batch_size: int = 8,
            sampler: str = "random",
            num_workers: int = 0,
            pin_memory: bool = False,
    ) -> None:

        super().__init__()
        self.save_hyperparameters(logger=False)
        self.data_train: Optional[ImageFolder] = None
        self.data_val: Optional[ImageFolder] = None
        self.data_test: Optional[ImageFolder] = None
        self.data_predict: Optional[ImageFolder] = None
        self.batch_size_per_device = batch_size
        self.train_transforms = train_transforms
        self.val_transforms = val_transforms

@hydra.main(version_base="1.3", config_path="../configs", config_name="train.yaml")
def view_model(cfg: DictConfig):
    datamodule: LightningDataModule = hydra.utils.instantiate(cfg.data)

Hydra config:

_target_: src.data.documents_datamodule.DocumentsDataModule
_convert_: all
data_dir: ${paths.data_dir}
batch_size: 8 # Needs to be divisible by the number of devices (e.g., if in a distributed setup)
sampler: "random" # imbalanced or random
train_val_test_split_ratio: [0.8, 0.2] # if length is 2 then only train, val splits are created
num_workers: 7 # set according to cpu cores
pin_memory: False

src and all paths have been configured correctly, so there are no issues there. But as soon as I call instantiate, I get a strange path error. This doesn't happen otherwise.

Stack trace/error message

hydra.errors.InstantiationException: Error locating target 'src.data.documents_datamodule.DocumentsDataModule'
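This error message is generic: it says the target could not be located, but the real failure is usually an ImportError raised while the target's module (or something it imports) was being executed. The sketch below is a simplified, hypothetical resolver, not Hydra's own code (LocateError and locate are illustrative names). It shows how resolving a dotted _target_ path imports each module in turn, so a broken import inside documents_datamodule.py or components/__init__.py surfaces only as a chained exception. In Hydra itself, setting the environment variable HYDRA_FULL_ERROR=1 prints the complete stack trace, including that chained cause.

```python
import importlib
from typing import Any


class LocateError(ImportError):
    """Hypothetical stand-in for Hydra's 'Error locating target' exception."""


def locate(path: str) -> Any:
    """Resolve a dotted path such as 'pkg.module.ClassName' (simplified sketch).

    Walks the path one component at a time: attribute lookup first, then an
    import of the next submodule. Any ImportError raised along the way,
    including one raised by an import statement inside an imported module,
    is wrapped in a generic LocateError; the real cause stays on __cause__.
    """
    parts = path.split(".")
    try:
        obj: Any = importlib.import_module(parts[0])
    except ImportError as exc:
        raise LocateError(f"Error locating target '{path}'") from exc
    for i in range(1, len(parts)):
        try:
            obj = getattr(obj, parts[i])
        except AttributeError:
            # Not an attribute yet: try importing it as a submodule.
            module_name = ".".join(parts[: i + 1])
            try:
                obj = importlib.import_module(module_name)
            except ImportError as exc:
                raise LocateError(f"Error locating target '{path}'") from exc
    return obj


if __name__ == "__main__":
    print(locate("collections.OrderedDict"))
    try:
        locate("collections.DoesNotExist")
    except LocateError as err:
        # The generic message hides the cause; the chained exception names it.
        print(err)
        print("caused by:", repr(err.__cause__))
```

When the top-level message is this vague, the chained exception is the first place to look.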

Expected Behavior

Imports should be handled correctly. When I remove the train_transforms import, I am able to instantiate the datamodule.

System information

Hydra version: 1.3
Python version: 3.10.12
Virtual environment type and version: conda
Operating system: macOS

Additional context

Hydra falls short in a lot of areas, or perhaps it is just too complex for me. I tried using a PyTorch Lightning template and found that it doesn't handle such basic cases.

odelalleau commented 1 year ago

Can you please make sure your directory structure is correct? For instance, I don't see train_transforms.py in it. I also don't see any __init__.py under src, which should normally prevent any instantiation from that folder, since it's not a proper Python package.
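One way to act on this suggestion is to list which directories along the _target_ path lack an __init__.py. The helper below is a hypothetical sketch (missing_init_files is not part of Hydra or the standard library). Python 3 also supports namespace packages without __init__.py (PEP 420), so a missing file is worth checking but is not always the cause of a failed import.

```python
from pathlib import Path
from typing import List


def missing_init_files(root: Path, target: str) -> List[Path]:
    """Return directories along a dotted target path that lack __init__.py.

    For 'src.data.documents_datamodule.DocumentsDataModule' this checks
    root/src and root/src/data, stopping at the first component that is not
    a directory (the module file or the class name).
    """
    missing: List[Path] = []
    current = root
    for part in target.split("."):
        current = current / part
        if not current.is_dir():
            break
        if not (current / "__init__.py").is_file():
            missing.append(current)
    return missing


if __name__ == "__main__":
    # Assumes this is run from the project root (the directory containing src/).
    target = "src.data.documents_datamodule.DocumentsDataModule"
    for path in missing_init_files(Path.cwd(), target):
        print(f"missing __init__.py in {path}")
```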

cs-mshah commented 1 year ago

Finally fixed the issue. In the __init__.py file of components, the import should be relative: from .transforms import *. Thanks for the support.
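The fix above can be checked with a small self-contained script. It builds a throwaway copy of the src/data/components layout in a temporary directory (the contents of transforms.py are hypothetical placeholders) and imports the package twice. The first attempt uses an absolute from transforms import ... in components/__init__.py, assumed here to be roughly what the original code did. The second uses the relative from .transforms import * from this comment. In Python 3 the absolute form only resolves when the components directory itself is on sys.path, which is generally not the case when the package is imported as src.data.components from the project root.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_package(root: Path, components_init: str) -> None:
    """Create a throwaway copy of the src/data/components layout under root."""
    components = root / "src" / "data" / "components"
    components.mkdir(parents=True)
    (root / "src" / "__init__.py").write_text("")
    (root / "src" / "data" / "__init__.py").write_text("")
    # Hypothetical placeholders standing in for the real transform objects.
    (components / "transforms.py").write_text(
        "train_transforms = 'train'\nval_transforms = 'val'\n"
    )
    (components / "__init__.py").write_text(components_init)


def _forget_src_modules() -> None:
    """Drop cached src.* modules so each attempt imports from scratch."""
    for name in [n for n in sys.modules if n == "src" or n.startswith("src.")]:
        del sys.modules[name]


def try_import(components_init: str) -> str:
    """Import src.data.components with the given __init__.py source; report the result."""
    with tempfile.TemporaryDirectory() as tmp:
        build_package(Path(tmp), components_init)
        importlib.invalidate_caches()
        _forget_src_modules()
        sys.path.insert(0, tmp)  # like running from the project root
        try:
            module = importlib.import_module("src.data.components")
            return f"ok: {module.train_transforms}, {module.val_transforms}"
        except ImportError as exc:
            return f"failed: {exc!r}"
        finally:
            sys.path.remove(tmp)
            _forget_src_modules()


if __name__ == "__main__":
    # Absolute import: only resolves if components/ itself is on sys.path.
    print(try_import("from transforms import train_transforms, val_transforms\n"))
    # Relative import (the fix from this thread): resolves inside the package.
    print(try_import("from .transforms import *\n"))
```

The relative form ties the import to the package the file lives in, so it keeps working no matter which directory Hydra's launch puts on sys.path.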