🤔[question] Where to set `find_unused_parameters=True`

Describe your question

Because I froze some layers in the network, some parameters cannot be updated. So I want to set the `find_unused_parameters` argument of DistributedDataParallel to solve this problem. Here is my YAML:

bind_mounts:
  - host_path: /home/cai/project/determined-0.34.0/examples/object_detection/checkpoints/
    container_path: /mnt/checkpoints/
  - host_path: /home/cai/project/determined-0.34.0/examples/object_detection/dataset/
    container_path: /mnt/dataset/
description: an object detection task
entrypoint: Det:Det
hyperparameters:
  global_batch_size: 16
max_restarts: 0
name: fcos
resources:
  slots_per_trial: 4
scheduling_unit: 1
records_per_epoch: 60000
min_checkpoint_period:
  epochs: 1
min_validation_period:
  epochs: 1
searcher:
  max_length:
    epochs: 12
  metric: mAP
  name: single
  smaller_is_better: false
labels:
- caida

Here is my code:

"""
This example shows how to interact with the Determined PyTorch interface to
build a basic object detection task.

In the `__init__` method, the model and optimizer are wrapped with `wrap_model`
and `wrap_optimizer`. This model is single-input and single-output.

The methods `train_batch` and `evaluate_batch` define the forward pass
for training and evaluation respectively.
"""

from typing import Any, Dict, Sequence, List, Union, cast
import torch
from determined.pytorch import DataLoader, PyTorchTrial, PyTorchTrialContext
from model.fcos import FCOSDetector
from dataset.COCO_dataset import COCODataset
from dataset.augment import Transforms
from eval import COCOGenerator, evaluate_coco

TorchData = Union[Dict[str, torch.Tensor], Sequence[torch.Tensor], torch.Tensor]

class Det(PyTorchTrial):
    def __init__(self, context: PyTorchTrialContext) -> None:
        self.context = context
        self.model = self.context.wrap_model(FCOSDetector(mode="training"))
        self.optimizer = self.context.wrap_optimizer(
            torch.optim.SGD(self.model.parameters(), lr=0.005, momentum=0.9, weight_decay=0.0001)
        )
        self.generator = COCOGenerator("/mnt/dataset/dataset/val_images", '/mnt/dataset/annotations/val.json')

    def build_training_data_loader(self) -> DataLoader:
        transform = Transforms()
        train_data = COCODataset("/mnt/dataset/images", '/mnt/dataset/annotations/train.json', transform=transform)
        return DataLoader(
            train_data,
            batch_size=self.context.get_per_slot_batch_size(),
            collate_fn=train_data.collate_fn,
            shuffle=True,
        )

    def build_validation_data_loader(self) -> DataLoader:
        transform = Transforms()
        validation_data = COCODataset("/mnt/dataset/val_images", '/mnt/dataset/annotations/val.json', transform=transform)
        return DataLoader(validation_data, batch_size=self.context.get_per_slot_batch_size(), shuffle=False)

    def train_batch(
        self, batch: TorchData, epoch_idx: int, batch_idx: int
    ) -> Dict[str, torch.Tensor]:
        batch = list(batch)
        # imgs, bboxes, classes = batch
        loss = self.model(batch)
        loss = loss[-1].mean()
        self.context.backward(loss)
        self.context.step_optimizer(self.optimizer)

        return {"loss": loss}

    def evaluate_batch(self, batch: TorchData) -> Dict[str, Any]:
        # imgs, bboxes, classes = batch
        batch = list(batch)
        inference_model = FCOSDetector(mode="inference")
        inference_model.load_state_dict(self.model.state_dict())
        validation_loss = self.model(batch)
        validation_loss = validation_loss[-1].mean()
        mAP = evaluate_coco(self.generator, inference_model)[0]

        return {"validation_loss": validation_loss, "mAP": mAP}
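
For reference, here is a minimal sketch of what this looks like in plain PyTorch, outside Determined. The flag is a keyword argument of the `torch.nn.parallel.DistributedDataParallel` constructor. The small `nn.Sequential` model and the helper names `freeze`, `trainable_parameters` and `wrap_ddp` are hypothetical stand-ins, not part of my project or of Determined. The sketch also shows a possible alternative: marking the frozen layers with `requires_grad_(False)` before wrapping and giving the optimizer only the trainable parameters. Whether and how `wrap_model` exposes the flag is exactly what I am asking, so nothing here assumes a Determined API.

```python
import torch
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP


def freeze(module: nn.Module) -> None:
    """Disable gradient computation for every parameter of `module`."""
    for param in module.parameters():
        param.requires_grad_(False)


def trainable_parameters(model: nn.Module) -> list:
    """Return only the parameters that still require gradients."""
    return [p for p in model.parameters() if p.requires_grad]


def wrap_ddp(model: nn.Module) -> DDP:
    """Plain-PyTorch wrapping with the flag from the question.

    Assumes torch.distributed.init_process_group(...) was already called;
    it is defined here for illustration and not executed below.
    """
    return DDP(model, find_unused_parameters=True)


# Hypothetical stand-in for the detector: the first layer plays the
# role of the frozen part, the last layer the trainable head.
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
freeze(model[0])

# Same optimizer settings as in the trial above, but restricted to the
# parameters that can actually receive gradients.
optimizer = torch.optim.SGD(
    trainable_parameters(model), lr=0.005, momentum=0.9, weight_decay=0.0001
)

# One training step on random data; the frozen layer gets no gradient.
loss = model(torch.randn(3, 8)).sum()
loss.backward()
optimizer.step()
```

In the trial above, the freezing step would have to happen before `self.context.wrap_model(...)` is called, and the filtered list would replace `self.model.parameters()` in the optimizer. I have not verified that this is sufficient under Determined.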

Checklist

[X] Did you search the docs for a solution?
[X] Did you search GitHub issues to find if somebody asked this question before?

determined-ai / determined

🤔[question] Where to set `find_unused_parameters=True` #9814
