## Checklist

- [x] I have checked the CHANGELOG and the commit log to find out if the bug was already fixed in the main branch.
- [x] I have included in the "Description" section below a traceback from any exceptions related to this bug.
- [x] I have included in the "Related issues or possible duplicates" section below all related issues and possible duplicate issues (If there are none, check this box anyway).
- [x] I have included in the "Environment" section below the name of the operating system and Python version that I was using when I discovered this bug.
- [x] I have included in the "Environment" section below the output of `pip freeze`.
- [x] I have included in the "Steps to reproduce" section below a minimally reproducible example.
## Description
I defined a parameter `field_p` with `requires_grad=False` in my model, and I use a `moving_average` in the trainer. Since `field_p` is never updated by the optimizer, it should not need a moving average during training, but the `apply` method of `ExponentialMovingAverage` does not check the `requires_grad` property and applies the update to all parameters:
```python
def apply(self, num_updates: Optional[int] = None) -> None:
    ...
    if num_updates is not None:
        decay = min(
            self._decay, (self._numerator + num_updates) / (self._denominator + num_updates)
        )
    else:
        decay = self._decay
    for name, parameter in self._parameters:
        self._shadows[name].mul_(decay).add_((1 - decay) * parameter.data)
```
If the dtype of `field_p` is `torch.long`, this raises a `RuntimeError`:

```
Traceback (most recent call last):
  File "/xxx/tests/train_local.py", line 254, in <module>
    train_pipeline()
  File "/xxx/train_local.py", line 249, in train_pipeline
    trainer.train()
  File "/xxx/ctr/trainer.py", line 216, in _train_epoch
    self._moving_average.apply(self._total_batches_completed + 1)
  File "/xxxtests/train_local.py", line 139, in apply
    self._shadows[name].mul_(decay).add_((1 - decay) * parameter.data)
RuntimeError: result type Float can't be cast to the desired output type Long
```
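The error itself is ordinary PyTorch type promotion rather than anything specific to the trainer: an in-place op on a `Long` tensor cannot store a `Float` result. A minimal demonstration outside AllenNLP:

```python
import torch

shadow = torch.zeros(3, dtype=torch.long)
# In-place multiplication by a Python float promotes the result to Float,
# which cannot be written back into the Long tensor in place:
shadow.mul_(0.9999)
# RuntimeError: result type Float can't be cast to the desired output type Long
```

A possible guard, sketched here as a suggestion rather than the library's actual code, would be to skip frozen parameters in the update loop, since a parameter with `requires_grad=False` never changes and its moving average is just itself:

```python
for name, parameter in self._parameters:
    if not parameter.requires_grad:
        # Frozen parameter: it never changes during training, so there is
        # nothing to average (and an integer dtype would crash the float
        # update below).
        continue
    self._shadows[name].mul_(decay).add_((1 - decay) * parameter.data)
```

Filtering `self._parameters` once at construction time would work as well.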
## Related issues or possible duplicates
## Environment
- OS: Linux
- Python version: 3.7.13
Output of `pip freeze`:

```
```
## Steps to reproduce
Example source:

```
```
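A minimal sketch that should reproduce the error, assuming AllenNLP's `ExponentialMovingAverage`; the `ToyModel` and its parameter names are hypothetical scaffolding, not code from my project:

```python
import torch
from allennlp.training.moving_average import ExponentialMovingAverage


class ToyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(4))
        # Frozen integer parameter: never trained, but still registered,
        # so named_parameters() yields it alongside the trainable weight.
        self.field_p = torch.nn.Parameter(
            torch.zeros(4, dtype=torch.long), requires_grad=False
        )


model = ToyModel()
ema = ExponentialMovingAverage(model.named_parameters())

# apply() updates the shadow of every parameter, including field_p, so the
# in-place float update crashes on the Long-typed shadow tensor:
ema.apply(num_updates=1)
# RuntimeError: result type Float can't be cast to the desired output type Long
```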