Closed bilelomrani1 closed 1 year ago
@bilelomrani1 does your example work fine without QAT?
I would be passing just the batch
instead of **batch
@Borda I went with
def forward(self, batch: BatchEncoding) -> Tensor:
    return self.transformers_module.forward(**batch).logits
I still get an exception (albeit different), the trace is the following:
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(limit_train_batches=1)` was configured so 1 batch per epoch will be used.
`Trainer(limit_val_batches=1)` was configured so 1 batch will be used.
`Trainer(limit_test_batches=1)` was configured so 1 batch will be used.
/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/torch/ao/quantization/observer.py:214: UserWarning: Please use quant_min and quant_max to specify the range for observers. reduce_range will be deprecated in a future release of PyTorch.
warnings.warn(
/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:224: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 8 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:1558: PossibleUserWarning: The number of training batches (1) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
rank_zero_warn(
/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:224: PossibleUserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 8 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
Epoch 0: 0%| | 0/2 [00:00<?, ?it/s]Traceback (most recent call last):
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 248, in __getattr__
return self.data[item]
KeyError: 'detach'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/bilelomrani/Documents/ILLUIN.nosync/mre-lightning-qat/mre.py", line 92, in <module>
run()
File "/Users/bilelomrani/Documents/ILLUIN.nosync/mre-lightning-qat/mre.py", line 87, in run
trainer.fit(model, train_dataloaders=train_data, val_dataloaders=val_data)
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 582, in fit
call._call_and_handle_interrupt(
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 624, in _fit_impl
self._run(model, ckpt_path=self.ckpt_path)
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1061, in _run
results = self._run_stage()
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1140, in _run_stage
self._run_train()
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1163, in _run_train
self.fit_loop.run()
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
self.advance(*args, **kwargs)
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 267, in advance
self._outputs = self.epoch_loop.run(self._data_fetcher)
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
self.advance(*args, **kwargs)
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 214, in advance
batch_output = self.batch_loop.run(kwargs)
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
self.advance(*args, **kwargs)
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
outputs = self.optimizer_loop.run(optimizers, kwargs)
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
self.advance(*args, **kwargs)
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 200, in advance
result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 247, in _run_optimization
self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 357, in _optimizer_step
self.trainer._call_lightning_module_hook(
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1305, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 1661, in optimizer_step
optimizer.step(closure=optimizer_closure)
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/core/optimizer.py", line 169, in step
step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 234, in optimizer_step
return self.precision_plugin.optimizer_step(
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 121, in optimizer_step
return optimizer.step(closure=closure, **kwargs)
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/torch/optim/optimizer.py", line 140, in wrapper
out = func(*args, **kwargs)
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/torch/optim/optimizer.py", line 23, in _use_grad
ret = func(self, *args, **kwargs)
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/torch/optim/sgd.py", line 130, in step
loss = closure()
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 107, in _wrap_closure
closure_result = closure()
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 147, in __call__
self._result = self.closure(*args, **kwargs)
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 133, in closure
step_output = self._step_fn()
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 406, in _training_step
training_step_output = self.trainer._call_strategy_hook("training_step", *kwargs.values())
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1443, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 378, in training_step
return self.model.training_step(*args, **kwargs)
File "/Users/bilelomrani/Documents/ILLUIN.nosync/mre-lightning-qat/mre.py", line 52, in training_step
loss = self.forward(batch).sum()
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/pytorch_lightning/callbacks/quantization.py", line 61, in wrapper
data = model.quant(data) # type: ignore[operator]
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1211, in _call_impl
hook_result = hook(self, input, result)
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/torch/ao/quantization/quantize.py", line 117, in _observer_forward_hook
return self.activation_post_process(output)
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/torch/ao/quantization/fake_quantize.py", line 160, in forward
self.activation_post_process(X.detach())
File "/Users/bilelomrani/.pyenv/versions/mre-lightning/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 250, in __getattr__
raise AttributeError
AttributeError
I confirm that the code runs successfully when the callback is commented out.
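For context, the root cause can be illustrated without Lightning or transformers at all: `BatchEncoding.__getattr__` falls back to a dict lookup, so when the QAT callback's `FakeQuantize` observer calls `.detach()` on the batch, the lookup fails with the `KeyError: 'detach'` / `AttributeError` pair seen in the trace. A minimal sketch with a stand-in dict-backed class (hypothetical, not the real `BatchEncoding`):

```python
class DictBatch:
    """Stand-in mimicking transformers.BatchEncoding's attribute fallback."""

    def __init__(self, data):
        self.data = data

    def __getattr__(self, item):
        try:
            # BatchEncoding resolves unknown attributes via its backing dict...
            return self.data[item]
        except KeyError:
            # ...and re-raises as AttributeError, which is what surfaces when
            # FakeQuantize.forward calls X.detach() on the whole batch object.
            raise AttributeError(item)


batch = DictBatch({"input_ids": [101, 2054], "attention_mask": [1, 1]})
try:
    batch.detach()  # roughly what the quant stub's observer hook does
except AttributeError as err:
    print(f"AttributeError: {err}")  # 'detach' is not a batch key
```

This is why the callback only works when `forward` receives a single plain `Tensor`: anything dict-like cannot survive the `model.quant(data)` wrapping.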
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions - the Lightning Team!
The issue is still relevant
FYI, we removed the QAT callback in #16750. As explained in the linked PR:
The QAT callback can no longer be maintained by us. It has many issues that make the callback ineffective, and these can't be fixed at the moment.
Users who rely on this callback: You can stay on the Lightning 1.9.x version which gets long-term support (LTS) OR you can copy the callback code and maintain it yourself.
If someone from the community is interested in fixing and maintaining this callback, please let us know.
Since this issue is relatively new, we will keep this one open. We might be able to address this and bring the fix to 1.9.x LTS. @bilelomrani1 Do you have interest in contributing a fix?
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions - the Lightning Team!
Hi @awaelchli, sorry for the delay on this topic. I ended up going with an external library for training-aware compression (NNCF by Intel, which fits my needs). I am no longer using native torch for this.
I interfaced NNCF with Lightning through a callback (not super clean yet, but it gets the job done). It's super simple, but if there is any interest in this, I would be glad to contribute back and submit a PR. In any case, it's fine to close this issue; it is no longer needed.
@bilelomrani1 sorry for this bump. I'd be interested in this callback, and possibly to maintain it too if needed. Just a small question: why OpenVINO NNCF and not Intel Neural Compressor?
Hi @clementpoiret, here is the callback
import logging
from typing import Any, Dict, Mapping, Optional, cast

import nncf
import pytorch_lightning as pl
import torch
from nncf.torch.compression_method_api import PTCompressionAlgorithmController
from pytorch_lightning.utilities.types import STEP_OUTPUT


class NncfCallback(pl.Callback):
    logger = logging.getLogger(__name__)

    def __init__(self, config: Mapping) -> None:
        super().__init__()
        nncf.NNCFConfig.validate(config)
        self.config = nncf.NNCFConfig(config)

    def on_fit_start(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None:
        self.logger.info("Initializing NNCF compression algorithm")
        compression_config = nncf.torch.register_default_init_args(
            nncf_config=self.config,
            train_loader=trainer.datamodule.nncf_initializing_dataloader(),  # type: ignore[attr-defined]
            criterion=pl_module.criterion,
        )
        pl_module.compression_controller, pl_module.model = nncf.torch.create_compressed_model(
            model=pl_module.model,
            config=compression_config,
            dump_graphs=False,
        )
        if torch.distributed.is_initialized():
            cast(PTCompressionAlgorithmController, pl_module.compression_controller).distributed()

    def on_train_batch_end(
        self, trainer: "pl.Trainer", pl_module: "pl.LightningModule", outputs: STEP_OUTPUT, batch: Any, batch_idx: int
    ) -> None:
        # Add the compression objective to the loss
        assert isinstance(outputs, dict)
        compression_loss = cast(PTCompressionAlgorithmController, pl_module.compression_controller).loss()
        pl_module.log("train/base_loss", outputs["loss"])
        outputs["loss"] += compression_loss
        outputs["compression_loss"] = compression_loss
        pl_module.log("train/compression_loss", compression_loss)

    def on_after_backward(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None:
        cast(PTCompressionAlgorithmController, pl_module.compression_controller).scheduler.step()

    def on_train_epoch_end(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None:
        cast(PTCompressionAlgorithmController, pl_module.compression_controller).scheduler.epoch_step()

    def on_validation_batch_end(
        self,
        trainer: "pl.Trainer",
        pl_module: "pl.LightningModule",
        outputs: Optional[STEP_OUTPUT],
        batch: Any,
        batch_idx: int,
        dataloader_idx: int = 0,
    ) -> None:
        assert isinstance(outputs, dict)
        compression_loss = cast(PTCompressionAlgorithmController, pl_module.compression_controller).loss()
        pl_module.log("val/base_loss", outputs["loss"])
        outputs["loss"] += compression_loss
        outputs["compression_loss"] = compression_loss
        pl_module.log("val/compression_loss", compression_loss)

    def on_save_checkpoint(
        self, trainer: "pl.Trainer", pl_module: "pl.LightningModule", checkpoint: Dict[str, Any]
    ) -> None:
        checkpoint["nncf_config"] = self.config
        if hasattr(pl_module, "compression_controller"):
            checkpoint["compression_state"] = cast(
                PTCompressionAlgorithmController, pl_module.compression_controller
            ).get_compression_state()
The DataModule should have a .nncf_initializing_dataloader() method that returns an nncf.torch.initialization.PTInitializingDataLoader. I wrote this a while ago with nncf==2.4.0, so some things may have changed since then. I don't remember exactly why I chose this specific framework; I was looking for a training-aware quantization implementation. Maybe Intel Neural Compressor has such features, but I don't remember testing it, so it is perhaps worth taking a look. Do you have an opinion on the difference between the two?
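For anyone wiring this up, the adapter shape NNCF expects can be sketched as below. This is a plain-Python sketch (in real code you would subclass nncf.torch.initialization.PTInitializingDataLoader and the two get_* hooks would be the methods you override); the batch keys "features" and "labels" are hypothetical:

```python
class InitializingDataLoaderSketch:
    """Sketch of the adapter NNCF uses to run its initialization batches.

    Hypothetical stand-in: real code should subclass
    nncf.torch.initialization.PTInitializingDataLoader instead.
    """

    def __init__(self, dataloader):
        self._dataloader = dataloader

    def __iter__(self):
        return iter(self._dataloader)

    def get_inputs(self, dataloader_output):
        # Return (args, kwargs) that NNCF passes to model.forward()
        # during range/statistics initialization.
        return (dataloader_output["features"],), {}

    def get_target(self, dataloader_output):
        # Return the target fed to the criterion during initialization.
        return dataloader_output["labels"]


loader = InitializingDataLoaderSketch([{"features": [0.1, 0.2], "labels": 1}])
args, kwargs = loader.get_inputs(next(iter(loader)))
```

The DataModule's .nncf_initializing_dataloader() would then wrap its regular train dataloader in such an adapter.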
Thanks for the callback @bilelomrani1! I feel that NNCF integrates well with the OpenVINO ecosystem and has specific optimizations for it, while Intel Neural Compressor might be more generic. Intel Neural Compressor also seems to have more SotA methods implemented for post-training quantization. For quantization-aware training, it relies on PyTorch's native QAT implementation.
@bilelomrani1 I made very simple callbacks for intel neural compressor, if you want to try :) https://github.com/clementpoiret/lightning-nc
Bug description
When using multiple input tensors to the .forward method together with the QuantizationAwareTraining callback, the inputs are not handled properly by the callback. This may be related to #8677, which is currently unsolved. If I understand this code correctly, restrictive assumptions are made about the inputs of the .forward method: it must take a single input tensor, which is quantized before the forward pass. This is generally too restrictive: with the HuggingFace transformers library, for instance, the tokenizer produces multiple tensors that must all be passed as inputs to the Transformer backbone. Moreover, these input tensors are respectively embedding indices and attention masks (LongTensors), so as far as I understand they must not be quantized prior to the forward pass; I wonder how this interacts with what is done here.

How to reproduce the bug
The previous code runs correctly when the callback is commented out. We get the same exception with this slight variation of the forward method:

Error messages and logs
Environment
cc @borda
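As a closing aside on the single-tensor assumption described in the bug report: one way a QAT wrapper could handle dict-like batches is to route only floating-point tensors through the quant stub and pass integer tensors (embedding indices, attention masks) through untouched. A sketch with a stand-in tensor class (all names hypothetical; with real torch tensors the check would be torch.is_floating_point):

```python
class FakeTensor:
    """Minimal stand-in for a torch.Tensor, carrying only a dtype tag."""

    def __init__(self, dtype):
        self.dtype = dtype

    def is_floating_point(self):
        return self.dtype in ("float16", "float32", "float64")


def quantize_batch(quant, batch):
    # Apply the quant stub only to floating-point entries; LongTensor
    # entries such as input_ids / attention_mask must stay untouched.
    return {
        key: quant(value) if value.is_floating_point() else value
        for key, value in batch.items()
    }


batch = {
    "input_ids": FakeTensor("int64"),
    "attention_mask": FakeTensor("int64"),
    "pixel_values": FakeTensor("float32"),
}
quantized = quantize_batch(lambda t: ("quantized", t), batch)
# Only "pixel_values" goes through the stub; the index tensors pass through.
```

This is only a sketch of the dispatch logic; a real fix in the callback would also need to preserve the BatchEncoding container type so downstream code keeps working.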