Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

Broke sanity check #17077

Closed spagliarini closed 1 year ago

spagliarini commented 1 year ago

Hi!

Bug description

In _run_train (trainer.py around line 1203 - see below), the sanity check does not work as expected.

(screenshot: trainer.py lines 1203-1204, where `_run_train` calls `_run_sanity_check`)

One would expect the validation during the sanity check to compare the predictions against the first two features only. Instead, it compares the predictions (of which there are now only two, since only the sanity-check batches have run) against the whole set of features of the validation set. This causes the assertion shown below to fail, which makes it impossible to continue the training.

(screenshot: the failing assertion in processing.py, reproduced in full in the traceback below)

Commenting out the two lines shown in the first picture (1203-1204) lets the training start just fine, but it would be useful to still be able to use the sanity check feature.
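As a workaround that does not require editing the installed package, the sanity check can also be skipped through the standard `num_sanity_val_steps` Trainer argument (a documented PyTorch Lightning option, not something specific to this issue). A minimal sketch based on the reproduction script below:

```python
import pytorch_lightning as pl

# Workaround sketch: skip the pre-training sanity check entirely, so the
# SQuAD metric is only computed once the full validation set has been seen.
trainer = pl.Trainer(
    accelerator="auto",
    devices="auto",
    max_epochs=1,
    num_sanity_val_steps=0,  # 0 disables the sanity check; the default runs 2 batches
)
```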

I am tagging the people I work with below to keep them in the loop on the solution to this bug. @ilBonez @ftesser

How to reproduce the bug

Run the example provided in the documentation: https://lightning-transformers.readthedocs.io/en/latest/tasks/nlp/question_answering.html#

import pytorch_lightning as pl
from transformers import AutoTokenizer

from lightning_transformers.task.nlp.question_answering import (
    QuestionAnsweringTransformer,
    SquadDataModule,
)

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path="bert-base-uncased")
model = QuestionAnsweringTransformer(pretrained_model_name_or_path="bert-base-uncased")
dm = SquadDataModule(
    batch_size=1,
    dataset_config_name="plain_text",
    max_length=384,
    version_2_with_negative=False,
    null_score_diff_threshold=0.0,
    doc_stride=128,
    n_best_size=20,
    max_answer_length=30,
    tokenizer=tokenizer,
)
trainer = pl.Trainer(accelerator="auto", devices="auto", max_epochs=1)

trainer.fit(model, dm)

Error messages and logs

============================= test session starts =============================
collecting ... collected 1 item

Downloading: 100%|██████████| 28.0/28.0 [00:00<?, ?B/s]
Downloading: 100%|██████████| 570/570 [00:00<00:00, 856kB/s]
Downloading: 100%|██████████| 232k/232k [00:00<00:00, 677kB/s]
Downloading: 100%|██████████| 466k/466k [00:00<00:00, 806kB/s]
Downloading: 100%|██████████| 440M/440M [01:02<00:00, 7.06MB/s]
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForQuestionAnswering: ['cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

-------------------------------- live log call --------------------------------
INFO     pytorch_lightning.utilities.rank_zero:setup.py:163 GPU available: False, used: False
INFO     pytorch_lightning.utilities.rank_zero:setup.py:166 TPU available: False, using: 0 TPU cores
INFO     pytorch_lightning.utilities.rank_zero:setup.py:169 IPU available: False, using: 0 IPUs
INFO     pytorch_lightning.utilities.rank_zero:setup.py:172 HPU available: False, using: 0 HPUs
  0%|          | 0/2 [00:00<?, ?it/s]WARNING  datasets.builder:builder.py:641 Reusing dataset squad (C:\Users\lorenzo.bonetti\.cache\huggingface\datasets\squad\plain_text\1.0.0\d6ec3ceb99ca480ce37cdd35555d6cb2511d223b9150cce08a837ef62ffea453)
100%|██████████| 2/2 [00:00<00:00, 39.94it/s]
100%|██████████| 88/88 [00:51<00:00,  1.72ba/s]
100%|██████████| 11/11 [00:42<00:00,  3.89s/ba]
INFO     pytorch_lightning.utilities.rank_zero:trainer.py:2246 Loading `train_dataloader` to estimate number of stepping batches.
INFO     pytorch_lightning.callbacks.model_summary:model_summary.py:83 
  | Name   | Type                     | Params
----------------------------------------------------
0 | model  | BertForQuestionAnswering | 108 M 
1 | metric | SquadMetric              | 0     
----------------------------------------------------
108 M     Trainable params
0         Non-trainable params
108 M     Total params
435.573   Total estimated model params size (MB)
Sanity Checking DataLoader 0: 100%|██████████| 2/2 [00:01<00:00,  1.58it/s]FAILED
document_information_extractor\fine_tuning_models\tests\test_training.py:96 (test_1)
def test_1():
        import pytorch_lightning as pl
        from transformers import AutoTokenizer

        from lightning_transformers.task.nlp.question_answering import (
            QuestionAnsweringTransformer,
            SquadDataModule,
        )

        tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path="bert-base-uncased")
        model = QuestionAnsweringTransformer(pretrained_model_name_or_path="bert-base-uncased")
        dm = SquadDataModule(
            batch_size=1,
            dataset_config_name="plain_text",
            max_length=384,
            version_2_with_negative=False,
            null_score_diff_threshold=0.0,
            doc_stride=128,
            n_best_size=20,
            max_answer_length=30,
            tokenizer=tokenizer,
        )
        trainer = pl.Trainer(accelerator="auto", devices="auto", max_epochs=1)

>       trainer.fit(model, dm)

fine_tuning_models\tests\test_training.py:121: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
..\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py:608: in fit
    call._call_and_handle_interrupt(
..\venv\lib\site-packages\pytorch_lightning\trainer\call.py:38: in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
..\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py:650: in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
..\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py:1112: in _run
    results = self._run_stage()
..\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py:1191: in _run_stage
    self._run_train()
..\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py:1204: in _run_train
    self._run_sanity_check()
..\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py:1276: in _run_sanity_check
    val_loop.run()
..\venv\lib\site-packages\pytorch_lightning\loops\loop.py:206: in run
    output = self.on_run_end()
..\venv\lib\site-packages\pytorch_lightning\loops\dataloader\evaluation_loop.py:184: in on_run_end
    self._on_evaluation_epoch_end()
..\venv\lib\site-packages\pytorch_lightning\loops\dataloader\evaluation_loop.py:294: in _on_evaluation_epoch_end
    self.trainer._call_lightning_module_hook(hook_name)
..\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py:1356: in _call_lightning_module_hook
    output = fn(*args, **kwargs)
..\venv\lib\site-packages\lightning_transformers\task\nlp\question_answering\model.py:64: in on_validation_epoch_end
    metric_dict = self.metric.compute()
..\venv\lib\site-packages\torchmetrics\metric.py:532: in wrapped_func
    value = compute(*args, **kwargs)
..\venv\lib\site-packages\lightning_transformers\task\nlp\question_answering\datasets\squad\metric.py:29: in compute
    predictions, references = self.postprocess_func(predictions=predictions)
..\venv\lib\site-packages\lightning_transformers\task\nlp\question_answering\datasets\squad\data.py:46: in postprocess_func
    return post_processing_function(
..\venv\lib\site-packages\lightning_transformers\task\nlp\question_answering\datasets\squad\processing.py:179: in post_processing_function
    predictions = postprocess_qa_predictions(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

examples = Dataset({
    features: ['id', 'title', 'context', 'question', 'answers'],
    num_rows: 10570
})
features = Dataset({
    features: ['input_ids', 'token_type_ids', 'attention_mask', 'offset_mapping', 'example_id'],
    num_rows: 10784
})
predictions = (array([[-5.57795092e-02, -5.82439601e-02,  5.04305959e-02,
         1.47914141e-02, -1.14228278e-02,  4.22953069e-02,...8664e-02,  2.26874724e-02, -6.00413345e-02]], dtype=float32), ['56be4db0acb8001400a502ec', '56be4db0acb8001400a502ed'])
version_2_with_negative = False, n_best_size = 20, max_answer_length = 30
null_score_diff_threshold = 0.0, output_dir = None, prefix = None

    def postprocess_qa_predictions(
        examples,
        features,
        predictions: Tuple[np.ndarray, np.ndarray],
        version_2_with_negative: bool = False,
        n_best_size: int = 20,
        max_answer_length: int = 30,
        null_score_diff_threshold: float = 0.0,
        output_dir: Optional[str] = None,
        prefix: Optional[str] = None,
    ):
        """Post-processes the predictions of a question-answering model to convert them to answers that are substrings
        of the original contexts. This is the base postprocessing functions for models that only return start and end
        logits.

        Args:
            examples: The non-preprocessed dataset (see the main script for more information).
            features: The processed dataset (see the main script for more information).
            predictions (:obj:`Tuple[np.ndarray, np.ndarray]`):
                The predictions of the model: two arrays containing the start logits and the end logits respectively. Its
                first dimension must match the number of elements of :obj:`features`.
            version_2_with_negative (:obj:`bool`, `optional`, defaults to :obj:`False`):
                Whether or not the underlying dataset contains examples with no answers.
            n_best_size (:obj:`int`, `optional`, defaults to 20):
                The total number of n-best predictions to generate when looking for an answer.
            max_answer_length (:obj:`int`, `optional`, defaults to 30):
                The maximum length of an answer that can be generated. This is needed because the start and end predictions
                are not conditioned on one another.
            null_score_diff_threshold (:obj:`float`, `optional`, defaults to 0):
                The threshold used to select the null answer: if the best answer has a score that is less than the score of
                the null answer minus this threshold, the null answer is selected for this example (note that the score of
                the null answer for an example giving several features is the minimum of the scores for the null answer on
                each feature: all features must be aligned on the fact they `want` to predict a null answer).

                Only useful when :obj:`version_2_with_negative` is :obj:`True`.
            output_dir (:obj:`str`, `optional`):
                If provided, the dictionaries of predictions, n_best predictions (with their scores and logits) and, if
                :obj:`version_2_with_negative=True`, the dictionary of the scores differences between best and null
                answers, are saved in `output_dir`.
            prefix (:obj:`str`, `optional`):
                If provided, the dictionaries mentioned above are saved with `prefix` added to their names.
        """
        assert len(predictions) == 3, "`predictions` should be a tuple with two elements (start_logits, end_logits)."
        all_start_logits, all_end_logits, example_ids = predictions

>       assert len(predictions[0]) == len(features), f"Got {len(predictions[0])} predictions and {len(features)} features."
E       AssertionError: Got 2 predictions and 10784 features.

..\venv\lib\site-packages\lightning_transformers\task\nlp\question_answering\datasets\squad\processing.py:247: AssertionError

Environment

Current environment

```
- Lightning Component: Trainer
- PyTorch Lightning Version: 1.9.4
- Lightning-transformers: 0.2.5
- Lightning-utilities: 0.8.0
- Python version: 3.9.13
- OS: Windows 11
- How you installed Lightning: pip
- Running environment: local
```

Thank you for your help!

carmocca commented 1 year ago

Hi! This component is part of the lightning-transformers project, which has been archived.

It looks like this issue comes from the squad implementation, not the Trainer.

You can fork the repository and adapt it to your needs. Sorry for the inconvenience.
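For anyone forking lightning-transformers, one possible adaptation (a sketch, not the project's actual fix) is to skip the metric computation while the trainer is sanity checking, using the `trainer.sanity_checking` flag that PyTorch Lightning exposes. The body after the guard is assumed to roughly match the upstream hook seen in the traceback and is shown only for illustration:

```python
# Hypothetical guard in a fork of lightning_transformers
# (question_answering/model.py, on_validation_epoch_end in the traceback above).
def on_validation_epoch_end(self) -> None:
    if self.trainer.sanity_checking:
        # During the sanity check only a couple of batches are run, so the
        # SQuAD postprocessing would see fewer predictions than features.
        self.metric.reset()
        return
    metric_dict = self.metric.compute()  # the call that fails in the traceback
    self.log_dict(metric_dict, prog_bar=True)  # logging call assumed for illustration
    self.metric.reset()
```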