neptune.ai logger produces lots of errors when logging "training/epoch"

### Bug description

The Neptune logger emits many errors like the following: "[neptune] [error ] Error occurred during asynchronous operation processing: X-coordinates (step) must be strictly increasing for series attribute: training/epoch. Invalid point: 34.0"

These appear to be false positives: the "training/epoch" curve in the Neptune UI looks fine.
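Until the duplicate logging is fixed, one way to quiet the console is a standard `logging` filter that drops only this message. The following is a minimal sketch rather than a confirmed fix. It assumes the messages are emitted through a logger named "neptune", which is not verified here, and the class name `DropStepOrderErrors` is made up for illustration.

```python
import logging


class DropStepOrderErrors(logging.Filter):
    """Drop only the 'strictly increasing' step errors; pass everything else."""

    NEEDLE = "X-coordinates (step) must be strictly increasing"

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False tells the logging module to discard the record.
        return self.NEEDLE not in record.getMessage()


# Assumption: Neptune logs through a logger named "neptune".
# A filter attached to a logger only sees records created on that exact
# logger, not on its child loggers.
neptune_logger = logging.getLogger("neptune")
neptune_logger.addFilter(DropStepOrderErrors())
```

Note that this also hides genuine out-of-order step errors for other series, so it is only appropriate while the logged curves are known to be correct.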

### What version are you seeing the problem on?

v2.2

### How to reproduce the bug

Set the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables first so that the logger can connect to neptune.ai.

import os

import lightning as lit
import torch
from lightning.pytorch.loggers import NeptuneLogger
from torch.utils.data import Dataset, DataLoader

class DummyDataset(Dataset):
    def __init__(self):
        pass

    def __len__(self):
        return 100

    def __getitem__(self, item):
        return {"image": torch.rand(3, 16, 16), "label": torch.randint(0, 100, (1,))}

class DummyModel(lit.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = torch.nn.Linear(3 * 16 * 16, 100)
        self.epoch_identifier = "dummy"

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch["image"], batch["label"]
        x = x.view(x.size(0), -1)
        y = y.view(-1)
        logits = self.model(x)
        loss = torch.nn.functional.cross_entropy(logits, y)
        self.log("train_loss", loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)
        return loss

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        return optimizer

    def validation_step(self, batch, batch_idx):
        x, y = batch["image"], batch["label"]
        x = x.view(x.size(0), -1)
        y = y.view(-1)
        logits = self.model(x)
        loss = torch.nn.functional.cross_entropy(logits, y)
        self.log("val_loss", loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)

    def test_step(self, batch, batch_idx):
        return self.validation_step(batch, batch_idx)

def main():
    wlogger = NeptuneLogger(log_model_checkpoints=False)
    output_dir = "temp_lit"
    os.makedirs(output_dir, exist_ok=True)

    trainer = lit.Trainer(
        devices=1,
        default_root_dir=output_dir,
        logger=wlogger,
        max_epochs=5,
        enable_progress_bar=False,
        log_every_n_steps=5,
    )
    model = DummyModel()
    dataset = DummyDataset()
    train_loader = DataLoader(dataset, batch_size=16, num_workers=4)
    val_loader = DataLoader(dataset, batch_size=16, num_workers=4)
    trainer.fit(model, train_loader, val_loader)

if __name__ == "__main__":
    main()


### Error messages and logs

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[neptune] [info ] Neptune initialized. Open in the app: https://app.neptune.ai/ [...]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]

  | Name  | Type   | Params
0 | model | Linear | 76.9 K

76.9 K    Trainable params
0         Non-trainable params
76.9 K    Total params
0.308     Total estimated model params size (MB)

[neptune] [error ] Error occurred during asynchronous operation processing: X-coordinates (step) must be strictly increasing for series attribute: training/epoch. Invalid point: 6.0
[neptune] [error ] Error occurred during asynchronous operation processing: X-coordinates (step) must be strictly increasing for series attribute: training/epoch. Invalid point: 13.0
[neptune] [error ] Error occurred during asynchronous operation processing: X-coordinates (step) must be strictly increasing for series attribute: training/epoch. Invalid point: 20.0
[neptune] [error ] Error occurred during asynchronous operation processing: X-coordinates (step) must be strictly increasing for series attribute: training/epoch. Invalid point: 27.0
Trainer.fit stopped: max_epochs=5 reached.
[neptune] [info ] Shutting down background jobs, please wait a moment...
[neptune] [info ] Done!
[neptune] [info ] Waiting for the remaining 17 operations to synchronize with Neptune. Do not kill this process.
[neptune] [error ] Error occurred during asynchronous operation processing: X-coordinates (step) must be strictly increasing for series attribute: training/epoch. Invalid point: 34.0
[neptune] [error ] Error occurred during asynchronous operation processing: X-coordinates (step) must be strictly increasing for series attribute: training/epoch. Invalid point: 34.0
[neptune] [info ] All 17 operations synced, thanks for waiting!
[neptune] [info ] Explore the metadata in the Neptune app: https://app.neptune.ai/ [...]
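The invalid points in this log (6, 13, 20, 27, 34) line up with the last training step of each epoch in the reproduction script. The short sketch below reproduces that arithmetic from the script's settings, assuming the reported point is the zero-based global step and that the DataLoader keeps the final partial batch (`drop_last` defaults to `False`).

```python
import math

dataset_size = 100  # DummyDataset.__len__ in the reproduction script
batch_size = 16     # DataLoader batch_size
max_epochs = 5      # Trainer max_epochs

# The final partial batch is kept, so round up.
steps_per_epoch = math.ceil(dataset_size / batch_size)

# Zero-based index of the last training step in each epoch.
last_step_per_epoch = [
    steps_per_epoch * (epoch + 1) - 1 for epoch in range(max_epochs)
]
print(last_step_per_epoch)  # [6, 13, 20, 27, 34]
```

This suggests, but does not prove, that "training/epoch" is written more than once at the same step on each epoch boundary, for example once with the training epoch-end metrics and once with the validation epoch-end metrics.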



### Environment

<details>
  <summary>Current environment</summary>

* CUDA:
    - GPU:
        - NVIDIA GeForce GTX 1070 Ti
        - Quadro P400
    - available:         True
    - version:           12.1
* Lightning:
    - lightning:         2.2.1
    - lightning-utilities: 0.11.0
    - pytorch-lightning: 2.2.1
    - torch:             2.2.1
    - torchaudio:        2.2.1
    - torchmetrics:      1.3.2
    - torchvision:       0.17.1
* Packages:
    - aiohttp:           3.9.3
    - aiosignal:         1.3.1
    - arrow:             1.3.0
    - async-timeout:     4.0.3
    - attrs:             23.2.0
    - boto3:             1.34.66
    - botocore:          1.34.66
    - bravado:           11.0.3
    - bravado-core:      6.1.1
    - brotli:            1.0.9
    - certifi:           2024.2.2
    - charset-normalizer: 2.0.4
    - click:             8.1.7
    - filelock:          3.13.1
    - fqdn:              1.5.1
    - frozenlist:        1.4.1
    - fsspec:            2024.3.1
    - future:            1.0.0
    - gitdb:             4.0.11
    - gitpython:         3.1.42
    - gmpy2:             2.1.2
    - idna:              3.4
    - isoduration:       20.11.0
    - jinja2:            3.1.3
    - jmespath:          1.0.1
    - jsonpointer:       2.4
    - jsonref:           1.1.0
    - jsonschema:        4.21.1
    - jsonschema-specifications: 2023.12.1
    - lightning:         2.2.1
    - lightning-utilities: 0.11.0
    - markupsafe:        2.1.3
    - mkl-fft:           1.3.8
    - mkl-random:        1.2.4
    - mkl-service:       2.4.0
    - monotonic:         1.6
    - mpmath:            1.3.0
    - msgpack:           1.0.8
    - multidict:         6.0.5
    - neptune:           1.9.1
    - networkx:          3.1
    - numpy:             1.26.4
    - oauthlib:          3.2.2
    - packaging:         24.0
    - pandas:            2.2.1
    - pillow:            10.2.0
    - pip:               23.3.1
    - psutil:            5.9.8
    - pyjwt:             2.8.0
    - pysocks:           1.7.1
    - python-dateutil:   2.9.0.post0
    - pytorch-lightning: 2.2.1
    - pytz:              2024.1
    - pyyaml:            6.0.1
    - referencing:       0.34.0
    - requests:          2.31.0
    - requests-oauthlib: 1.4.0
    - rfc3339-validator: 0.1.4
    - rfc3986-validator: 0.1.1
    - rpds-py:           0.18.0
    - s3transfer:        0.10.1
    - setuptools:        68.2.2
    - simplejson:        3.19.2
    - six:               1.16.0
    - smmap:             5.0.1
    - swagger-spec-validator: 3.0.3
    - sympy:             1.12
    - torch:             2.2.1
    - torchaudio:        2.2.1
    - torchmetrics:      1.3.2
    - torchvision:       0.17.1
    - tqdm:              4.66.2
    - triton:            2.2.0
    - types-python-dateutil: 2.9.0.20240316
    - typing-extensions: 4.9.0
    - tzdata:            2024.1
    - uri-template:      1.3.0
    - urllib3:           2.1.0
    - webcolors:         1.13
    - websocket-client:  1.7.0
    - wheel:             0.41.2
    - yarl:              1.9.4
* System:
    - OS:                Linux
    - architecture:
        - 64bit
        - ELF
    - processor:         x86_64
    - python:            3.10.13
    - release:           5.4.0-172-generic
    - version:           #190-Ubuntu SMP Fri Feb 2 23:24:22 UTC 2024

</details>

### More info

_No response_
